research-article

A grammatical evolution based hyper-heuristic for the automatic design of split criteria

Authors:
Márcio Porto Basgalupp, ICT-UNIFESP, São José dos Campos, Brazil
Rodrigo Coelho Barros, FACIN/PUCRS, Porto Alegre, Brazil
Tiago Barabasz, ICT-UNIFESP, São José dos Campos, Brazil

GECCO '14: Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, July 2014, pages 1311–1318
https://doi.org/10.1145/2576768.2598327

Published: 12 July 2014

ABSTRACT

Top-down induction of decision trees (TDIDT) is a powerful method for data classification. A major issue in TDIDT is deciding which attribute should be selected to split each node into subsets as the tree is built. For this task, decision-tree algorithms rely on a split criterion, which is usually a measure based on information theory. As with most things in machine learning, there appears to be no free lunch regarding decision-tree split criteria: each application may benefit from a distinct split criterion, and the problem we pose here is how to identify a suitable split criterion for each application that may emerge. In this paper, we propose a grammatical evolution algorithm that automatically generates split criteria through a context-free grammar. We name our new approach ESC-GE (Evolutionary Split Criteria with Grammatical Evolution). We evaluate it empirically on public gene expression datasets and compare its performance with state-of-the-art split criteria, namely information gain and gain ratio. Results show that ESC-GE outperforms the baseline criteria in the domain of gene expression data, indicating its effectiveness for automatically designing tailor-made split criteria.
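The abstract combines two mechanisms: a split criterion scores candidate attributes (information gain and gain ratio are the baselines), and grammatical evolution maps an integer genome through a context-free grammar to produce such a criterion. The Python sketch that follows illustrates both under stated assumptions. The three-terminal grammar, the prefix notation, the protected division, and every function name (`ge_map`, `make_criterion`, and so on) are hypothetical illustrations, not the grammar or implementation used by ESC-GE, and only nominal attributes are handled.

```python
import math
from collections import Counter

# --- Split-criterion building blocks (nominal attributes only, for brevity) ---

def entropy(labels):
    """Shannon entropy H(Y), in bits, of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def partition(values, labels):
    """Group the class labels by the value the attribute takes."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return list(groups.values())

def cond_entropy(values, labels):
    """H(Y|A): size-weighted entropy of the subsets produced by the split."""
    n = len(labels)
    return sum(len(g) / n * entropy(g) for g in partition(values, labels))

def split_info(values, labels):
    """SplitInfo(A): entropy of the subset sizes themselves."""
    n = len(labels)
    return -sum(len(g) / n * math.log2(len(g) / n) for g in partition(values, labels))

# --- Toy grammar (hypothetical; NOT the grammar used by ESC-GE) ---
# Prefix notation avoids operator-precedence issues when evaluating.
GRAMMAR = {
    "<expr>": [["<op>", "<expr>", "<expr>"], ["<term>"]],
    "<op>": [["+"], ["-"], ["*"], ["/"]],
    "<term>": [["HY"], ["HYA"], ["SI"]],
}

def ge_map(genome, start="<expr>", max_wraps=2):
    """A common GE genotype-to-phenotype mapping (leftmost derivation).

    Each codon modulo the number of productions picks a rule; the genome may be
    re-read ("wrapped") up to max_wraps extra times. Returns None on failure.
    """
    if not genome:
        return None
    symbols, out, i = [start], [], 0
    limit = len(genome) * (max_wraps + 1)
    while symbols:
        sym = symbols.pop(0)
        if sym not in GRAMMAR:          # terminal: emit it
            out.append(sym)
            continue
        rules = GRAMMAR[sym]
        if len(rules) == 1:             # no choice, so no codon is consumed
            choice = rules[0]
        else:
            if i >= limit:
                return None             # genome exhausted: invalid individual
            choice = rules[genome[i % len(genome)] % len(rules)]
            i += 1
        symbols = list(choice) + symbols
    return out

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else 0.0,  # protected division (assumption)
}

def make_criterion(tokens):
    """Turn a prefix token list into a split criterion f(values, labels)."""
    def criterion(values, labels):
        env = {"HY": entropy(labels),
               "HYA": cond_entropy(values, labels),
               "SI": split_info(values, labels)}
        it = iter(tokens)
        def rec():
            tok = next(it)
            if tok in OPS:
                a = rec()
                b = rec()
                return OPS[tok](a, b)
            return env[tok]
        return rec()
    return criterion
```

For example, the genome `[0, 1, 1, 0, 1, 1]` maps to `['-', 'HY', 'HYA']`, which is information gain H(Y) - H(Y|A), while `[0, 3, 0, 1, 1, 0, 1, 1, 1, 2]` maps to `['/', '-', 'HY', 'HYA', 'SI']`, the gain ratio (H(Y) - H(Y|A)) / SplitInfo(A). An evolutionary search over genomes can therefore rediscover the baseline criteria or produce new combinations of the same building blocks.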


Index Terms

1. Computing methodologies
  1. Machine learning
    1. Learning settings


Published in
GECCO '14: Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation
July 2014
1478 pages
ISBN: 9781450326629
DOI: 10.1145/2576768
Editor-in-chief: Christian Igel (Ruhr University of Bochum, University of Copenhagen)
General Chair: Dirk V. Arnold (Dalhousie University)
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
grammatical evolution
hyper-heuristics
split criterion

Acceptance Rates
GECCO '14 Paper Acceptance Rate: 180 of 544 submissions, 33%
Overall Acceptance Rate: 1,669 of 4,410 submissions, 38%