Abstract
Fitness functions based on test cases are very common in Genetic Programming (GP). This process can be likened to a learning task, in which models are inferred from a limited number of samples. This paper investigates two methods for improving generalization in GP-based learning: 1) selecting the best-of-run individuals using a three-data-set methodology (training, validation, and test sets), and 2) applying parsimony pressure to reduce the complexity of the solutions. Results using GP in a binary classification setup show that the tested methods significantly reduce the mean tree size while preserving accuracy on the test sets, with lower variance than the baseline results.
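The two methods named above can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions. `Individual` is a hypothetical stand-in for an evolved tree. The abstract does not say which form of parsimony pressure was used, so `penalized_fitness` assumes a simple linear size penalty with a hypothetical coefficient `alpha`. `select_best_of_run` breaks validation ties in favour of the smaller tree, which is also an assumed rule. The sketch shows how the roles are separated: the training set drives evolution, the validation set picks the best-of-run individual, and the test set is used only once to report accuracy.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# A sample is (feature vector, class label in {0, 1}).
Sample = Tuple[Sequence[float], int]


@dataclass
class Individual:
    """Hypothetical stand-in for an evolved GP tree (no GP engine is used here)."""
    predict: Callable[[Sequence[float]], int]
    size: int  # number of nodes in the tree


def accuracy(ind: Individual, data: List[Sample]) -> float:
    """Fraction of samples whose label is predicted correctly."""
    return sum(ind.predict(x) == y for x, y in data) / len(data)


def penalized_fitness(ind: Individual, train: List[Sample], alpha: float = 0.001) -> float:
    """Parsimony pressure, assumed here to be a linear size penalty on training accuracy.

    `alpha` is a hypothetical coefficient; evolution would maximize this value.
    """
    return accuracy(ind, train) - alpha * ind.size


def select_best_of_run(candidates: List[Individual], validation: List[Sample]) -> Individual:
    """Pick the best-of-run individual on the validation set, not on training fitness.

    Ties in validation accuracy go to the smaller tree (an assumed tie-breaking rule).
    """
    return max(candidates, key=lambda ind: (accuracy(ind, validation), -ind.size))


# Toy one-feature data split into the three sets.
train = [([0.0], 0), ([1.0], 1), ([2.0], 1), ([-1.0], 0)]
validation = [([0.5], 1), ([-0.5], 0), ([1.5], 1)]
test = [([3.0], 1), ([-2.0], 0)]

# Two hypothetical candidates that fit the training set equally well.
large = Individual(lambda x: int(x[0] > 0.7), size=25)
small = Individual(lambda x: int(x[0] > 0.0), size=3)

best = select_best_of_run([large, small], validation)
# The test set is touched only once, to report the selected individual's accuracy.
print(best.size, accuracy(best, test))  # prints: 3 1.0
```

Both candidates score 1.0 on the training set, so training fitness alone cannot separate them. The validation set exposes the larger tree's overfitted threshold (2/3 versus 3/3). The size penalty also favours the smaller tree during evolution.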
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gagné, C., Schoenauer, M., Parizeau, M., Tomassini, M. (2006). Genetic Programming, Validation Sets, and Parsimony Pressure. In: Collet, P., Tomassini, M., Ebner, M., Gustafson, S., Ekárt, A. (eds) Genetic Programming. EuroGP 2006. Lecture Notes in Computer Science, vol 3905. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11729976_10
DOI: https://doi.org/10.1007/11729976_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33143-8
Online ISBN: 978-3-540-33144-5
eBook Packages: Computer Science, Computer Science (R0)