Abstract
This paper proposes a theoretical analysis of Genetic Programming (GP) from the perspective of statistical learning theory, a well-grounded mathematical toolbox for machine learning. By computing the Vapnik-Chervonenkis dimension of the family of programs that can be inferred by a specific setting of GP, it is proved that a parsimonious fitness ensures universal consistency. This means that minimizing the empirical error converges to the best possible error as the number of test cases goes to infinity. However, it is also proved that the standard method of imposing a hard limit on program size still results in programs whose size increases without bound as a function of their accuracy. It is also shown that using cross-validation or hold-out to choose the complexity level that optimizes the generalization error also leads to bloat. Therefore, a more elaborate modification of the fitness is proposed in order to avoid unnecessary bloat while still preserving universal consistency.
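To make the idea of a parsimonious fitness concrete, here is a minimal sketch of a penalized fitness for binary classification: empirical error on the n test cases plus a size penalty that shrinks as n grows. It is an illustration under stated assumptions, not the paper's construction. The `Program` type, the `penalty_weight` parameter, and the penalty form `penalty_weight * sqrt(size * log(n) / n)` are all hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Program:
    """Hypothetical stand-in for a GP individual: only its size and its
    predictions on the test cases matter for the fitness below."""
    size: int               # number of nodes in the program tree
    predictions: list[int]  # binary outputs on the n test cases


def empirical_error(program: Program, labels: list[int]) -> float:
    """Fraction of test cases on which the program's output is wrong."""
    mistakes = sum(p != y for p, y in zip(program.predictions, labels))
    return mistakes / len(labels)


def parsimonious_fitness(program: Program, labels: list[int],
                         penalty_weight: float = 1.0) -> float:
    """Empirical error plus a size penalty (to be minimized).

    The penalty form is a hypothetical SRM-style choice: for a fixed
    program size it tends to 0 as the number of test cases n grows.
    """
    n = len(labels)
    penalty = penalty_weight * math.sqrt(program.size * math.log(n) / n)
    return empirical_error(program, labels) + penalty


def select_best(population: list[Program], labels: list[int],
                penalty_weight: float = 1.0) -> Program:
    """Return the program with the lowest penalized fitness."""
    return min(population,
               key=lambda p: parsimonious_fitness(p, labels, penalty_weight))


labels = [0, 1, 0, 1, 0, 1, 0, 1]
small = Program(size=3, predictions=[0, 1, 0, 1, 0, 1, 0, 0])  # one mistake
large = Program(size=50, predictions=list(labels))              # no mistakes
print(select_best([small, large], labels) is small)                      # True
print(select_best([small, large], labels, penalty_weight=0.0) is large)  # True
```

With `penalty_weight=0.0` the fitness reduces to plain empirical error and the larger, perfectly fitting program wins. With the penalty, the smaller program is preferred on these eight test cases. For a fixed program size the penalty vanishes as n grows, which is the kind of behavior consistency arguments rely on.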
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Amil, N.M., Bredeche, N., Gagné, C., Gelly, S., Schoenauer, M., Teytaud, O. (2009). A Statistical Learning Perspective of Genetic Programming. In: Vanneschi, L., Gustafson, S., Moraglio, A., De Falco, I., Ebner, M. (eds) Genetic Programming. EuroGP 2009. Lecture Notes in Computer Science, vol 5481. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01181-8_28
DOI: https://doi.org/10.1007/978-3-642-01181-8_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01180-1
Online ISBN: 978-3-642-01181-8
eBook Packages: Computer Science, Computer Science (R0)