Skip to main content
Log in

Dynamic limits for bloat control in genetic programming and a review of past and current bloat theories

  • Original Paper
  • Published:
Genetic Programming and Evolvable Machines Aims and scope Submit manuscript

Abstract

Bloat is an excess of code growth without a corresponding improvement in fitness. This is a serious problem in Genetic Programming, often leading to the stagnation of the evolutionary process. Here we provide an extensive review of all the past and current theories regarding why bloat occurs. After more than 15 years of intense research, recent work is shedding new light on what may be the real reasons for the bloat phenomenon. We then introduce Dynamic Limits, our new approach to bloat control. It implements a dynamic limit that can be raised or lowered, depending on the best solution found so far, and can be applied either to the depth or size of the programs being evolved. Four problems were used as a benchmark to study the efficiency of Dynamic Limits. The quality of the results is highly dependent on the type of limit used: depth or size. The depth variants performed very well across the set of problems studied, achieving similar fitness to the baseline technique while using significantly smaller trees. Unlike many other methods available so far, Dynamic Limits does not require specific genetic operators, modifications in fitness evaluation or different selection schemes, nor does it add any parameters to the search process. Furthermore, its implementation is simple and its efficiency does not rely on the usage of a static upper limit. The results are discussed in the context of the newest bloat theory.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17

Similar content being viewed by others

Notes

  1. Or, as Bill Langdon put it, the “survival of the fattest”.

  2. There are two assumptions to this statement: (1) Selection must be random, not only in terms of fitness but also in terms of size. A selection scheme that always selects the larger individuals would cause bloat; (2) Genetic operators must not, by themselves, bias the population towards larger sizes. An operator that takes an individual and always doubles its size would cause bloat.

  3. http://gplab.sourceforge.net.

  4. http://www.mathworks.com/matlab.

  5. The number of time steps actually used in the original work by Koza [28] was 600, but a typographical error caused the number 400 to become more popular in the literature, the reason why we also use it.

References

  1. L. Altenberg, The evolution of evolvability in genetic programming, in Advances in Genetic Programming, ed. by K.E. Kinnear Jr. (MIT Press, Cambridge, MA, 1994), pp. 47–74

  2. L. Altenberg, Emergent phenomena in genetic programming, in Proceedings of the 3rd Conference on Evolutionary Programming, ed. by A.V. Sebald, L.J. Fogel (World Scientific Publishing, River Edge, NJ, 1994), pp. 233–241

  3. D. Andre, A. Teller, A study in program response and the negative effects of introns in genetic programming, in Proceedings of GP’96, ed. by J.R. Koza et al. (MIT Press, Cambridge, MA, 1996), pp. 28–31

  4. P.J. Angeline, Genetic programming and emergent intelligence, in Advances in Genetic Programming, ed. by K.E. Kinnear Jr. (MIT Press, Cambridge, MA, 1994), pp. 75–98

  5. P.J. Angeline, Two self-adaptive crossover operators for genetic programming, in Advances in Genetic Programming 2, ed. by P.J. Angeline, K.E. Kinnear Jr. (MIT Press, Cambridge, MA, 1996), pp. 89–110

  6. P.J. Angeline, A historical perspective on the evolution of executable structures. Fundam. Informaticae 35(1–4), 179–195 (1998)

    MATH  Google Scholar 

  7. W. Banzhaf, P. Nordin, R.E. Keller, F.D. Francone, Genetic Programming—An Introduction (dpunkt.verlag and Morgan Kaufmann, Heidelberg and San Francisco, CA, 1998)

  8. W. Banzhaf, W.B. Langdon, Some considerations on the reason for bloat. Genet. Program. Evolvable Mach. 3(1), 81–91 (2002)

    Article  MATH  Google Scholar 

  9. W. Banzhaf, F.D. Francone, P. Nordin, Some emergent properties of variable size EAs. Position paper at the Workshop on Evolutionary Computation with Variable Size Representation at ICGA-97 (1997)

  10. T. Blickle, Theory of evolutionary algorithms and applications to system design. PhD thesis, Swiss Federal Institute of Technology, Computer Engineering and Networks Laboratory (1996)

  11. T. Blickle, L. Thiele, Genetic programming and redundancy, in Genetic Algorithms within the Framework of Evolutionary Computation, ed. by J. Hopf (Max-Planck-Institut für Informatik, Saarbriicken, 1994), pp. 33–38

  12. M. Brameier, W. Banzhaf, Neutral variations cause bloat in linear GP, in Proceedings of EuroGP-2003, ed. by C. Ryan et al. (Springer, Berlin, 2003), pp. 286–296

  13. J. Cuendet, Populations dynamiques en programmation génétique. MSc thesis, Université de Lausanne, Université de Genève (2004)

  14. L.E. Da Costa, J.A. Landry, Relaxed genetic programming, in Proceedings of GECCO-2006, ed. by M. Keijzer et al. (ACM Press, New York, NY, 2006), pp. 937–938

  15. S. Dignum, R. Poli, Generalisation of the limiting distribution of program sizes in tree-based genetic programming and analysis of its effects on bloat, in Proceedings of GECCO-2007, ed. by D. Thierens et al. (ACM Press, New York, NY, 2007), pp. 1588–1595

  16. S. Dignum, R. Poli, Crossover, sampling, bloat and the harmful effects of size limits, in Proceedings of EuroGP-2008, ed. by M. O’Neill et al. (Springer, Berlin, 2008), pp. 158–169

  17. A. Ekart, S.Z. Németh, Selection based on the pareto nondomination criterion for controlling code growth in genetic programming. Genet. Program. Evolvable Mach. 2(1), 61–73 (2001)

    Article  MATH  Google Scholar 

  18. F. Fernandez, L. Vanneschi, M. Tomassini, The effect of plagues in genetic programming: a study of variable-size populations, in Proceedings of EuroGP-2003, ed. by C. Ryan et al. (Springer, Berlin, 2003), pp. 317–326

  19. F. Fernandez, M. Tomassini, L. Vanneschi, Saving computational effort in genetic programming by means of plagues, in Proceedings of CEC-2003, ed. by R. Sarker et al. (IEEE Press, Piscataway, NJ, 2003), pp. 2042–2049

  20. C. Gathercole, P. Ross, An adverse interaction between crossover and restricted tree depth in genetic programming, in Proceedings of GP’96, ed. by J.R. Koza et al. (MIT Press, Cambridge, MA, 1996), pp. 291–296

  21. S. Gelly, O. Teytaud, N. Bredeche, M. Schoenauer, A statistical learning theory approach of bloat, in Proceedings of GECCO-2005, ed. by H.-G. Beyer et al. (ACM Press, New York, NY, 2005), pp. 1783–1784

  22. S. Gelly, O. Teytaud, N. Bredeche, M. Schoenauer, Universal consistency and bloat in GP. Rev. Intell. Artif. 20(6), 805–827 (2006)

    Article  Google Scholar 

  23. S. Gustafson, A. Ekart, E. Burke, G. Kendall, Problem difficulty and code growth in genetic programming. Genet. Program. Evolvable Mach. 5(3), 271–290 (2004)

    Article  Google Scholar 

  24. C. Igel, K. Chellapilla, Investigating the influence of depth and degree of genotypic change on fitness in genetic programming, in Proceedings of GECCO-1999, ed. by W. Banzhaf et al. (Morgan Kaufmann, San Francisco, CA, 1999), pp. 1061–1068

  25. K. Janardan, Weighted Lagrange distributions and their characterizations. SIAM J. Appl. Math. 47(2), 411–415 (1987)

    Article  MATH  MathSciNet  Google Scholar 

  26. K. Janardan, B. Rao, Lagrange distributions of the second kind and weighted distributions. SIAM J. Appl. Math. 43(2), 302–313 (1983)

    Article  MATH  MathSciNet  Google Scholar 

  27. K.E. Kinnear Jr., Generality and difficulty in genetic programming: evolving a sort, in Proceedings of ICGA’93, ed. by S. Forrest (Morgan Kaufmann, San Francisco, CA, 1993), pp. 287–294

  28. J.R. Koza, Genetic Programming – On the Programming of Computers by Means of Natural Selection (MIT Press, Cambridge, MA, 1992)

  29. W.B. Langdon, Genetic Programming + Data Structures = Automatic Programming! (Kluwer Academic Publishers, Boston, MA, 1998)

  30. W.B. Langdon, The evolution of size in variable length representations, in Proceedings of the 1998 IEEE International Conference on Evolutionary Computation (IEEE Press, Piscataway, NJ, 1998), pp. 633–638

  31. W.B. Langdon, Size fair and homologous tree genetic programming crossovers. Genet. Program. Evolvable Mach. 1(1/2), 95–119 (2000)

    Article  MATH  Google Scholar 

  32. W.B. Langdon, Quadratic bloat in genetic programming, in Proceedings of GECCO-2000, ed. by D. Whitley et al. (Morgan Kaufmann, San Francisco, CA, 2000), pp. 451–458

  33. W.B. Langdon, R. Poli, Fitness causes bloat, in Proceedings of the Second On-line World Conference on Soft Computing in Engineering Design and Manufacturing, ed. by P.K. Chawdhry et al. (Springer, Berlin, 1997), pp. 13–22

  34. W.B. Langdon, R. Poli, An analysis of the MAX problem in genetic programming, in Proceedings of GP’97, ed. by J.R. Koza et al. (Morgan Kaufman, San Francisco, CA, 1997), pp. 222–230

  35. W.B. Langdon, R. Poli, Fitness causes bloat: mutation, in Proceedings of EuroGP’98, ed. by W. Banzhaf et al. (Springer, Berlin, 1998), pp. 37–48

  36. W.B. Langdon, R. Poli, Foundations of Genetic Programming (Springer, Berlin, 2002)

  37. W.B. Langdon, T. Soule, R. Poli, J.A. Foster, The evolution of size and shape, in Advances in Genetic Programming 3, ed. by L. Spector et al. (MIT Press, Cambridge, MA, 1999), pp. 163–190

  38. W.B. Langdon, W. Banzhaf, Genetic programming bloat without semantics, in Proceedings of PPSN-2000, ed. by M. Schoenauer et al. (Springer, Berlin, 2000), pp. 201–210

  39. S. Luke, Code growth is not caused by introns, in Late Breaking Papers at GECCO-2000 (2000), pp. 228–235

  40. S. Luke, Issues in scaling genetic programming: breeding strategies, tree generation, and code bloat. PhD thesis, Department of Computer Science, University of Maryland (2000)

  41. S. Luke, G.C. Balan, L. Panait, Population implosion in genetic programming, in Proceedings of GECCO-2003, ed. by E. Cantú-Paz et al. (Springer, Berlin, 2003), pp. 1729–1739

  42. S. Luke, Modification point depth and genome growth in genetic programming. Evol. Comput. 11(1), 67–106 (2003)

    Article  Google Scholar 

  43. S. Luke, L. Panait, Fighting bloat with nonparametric parsimony pressure, in Proceedings of PPSN-2002, ed. by J.M. Guervos et al. (Springer, Berlin, 2002), pp. 411–420

  44. S. Luke, L. Panait, Lexicographic parsimony pressure, in Proceedings of GECCO-2002, ed. by W.B. Langdon et al. (Morgan Kaufmann, San Francisco, CA, 2002), pp. 829–836

  45. S. Luke, L. Panait, A comparison of bloat control methods for genetic programming. Evol. Comput. 14(3), 309–344 (2006)

    Article  Google Scholar 

  46. N.F. McPhee, J.D. Miller, Accurate replication in genetic programming, in Proceedings of ICGA’95, ed. by L. Eshelman (Morgan Kaufmann, San Francisco, CA, 1995), pp. 303–309

  47. N.F. McPhee, A. Jarvis, E.F. Crane, On the strength of size limits in linear genetic programming, in Proceedings of GECCO-2004, ed. by K. Deb et al. (Springer, Berlin, 2004), pp. 593–604

  48. N.F. McPhee, R. Poli, A schema theory analysis of the evolution of size in genetic programming with linear representations, in Proceedings of EuroGP-2001, ed. by J. Miller et al. (Springer, Berlin, 2001), pp. 108–125

  49. P. Nordin, W. Banzhaf, Complexity compression and evolution, in Proceedings of ICGA’95, ed. by L. Eshelman (Morgan Kaufmann, San Francisco, CA, 1995), pp. 318–325

  50. P. Nordin, F. Francone, W. Banzhaf, Explicitly defined introns and destructive crossover in genetic programming, in Proceedings of the Workshop on Genetic Programming: From Theory to Real-World Applications, ed. by J.P. Rosca (1995), pp. 6–22

  51. P. Nordin, F. Francone, W. Banzhaf, Explicitly defined introns and destructive crossover in genetic programming, in Advances in Genetic Programming 2, ed. by P.J. Angeline, K.E. Kinnear Jr. (MIT Press, Cambridge, MA, 1996), pp. 111–134

  52. U.-M. O’Reilly, F. Oppacher, Hybridized crossover-based search techniques for program discovery, in Proceedings of the 1995 World Conference on Evolutionary Computation (IEEE Press, Piscataway, NJ, 1995), pp. 573–578

  53. R. Poli, General schema theory for genetic programming with subtree-swapping crossover, in Proceedings of EuroGP-2001, ed. by J. Miller et al. (Springer, Berlin, 2001), pp. 143–159

  54. R. Poli, A simple but theoretically-motivated method to control bloat in genetic programming, in Proceedings of EuroGP-2003, ed. by C. Ryan et al. (Springer, Berlin, 2003), pp. 200–210

  55. R. Poli, W.B. Langdon, S. Dignum, On the limiting distribution of program sizes in tree-based genetic programming, in Proceedings of EuroGP-2007, ed. by M. Ebner et al. (Springer, Berlin, 2007), pp. 193–204

  56. R. Poli, N.F. McPhee, L. Vanneschi, The impact of population size on code growth in GP: analysis and empirical validation, in Proceedings of GECCO-2008, ed. by M. Keijzer et al. (ACM Press, New York, NY, 2008), pp. 1275–1282

  57. R. Poli, N.F. McPhee, L. Vanneschi, Elitism reduces bloat in genetic programming, in Proceedings of GECCO-2008, ed. by M. Keijzer et al. (ACM Press, New York, NY, 2008), pp. 1343–1344

  58. R. Poli, N.F. McPhee, L. Vanneschi, Analysis of the effects of elitism on bloat in linear and tree-based genetic programming, in Genetic Programming Theory and Practice VI, ed. by R. Riolo et al. (Springer, Berlin, 2008), pp. 91–111

  59. D. Rochat, Programmation génétique parallèle: opérateurs génétiques variés et populations dynamiques. MSc thesis, Université de Lausanne, Université de Genève (2004)

  60. D. Rochat, M. Tomassini, L. Vanneschi, Dynamic size populations in distributed genetic programming, in Proceedings of EuroGP-2005, ed. by M. Keijzer et al. (Springer, Berlin, 2005), pp. 50–61

  61. J.P. Rosca, Generality versus size in genetic programming, in Proceedings of GP’96, ed. by J.R. Koza et al. (MIT Press, Cambridge, MA, 1996), pp. 381–387

  62. J.P. Rosca, Analysis of complexity drift in genetic programming, in Proceedings of GP’97, ed. by J.R. Koza et al. (Morgan Kaufmann, San Francisco, CA, 1997), pp. 286–294

  63. S. Silva, Controlling bloat: individual and population based approaches in genetic programming. PhD thesis, Departamento de Engenharia Informatica, Universidade de Coimbra (2008)

  64. S. Silva, J. Almeida, GPLAB—a genetic programming toolbox for MATLAB, in Proceedings of the Nordic MATLAB Conference, ed. by L. Gregersen (2003), pp. 273–278

  65. S. Silva, J. Almeida, Dynamic maximum tree depth—a simple technique for avoiding bloat in tree-based GP, in Proceedings of GECCO-2003, ed. by E. Cantú-Paz et al. (Springer, Berlin, 2003), pp. 1776–1787

  66. S. Silva, E. Costa, Dynamic limits for bloat control—variations on size and depth, in Proceedings of GECCO-2004, ed. by K. Deb et al. (Springer, Berlin, 2004), pp. 666–677

  67. S. Silva, P.J.N. Silva, E. Costa, Resource-limited genetic programming: replacing tree depth limits, in Proceedings of ICANNGA-2005, ed. by B. Ribeiro et al. (Springer, Berlin, 2005), pp. 243–246

  68. S. Silva, E. Costa, Resource-limited genetic programming: the dynamic approach, in Proceedings of GECCO-2005, ed. by H.-G. Beyer et al. (ACM Press, New York, NY, 2005), pp. 1673–1680

  69. S. Silva, E. Costa, Comparing tree depth-limits and resource-limited GP, in Proceedings of CEC-2005, ed. by D. Corne et al. (IEEE Press, Piscataway, NJ, 2005), pp. 920–927

  70. P.W.H. Smith, K. Harries, Code growth, explicitly defined introns, and alternative selection schemes. Evol. Comput. 6(4), 339–360 (1998)

    Article  Google Scholar 

  71. T. Soule, J.A. Foster, Removal bias: a new cause of code growth in tree based evolutionary programming, in Proceedings of the 1998 IEEE International Conference on Evolutionary Computation (IEEE Press, Piscataway, NJ, 1998), pp. 781–786

  72. T. Soule, Code growth in genetic programming. PhD thesis, College of Graduate Studies, University of Idaho (1998)

  73. T. Soule, J. Foster, Code size and depth flows in genetic programming, in Proceedings of GP’97, ed. by J. Koza et al. (Morgan Kaufmann, San Francisco, CA, 1997), pp. 313–320

  74. T. Soule, R.B. Heckendorn, An analysis of the causes of code growth in genetic programming. Genet. Program. Evolvable Mach. 3(1), 283–309 (2002)

    Article  MATH  Google Scholar 

  75. J. Stevens, R.B. Heckendorn, T. Soule, Exploiting disruption aversion to control code bloat, in Proceedings of GECCO-2005, ed. by H.-G. Beyer et al. (ACM Press, New York, NY, 2005), pp. 1605–1612

  76. M.J. Streeter, The root causes of code growth in genetic programming, in Proceedings of EuroGP-2003, ed. by C. Ryan et al. (Springer, Berlin, 2003), pp. 443–454

  77. W.A. Tackett, Recombination, selection, and the genetic construction of genetic programs. PhD thesis, Department of Electrical Engineering Systems, University of Southern California (1994)

  78. M. Tomassini, L. Vanneschi, J. Cuendet, F. Fernandez, A new technique for dynamic size populations in genetic programming, in Proceedings of CEC-2004 (IEEE Press, Piscataway, NJ, 2004), pp. 486–493

  79. T. Van Belle, D.H. Ackley, Uniform subtree mutation, in Proceedings of EuroGP-2002, ed. by J.A. Foster et al. (Springer, Berlin, 2002), pp. 152–161

  80. L. Vanneschi, Theory and practice for efficient genetic programming. PhD thesis, Faculty of Sciences, University of Lausanne (2004)

  81. B.-T. Zhang, H. Mühlenbein, Balancing accuracy and parsimony in genetic programming. Evol. Comput. 3(1), 17–38 (1995)

    Article  Google Scholar 

Download references

Acknowledgements

This work was partially supported by grant QLK2-CT-2000-01020 from the European Commission and grants SFRH/BD/14167/2003 and POCTI/1999/BSE/34794 from Fundação para a Ciência e a Tecnologia, Portugal. We would like to thank Jonas Almeida (University of Texas MD Anderson Cancer Center, USA) for his collaboration in the early stages of this research, and Riccardo Poli (University of Essex, UK) for fruitful discussions leading to a substantial improvement of the manuscript.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Sara Silva.

Appendix

Appendix

Table 4 p-Values concerning the mean tree size (top right half) and best fitness of run (bottom left half) on the Regression problem, using pairwise non-parametric ANOVAs
Table 5 p-Values concerning the mean tree size (top right half) and best fitness of run (bottom left half) on the Artificial Ant problem, using pairwise non-parametric ANOVAs
Table 6 p-Values concerning the mean tree size (top right half) and best fitness of run (bottom left half) on the Parity problem, using pairwise non-parametric ANOVAs
Table 7 p-Values concerning the mean tree size (top right half) and best fitness of run (bottom left half) on the Multiplexer problem, using pairwise non-parametric ANOVAs

Rights and permissions

Reprints and permissions

About this article

Cite this article

Silva, S., Costa, E. Dynamic limits for bloat control in genetic programming and a review of past and current bloat theories. Genet Program Evolvable Mach 10, 141–179 (2009). https://doi.org/10.1007/s10710-008-9075-9

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10710-008-9075-9

Keywords

Navigation