Repeated patterns in genetic programming

Natural Computing

Abstract

Evolved genetic programming trees contain many repeated code fragments. Size fair crossover limits bloat in automatic programming, preventing the evolution of recurring motifs. We examine these complex properties in detail using depth vs. size Catalan binary tree shape plots, subgraph and subtree matching, information entropy, sensitivity analysis, and syntactic and semantic fitness correlations. Programs evolve in a self-similar fashion, akin to fractal random trees, with diffuse introns. Data mining of frequent patterns reveals that, as software is progressively improved, a large proportion of it consists of exactly repeated subtrees as well as exactly repeated subgraphs. We relate this emergent phenomenon to building blocks in genetic programming (GP) and suggest that GP works by jumbling subtrees that already have high fitness on the whole problem, giving incremental improvements and creating complete solutions with multiple identical components of differing importance.
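
As a rough illustration of what counting exactly repeated subtrees can look like, the following minimal Python sketch tallies every subtree of a small expression tree and reports those that occur more than once. It is not the authors' tool: the nested-tuple tree encoding, the helper names `count_subtrees`, `size` and `repeated_subtrees`, and the `min_size` threshold are all assumptions made here for illustration.

```python
from collections import Counter

# Assumed encoding (not from the paper): a leaf is a string, and an
# internal node is a tuple (operator, child, child, ...).

def count_subtrees(tree, counts=None):
    """Count how many times each distinct subtree occurs in `tree`."""
    if counts is None:
        counts = Counter()
    counts[tree] += 1
    if isinstance(tree, tuple):
        for child in tree[1:]:
            count_subtrees(child, counts)
    return counts

def size(tree):
    """Number of nodes (operators plus leaves) in `tree`."""
    if isinstance(tree, tuple):
        return 1 + sum(size(child) for child in tree[1:])
    return 1

def repeated_subtrees(tree, min_size=2):
    """Subtrees with at least `min_size` nodes that occur more than once."""
    counts = count_subtrees(tree)
    return {t: n for t, n in counts.items() if n > 1 and size(t) >= min_size}

# Hypothetical evolved program: (x * y) + ((x * y) - x)
example = ("+", ("*", "x", "y"), ("-", ("*", "x", "y"), "x"))
repeats = repeated_subtrees(example)
print(repeats)
```

Hashing whole subtrees as dictionary keys keeps the sketch short; matching repeated subgraphs, which the abstract also mentions, would need a different approach and is not attempted here.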

Acknowledgements

This work was carried out while WBL was at University College, London and Essex University. WB thanks NSERC for grant RGPIN 283304-04.

Author information

Corresponding author

Correspondence to W. B. Langdon.

About this article

Cite this article

Langdon, W.B., Banzhaf, W. Repeated patterns in genetic programming. Nat Comput 7, 589–613 (2008). https://doi.org/10.1007/s11047-007-9038-8

  • DOI: https://doi.org/10.1007/s11047-007-9038-8