Abstract
Evolved genetic programming trees contain many repeated code fragments. Size fair crossover limits bloat in automatic programming, preventing the evolution of recurring motifs. We examine these complex properties in detail using depth vs. size Catalan binary tree shape plots, subgraph and subtree matching, information entropy, sensitivity analysis, syntactic and semantic fitness correlations. Programs evolve in a self-similar fashion, akin to fractal random trees, with diffuse introns. Data mining frequent patterns reveals that as software is progressively improved a large proportion of it is exactly repeated subtrees as well as exactly repeated subgraphs. We relate this emergent phenomenon to building blocks in GP and suggest GP works by jumbling subtrees which already have high fitness on the whole problem to give incremental improvements and create complete solutions with multiple identical components of different importance.
Similar content being viewed by others
References
Achaz G, Rocha EPC, Netter P, Coissac E (2002) Origin and fate of repeats in bacteria. Nucleic Acids Res 30:2987–2994
Banzhaf W, Nordin P, Keller RE, Francone FD (1998) Genetic programming – an introduction. Morgan Kaufmann
Blickle T (1996) Theory of evolutionary algorithms and application to system synthesis. PhD thesis, Swiss Federal Institute of Technology, Zurich
Britten RJ, Kohnen DE (1968) Repeated sequences in DNA. Science 161:529–540
Koza JR (1992) Genetic programming: on the programming of computers by means of natural selection. MIT Press
Langdon WB (1998) Genetic programming and data structures. Kluwer, Boston
Langdon WB (2000) Size fair and homologous tree genetic programming crossovers. Genet Program Evol Mach 1(1/2):95–119
Langdon WB, Banzhaf W (2005a) Repeated sequences in linear genetic programming genomes. Complex Syst 15(4):285–306
Langdon WB, Banzhaf W (2005b) Repeated patterns in tree genetic programming. In: Keijzer M, Tettamanzi A, Collet P, van Hemert JI, Tomassini M (eds) Proceedings of the 8th European conference on genetic programming, vol 3447 of Lecture Notes in Computer Science, Lausanne. Springer, Switzerland, pp 190–202
Langdon WB, Barrett SJ (2004) Genetic programming in data mining for drug discovery. In: Ghosh A, Jain LC (eds) Evolutionary computing in data mining, vol 163 of Studies in fuzziness and soft computing, chapter 10. Springer, pp 211– 235
Langdon WB, Poli R (2002) Foundations of genetic programming. Springer-Verlag
Langdon WB, Soule T, Poli R, Foster JA (1999) The evolution of size and shape. In: Spector L, Langdon WB, O’Reilly U-M, Angeline PJ (eds) Advances in genetic programming 3, chapter 8. MIT Press, pp 163–190
Lupski JR, Weinstock GM (1992) Short, interspersed repetitive DNA sequences in prokaryotic genomes. J Bacteriol 174:4525–4529
Oakley H (1994) Two scientific applications of genetic programming: stack filters and non-linear equation fitting to chaotic data. In: Kinnear Jr KE (ed) Advances in genetic programming, chapter 17. MIT Press, pp 369–389
O’Reilly U-M, Oppacher U-M (1995) The troubling aspects of a building block hypothesis for genetic programming. In: Whitley LD, Vose MD (eds) Foundations of genetic algorithms 3, 31 July–2 August 1994, Estes Park, Colorado, USA. Morgan Kaufmann, pp 73–88
Patience C, Wilkinson DA, Weiss RA (1997) Our retroviral heritage. Trends Genet 13:116–120
Poli R (2003) A simple but theoretically-motivated method to control bloat in genetic programming. In: Ryan C, Soule T, Keijzer M, Tsang E, Poli R, Costa E (eds) Genetic programming, Proceedings of EuroGP’2003, vol 2610 of LNCS, Essex, UK. Springer-Verlag, pp 204–217
Poli R (2004) TinyGP. See TinyGP GECCO 2004 competition at http://cswww.essex.ac.uk/staff/sml/gecco/TinyGP.html
Reinhardt A, Hubbard T (1998) Using neural networks for prediction of the subcellular location of proteins. Nucleic Acids Res 26(9):2230–2236
Sedgewick R, Flajolet P (1996) An introduction to the analysis of algorithms. Addison-Wesley
Shannon CE and Weaver W (1964) The mathematical theory of communication. The University of Illinois Press, Urbana
Smit AFA (1996) The origin of interspersed repeats in the human genome. Curr Opin Genet Dev 6:743–748
Syswerda G (1989) Uniform crossover in genetic algorithms. In Schaffer JD (ed) Proceedings of the third international conference on genetic algorithms, 4–7 June, George Mason University. Morgan Kaufmann, pp 2–9
Toth G, Gaspari Z, Jurka J (2000) Microsatellites in different eukaryotic genomes: survey and analysis. Genome Res 10:967–981
Acknowledgements
This work was carried out while WBL was at University College, London and Essex University. WB thanks NSERC for grant RGPIN 283304-04.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Langdon, W.B., Banzhaf, W. Repeated patterns in genetic programming. Nat Comput 7, 589–613 (2008). https://doi.org/10.1007/s11047-007-9038-8
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11047-007-9038-8