Have your spaghetti and eat it too: evolutionary algorithmics and post-evolutionary analysis

  • Original Paper
Genetic Programming and Evolvable Machines

Abstract

This paper addresses two issues: first, pursuing the idea of algorithmic design through genetic programming (GP), and second, introducing a novel approach for analyzing and understanding the evolved solution trees. Considering the problem of list search, we evolve iterative algorithms for searching for a given key in an array of integers, showing that both correct linear-time and far more efficient logarithmic-time algorithms can be repeatedly designed by Darwinian means. Next, we turn to the (evolved) dish of spaghetti (code) served by GP. Faced with the all-too-familiar conundrum of understanding convoluted (and usually bloated) GP-evolved trees, we present a novel analysis approach, based on ideas borrowed from the field of bioinformatics. Our system, dubbed G-PEA (GP Post-Evolutionary Analysis), consists of two parts: (1) defining a functionality-based similarity score between expressions and using this score to find subtrees that carry out similar semantic tasks; (2) clustering similar sub-expressions from a number of independently evolved fit solutions, thus identifying important semantic building blocks ensconced within the hard-to-read GP trees. These blocks help identify the important parts of the evolved solutions and are a crucial step in understanding how they work. Other related GP aspects, such as code simplification, bloat control, and building-block preserving crossover, may be extended by applying the concepts we present.
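To make the two-part idea concrete, the following is a minimal sketch of what a functionality-based similarity score and a clustering step could look like. It is not the G-PEA implementation: the abstract does not specify the scoring function or the clustering method, so the agreement-fraction score, the greedy clustering, the `Expr` type, and the names `similarity`, `cluster`, `mid_a`, `mid_b`, and `next` are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-in for an evolved sub-expression: a function of two integers.
Expr = Callable[[int, int], int]


def similarity(f: Expr, g: Expr, samples: Sequence[Tuple[int, int]]) -> float:
    """Assumed score: fraction of sample inputs on which f and g agree."""
    agree = sum(1 for a, b in samples if f(a, b) == g(a, b))
    return agree / len(samples)


def cluster(exprs: Dict[str, Expr],
            samples: Sequence[Tuple[int, int]],
            threshold: float) -> List[List[str]]:
    """Greedy clustering (an assumption): each expression joins the first
    cluster whose first member it matches with a score >= threshold."""
    clusters: List[List[str]] = []
    for name, expr in exprs.items():
        for group in clusters:
            if similarity(expr, exprs[group[0]], samples) >= threshold:
                group.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Sample inputs: all (lo, hi) pairs with both values in 0..7.
samples = [(lo, hi) for lo in range(8) for hi in range(8)]

# Two syntactically different midpoint expressions and one unrelated expression.
exprs: Dict[str, Expr] = {
    "mid_a": lambda lo, hi: (lo + hi) // 2,
    "mid_b": lambda lo, hi: lo + (hi - lo) // 2,
    "next": lambda lo, hi: lo + 1,
}

clusters = cluster(exprs, samples, threshold=0.9)
print(clusters)
```

Here the two midpoint expressions differ as trees but compute the same value on every sample, so a behavior-based score groups them together, while `next` falls into its own cluster. A syntactic comparison of the trees would miss this equivalence, which is the motivation for scoring by functionality.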

Acknowledgments

Kfir Wolfson and Shay Zakov were partially supported by the Frankel Center for Computer Science at Ben-Gurion University.

Author information

Corresponding author

Correspondence to Moshe Sipper.

About this article

Cite this article

Wolfson, K., Zakov, S., Sipper, M. et al. Have your spaghetti and eat it too: evolutionary algorithmics and post-evolutionary analysis. Genet Program Evolvable Mach 12, 121–160 (2011). https://doi.org/10.1007/s10710-010-9122-1

  • DOI: https://doi.org/10.1007/s10710-010-9122-1
