DOI: 10.1145/3321707.3321738
research-article

Teaching GP to program like a human software developer: using perplexity pressure to guide program synthesis approaches

Published: 13 July 2019

ABSTRACT

Program synthesis is one of the relevant applications of GP, with a strong impact on new fields such as genetic improvement. For synthesized code to be used in real-world software, the programs created by GP must have a maintainable structure. We can teach GP how real-world software is built by learning the relevant properties of mined human-written software, which is easily accessible through repository hosting services such as GitHub. Combining program synthesis and repository mining is therefore a logical step. In this paper, we analyze whether GP can write programs with properties similar to code produced by human software developers. First, we compare the structure of functions generated by different GP initialization methods to a mined corpus of real-world software. The results show that the studied GP initialization methods produce a combination of programming language elements that differs markedly from real-world software. Second, we propose perplexity pressure and analyze how its use changes the properties of code produced by GP. The results are very promising and show that we can guide the search toward the desired program structure. Thus, we recommend using perplexity pressure, as it can be easily integrated into various search-based algorithms.
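
The abstract does not define how perplexity pressure is computed, so the following is only a rough sketch of the general idea, not the paper's implementation. It assumes a bigram language model with add-one smoothing, trained on sequences of program-element tokens (for example, AST node types) from a mined corpus, and it assumes that perplexity is used as a tie-breaker between candidates with equal error. The function names, the tokenization, and the tie-breaking scheme are all hypothetical.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical start-of-sequence marker


def train_bigram(corpus):
    """Count unigram contexts and bigrams over token sequences.

    `corpus` is an iterable of token lists, e.g. AST node types taken
    from mined functions (an assumption, not the paper's setup).
    """
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for tokens in corpus:
        padded = [START] + list(tokens)
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    return unigrams, bigrams, len(vocab)


def perplexity(tokens, model):
    """Perplexity of `tokens` under the bigram model (add-one smoothing)."""
    unigrams, bigrams, vocab_size = model
    if not tokens:
        return float("inf")  # an empty program gets the worst score
    padded = [START] + list(tokens)
    log_prob = 0.0
    for prev, cur in zip(padded, padded[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(tokens))


def rank_with_perplexity_pressure(candidates, model):
    """Sort (error, tokens) pairs by error; ties go to lower perplexity.

    This lexicographic tie-break is one assumed way to apply the
    pressure; other schemes (e.g. weighted sums) are equally plausible.
    """
    return sorted(candidates, key=lambda c: (c[0], perplexity(c[1], model)))
```

Under these assumptions, a candidate whose element sequence resembles the mined corpus receives a lower perplexity and is preferred over an equally accurate candidate with an unusual structure. Because the ranking function only needs an error value and a token sequence, the same idea could in principle be attached to other search-based algorithms.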

Published in

GECCO '19: Proceedings of the Genetic and Evolutionary Computation Conference
July 2019
1545 pages
ISBN: 9781450361118
DOI: 10.1145/3321707

Copyright © 2019 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Overall Acceptance Rate: 1,669 of 4,410 submissions, 38%
