ABSTRACT
Program synthesis is an important application of genetic programming (GP), with a strong impact on newer fields such as genetic improvement. For synthesized code to be used in real-world software, the structure of the programs created by GP must be maintainable. GP can be taught how real-world software is built by learning the relevant properties of mined human-written software, which is easily accessible through repository hosting services such as GitHub. Combining program synthesis and repository mining is therefore a logical step. In this paper, we analyze whether GP can write programs with properties similar to code produced by human software developers. First, we compare the structure of functions generated by different GP initialization methods to a mined corpus of real-world software. The results show that the studied GP initialization methods produce a combination of programming language elements that differs substantially from real-world software. Second, we propose perplexity pressure and analyze how its use changes the properties of code produced by GP. The results are promising and show that the search can be guided toward the desired program structure. We therefore recommend perplexity pressure, which can be easily integrated into various search-based algorithms.
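The abstract does not spell out how perplexity is computed, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes a bigram language model with add-one smoothing trained on token sequences that stand in for a mined corpus. The toy corpus, the function names, and the additive `pressured_fitness` combination with its `weight` parameter are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count bigrams and their left contexts over token sequences."""
    context_counts, bigram_counts = Counter(), Counter()
    vocab = set()
    for seq in corpus:
        tokens = ["<s>"] + list(seq)  # "<s>" marks the start of a function
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, len(vocab)


def perplexity(seq, model):
    """Perplexity of one token sequence under the add-one smoothed bigram model."""
    context_counts, bigram_counts, vocab_size = model
    tokens = ["<s>"] + list(seq)
    n = len(tokens) - 1
    if n == 0:
        raise ValueError("sequence must contain at least one token")
    log_prob = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen bigrams from getting zero probability.
        p = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / n)


def pressured_fitness(error, ppl, weight=0.1):
    """Hypothetical combination (lower is better): task error plus a perplexity penalty."""
    return error + weight * ppl


# Toy stand-in for a mined corpus: each list abstracts one human-written function.
corpus = [
    ["def", "if", "return"],
    ["def", "for", "if", "return"],
    ["def", "return"],
]
model = train_bigram(corpus)

human_like = ["def", "if", "return"]
scrambled = ["return", "return", "def", "def"]

print(perplexity(human_like, model))
print(perplexity(scrambled, model))
```

In this sketch the sequence that follows patterns seen in the corpus receives a lower perplexity than the scrambled one, so a search that penalizes perplexity would prefer it when the task error is equal.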
Index Terms
- Teaching GP to program like a human software developer: using perplexity pressure to guide program synthesis approaches