research-article

Teaching GP to program like a human software developer: using perplexity pressure to guide program synthesis approaches

Authors:
Dominik Sobania, Johannes Gutenberg University, Mainz, Germany
Franz Rothlauf, Johannes Gutenberg University, Mainz, Germany

GECCO '19: Proceedings of the Genetic and Evolutionary Computation Conference, July 2019, Pages 1065–1074
https://doi.org/10.1145/3321707.3321738

Published: 13 July 2019

ABSTRACT

Program synthesis is one of the relevant applications of GP and has a strong impact on new fields such as genetic improvement. For synthesized code to be used in real-world software, the structure of the programs created by GP must be maintainable. We can teach GP how real-world software is built by learning the relevant properties of mined human-written software, which is easily accessible through repository hosting services such as GitHub. Combining program synthesis and repository mining is therefore a logical step. In this paper, we analyze whether GP can write programs with properties similar to code produced by human software developers. First, we compare the structure of functions generated by different GP initialization methods to a mined corpus of real-world software. The results show that the studied GP initialization methods produce a combination of programming language elements that differs markedly from real-world software. Second, we propose perplexity pressure and analyze how its use changes the properties of code produced by GP. The results are promising and show that perplexity pressure can guide the search toward the desired program structure. We therefore recommend perplexity pressure, which can be easily integrated into various search-based algorithms.
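
As a rough illustration of the idea behind perplexity pressure, the following sketch scores a token sequence with a small bigram language model trained on a toy corpus and adds the resulting perplexity to a candidate's error. This is not the authors' implementation: the function names (`train_bigram`, `perplexity`, `pressured_score`), the add-one smoothing, the weighted-sum combination, and the toy corpus are all assumptions made for the example.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count bigram statistics over token sequences (hypothetical helper).

    Returns context counts, bigram counts, and the vocabulary size.
    """
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for tokens in corpus:
        padded = ["<s>"] + list(tokens)  # "<s>" marks the sequence start
        vocab.update(tokens)
        contexts.update(padded[:-1])
        bigrams.update(zip(padded, padded[1:]))
    return contexts, bigrams, len(vocab)


def perplexity(tokens, contexts, bigrams, vocab_size):
    """Perplexity of a token sequence under an add-one-smoothed bigram model."""
    if not tokens:
        return float("inf")  # assumption: an empty program is maximally unlikely
    padded = ["<s>"] + list(tokens)
    log_prob = 0.0
    for prev, cur in zip(padded, padded[1:]):
        # Add-one smoothing keeps unseen bigrams from getting zero probability.
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(tokens))


def pressured_score(error, ppl, weight=0.1):
    """Hypothetical combined objective (lower is better): error plus a perplexity penalty."""
    return error + weight * ppl


# Toy "mined" corpus of language-element sequences (made up for this sketch).
corpus = [["for", "if", "return"], ["for", "if", "return"], ["if", "return"]]
model = train_bigram(corpus)

human_like = perplexity(["for", "if", "return"], *model)
unusual = perplexity(["return", "return", "for"], *model)
print(human_like < unusual)
```

Under these assumptions, a candidate whose element sequence resembles the corpus receives a lower perplexity and therefore a smaller penalty than an equally accurate candidate with an unusual structure.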

Index Terms

1. Software and its engineering
  1. Software creation and management
    1. Search-based software engineering
    2. Software development techniques
      1. Automatic programming
        Genetic programming
2. Theory of computation
  1. Design and analysis of algorithms

Published in
GECCO '19: Proceedings of the Genetic and Evolutionary Computation Conference
July 2019
1545 pages
ISBN: 9781450361118
DOI: 10.1145/3321707
Editor: Manuel López-Ibáñez (University of Manchester, UK)
General Chairs: Anne Auger (Inria and Ecole Polytechnique, France), Thomas Stützle (IRIDIA, Université libre de Bruxelles, Belgium)
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
genetic improvement
genetic programming
grammatical evolution
language models
mining software repositories
software synthesis

Acceptance Rates
Overall Acceptance Rate: 1,669 of 4,410 submissions, 38%