Abstract
Human-generated code differs from code generated by program search. We investigate whether properties of human-generated code can guide program search to improve qualities of the generated programs, e.g., readability and performance. We focus on program search with grammatical evolution, which produces code whose structure differs from that of human-generated code, e.g., loops and conditions are rarely used. We use a large code corpus mined from the open software repository service GitHub and measure software metrics and properties that describe the code base. We use this knowledge to guide the search through a new selection scheme that favors programs structurally similar to the programs in the GitHub code base. We find noticeable evidence that software metrics can help guide evolutionary search.
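As a rough illustration of the idea in the abstract, the following minimal sketch measures a few structural properties of Python programs and runs a tournament that prefers the candidate closest to a set of reference values. It is not the selection scheme from the paper: the function names, the chosen metrics, the distance measure, and the reference values are all assumptions made for illustration, the reference values are made up rather than measured on the GitHub corpus, and a real scheme would presumably also account for fitness on the training cases.

```python
import ast
import random


def structural_metrics(source: str) -> dict:
    """Count a few AST-based properties of a Python program (illustrative choice)."""
    nodes = list(ast.walk(ast.parse(source)))
    return {
        "nodes": len(nodes),
        "loops": sum(isinstance(n, (ast.For, ast.While)) for n in nodes),
        "conditions": sum(isinstance(n, ast.If) for n in nodes),
    }


def distance_to_reference(metrics: dict, reference: dict) -> float:
    """Sum of relative deviations from the reference values; lower is more similar."""
    return sum(
        abs(metrics[key] - target) / max(target, 1)
        for key, target in reference.items()
    )


def similarity_tournament(population, reference, size=2, rng=random):
    """Sample `size` programs and return the one closest to the reference metrics."""
    contestants = rng.sample(population, size)
    return min(
        contestants,
        key=lambda src: distance_to_reference(structural_metrics(src), reference),
    )


# Hypothetical candidates: one straight-line program, one with a loop and a condition.
flat = "x = a + b\ny = x * 2\n"
loopy = "total = 0\nfor v in values:\n    if v > 0:\n        total += v\n"

# Made-up reference values standing in for statistics of a human-written corpus.
reference = {"loops": 1, "conditions": 1}

winner = similarity_tournament([flat, loopy], reference, size=2)
```

With both candidates in the tournament, the program that contains a loop and a condition matches the made-up reference exactly and is selected. Relative deviations are used so that metrics on different scales contribute comparably; other distance measures would work equally well for this sketch.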
The authors thank Jordan Wick for sharing his expertise, the insightful discussions, and his help on our project. This work was supported by a fellowship within the IFI programme of the German Academic Exchange Service (DAAD).
Notes
1. radon: https://pypi.org/project/radon/.
2. astdump: https://pypi.org/project/astdump/.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Schweim, D., Hemberg, E., Sobania, D., O’Reilly, UM. (2022). Exploiting Knowledge from Code to Guide Program Search. In: Medvet, E., Pappa, G., Xue, B. (eds) Genetic Programming. EuroGP 2022. Lecture Notes in Computer Science, vol 13223. Springer, Cham. https://doi.org/10.1007/978-3-031-02056-8_17
DOI: https://doi.org/10.1007/978-3-031-02056-8_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-02055-1
Online ISBN: 978-3-031-02056-8
eBook Packages: Computer Science, Computer Science (R0)