Abstract
Grammar-Guided Genetic Programming (G3P) is widely recognised as one of the most successful approaches to program synthesis, i.e., the task of automatically discovering an executable piece of code given user intent. G3P has been shown capable of evolving programs in arbitrary languages that solve several program synthesis problems based only on a set of input/output examples. Despite this success, restricting the evolutionary system to the input/output error rate when assessing the programs it derives limits its scalability to larger and more complex program synthesis problems. With the growing number and size of open software repositories and the rise of generative artificial intelligence, there is a sizeable and growing number of approaches for retrieving or generating source code from textual problem descriptions. It is therefore timely to introduce G3P to other means of expressing user intent, particularly textual problem descriptions. In this paper, we assess the potential for G3P to evolve programs based on their similarity to particular target codes of interest (obtained using some code retrieval or generative approach). We assess four similarity measures from various fields: text processing (FuzzyWuzzy), natural language processing (cosine similarity based on term frequency), software clone detection (CCFinder), and plagiarism detection (SIM). Our experimental evaluation on a well-known program synthesis benchmark shows that G3P manages to evolve some of the desired programs with three of the four similarity measures. However, in its default configuration, G3P is less successful at solving program synthesis problems with similarity measures than with the classical input/output error rate.
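To make the idea of a similarity-based fitness concrete, the following is a minimal sketch of one of the four measures named above, cosine similarity over term frequencies, used to score a candidate program against a target code. It is an illustration under stated assumptions and not the implementation evaluated in the paper: the tokenizer, the function names, and the choice to minimise `1 - similarity` are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(code):
    # Hypothetical tokenizer (an assumption, not taken from the paper):
    # identifiers/keywords, integer literals, and any other single symbol.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def cosine_tf_similarity(candidate, target):
    """Cosine similarity between the term-frequency vectors of two code strings."""
    tf_a = Counter(tokenize(candidate))
    tf_b = Counter(tokenize(target))
    # Dot product over the tokens that both programs share.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty program shares nothing with the target
    return dot / (norm_a * norm_b)


def similarity_fitness(candidate, target):
    # Assumed convention: fitness is minimised, so a candidate whose token
    # frequencies match the target exactly scores (approximately) 0.0.
    return 1.0 - cosine_tf_similarity(candidate, target)


if __name__ == "__main__":
    target = "def add(a, b):\n    return a + b"
    evolved = "def add(a, b):\n    return a - b"
    print(similarity_fitness(evolved, target))
```

Because term frequency ignores token order, `return a - b` and `return b - a` are indistinguishable under this measure, which hints at why a purely textual similarity may guide evolution less reliably than executing candidates on input/output examples.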
Acknowledgement
Supported, in part, by Science Foundation Ireland grant 13/RC/2094_P2.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tao, N., Ventresque, A., Saber, T. (2022). Assessing Similarity-Based Grammar-Guided Genetic Programming Approaches for Program Synthesis. In: Dorronsoro, B., Pavone, M., Nakib, A., Talbi, EG. (eds) Optimization and Learning. OLA 2022. Communications in Computer and Information Science, vol 1684. Springer, Cham. https://doi.org/10.1007/978-3-031-22039-5_19
DOI: https://doi.org/10.1007/978-3-031-22039-5_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22038-8
Online ISBN: 978-3-031-22039-5
eBook Packages: Computer Science, Computer Science (R0)