Enhancing Large Language Models-Based Code Generation by Leveraging Genetic Improvement

  • Conference paper
  • First Online:
Genetic Programming (EuroGP 2024)

Abstract

In recent years, rapid advances in neural networks for Natural Language Processing (NLP) have led to the development of Large Language Models (LLMs), which have substantially improved the state of the art in many NLP tasks, such as question answering and text summarization. One particularly interesting application is automatic code generation based only on a problem description. However, it has been shown that even the most effective LLMs available often fail to produce correct code. To address this issue, we propose an evolutionary approach that uses Genetic Improvement (GI) to improve the code generated by an LLM using a collection of user-provided test cases. Specifically, we employ Grammatical Evolution (GE) with a grammar that we automatically specialize, starting from a general one, for the output of the LLM. We evaluate the method on 25 problems and 5 LLMs, showing that it improves the code generated by the LLMs in a statistically significant way. This is a first step in showing that the combination of LLMs and evolutionary techniques can be a fruitful avenue of research.
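
To make the role of the user-provided test cases concrete, the sketch below shows how a candidate program can be scored during Genetic Improvement as the fraction of test cases it passes. This is a minimal illustration, not the authors' implementation (their code is linked in Note 4); the "solve" entry-point name and the tuple-based test format are assumptions made for the example.

```python
# Minimal sketch of a test-based fitness function for Genetic Improvement of
# LLM-generated code: a candidate is scored by the fraction of user-provided
# test cases it passes. The entry-point name and test format are assumptions.

def fitness(program_src: str, entry_point: str, test_cases) -> float:
    """Return the fraction of test cases passed (0.0 if the code fails to load)."""
    namespace = {}
    try:
        exec(program_src, namespace)   # load the candidate program
        func = namespace[entry_point]
    except Exception:
        return 0.0                     # non-executable candidates score worst
    passed = 0
    for args, expected in test_cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass                       # a runtime error counts as a failed test
    return passed / len(test_cases)


# Example: a buggy LLM-generated candidate for "return the sum of two numbers".
candidate = "def solve(a, b):\n    return a - b\n"   # '-' should be '+'
tests = [((1, 2), 3), ((0, 0), 0)]
print(fitness(candidate, "solve", tests))            # 1 of 2 tests pass -> 0.5
```

Grammatical Evolution would then search, over a grammar specialized to the LLM output, for a variant of the candidate that maximizes this score.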

Notes

  1. The PSB2 paper also includes an external hyperlink to a file that contains the same problems but with different descriptions (e.g., the FB description in the external table states that the output is printed, while the description in the paper itself states that the output is returned, which is more coherent with the original purpose of PSB2).

  2. The BW problem has an initial maximum depth of 25 and a maximum depth of 40, since the initial solution requires a depth greater than 15 (see the parameter sketch after these notes).

  3. The number of repetitions is constrained by our available budget.

  4. https://github.com/dravalico/LLMGIpy.
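
For context, the depth budget mentioned in Note 2 could be written as follows. This is a minimal sketch using PonyGE2-style parameter names (Ref. 11); the key names and the per-problem override are illustrative assumptions, not the authors' actual configuration (which is in the repository linked in Note 4).

```python
# Hypothetical, PonyGE2-style depth parameters for the BW problem (Note 2).
BW_PARAMS = {
    "MAX_INIT_TREE_DEPTH": 25,  # raised because the initial (LLM-seeded)
                                # solution needs a derivation deeper than 15
    "MAX_TREE_DEPTH": 40,       # hard cap on tree depth during evolution
}
```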

References

  1. An, G., Blot, A., Petke, J., Yoo, S.: PyGGI 2.0: language independent genetic improvement framework. In: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 1100–1104 (2019)

  2. Austin, J., et al.: Program synthesis with large language models. arXiv preprint arXiv:2108.07732 (2021)

  3. Bahrini, A., et al.: ChatGPT: applications, opportunities, and threats. In: 2023 Systems and Information Engineering Design Symposium (SIEDS), pp. 274–279 (2023)

  4. Bibel, W.: Syntax-directed, semantics-supported program synthesis. Artif. Intell. 14(3), 243–261 (1980)

  5. Blot, A., Petke, J.: MAGPIE: machine automated general performance improvement via evolution of software. arXiv preprint arXiv:2208.02811 (2022)

  6. Brown, T.B., et al.: Language models are few-shot learners. arXiv preprint arXiv:2005.14165 (2020)

  7. Budinsky, F.J., Finnie, M.A., Vlissides, J.M., Yu, P.S.: Automatic code generation from design patterns. IBM Syst. J. 35(2), 151–171 (1996)

  8. Chen, M., et al.: Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374 (2021)

  9. Chen, T., et al.: TVM: an automated end-to-end optimizing compiler for deep learning. In: 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2018), pp. 578–594 (2018)

  10. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2019)

  11. Fenton, M., McDermott, J., Fagan, D., Forstenlechner, S., Hemberg, E., O’Neill, M.: PonyGE2: grammatical evolution in Python. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 1194–1201 (2017)

  12. Fernando, C., Banarse, D., Michalewski, H., Osindero, S., Rocktäschel, T.: Promptbreeder: self-referential self-improvement via prompt evolution. arXiv preprint arXiv:2309.16797 (2023)

  13. Grootendorst, M.: KeyBERT: minimal keyword extraction with BERT. Zenodo (2020)

  14. Gulwani, S., Polozov, O., Singh, R., et al.: Program synthesis. Found. Trends® Program. Lang. 4(1–2), 1–119 (2017)

  15. Guo, Q., et al.: Connecting large language models with evolutionary algorithms yields powerful prompt optimizers. arXiv preprint arXiv:2309.08532 (2023)

  16. Helmuth, T., Kelly, P.: PSB2: the second program synthesis benchmark suite. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 785–794 (2021)

  17. Helmuth, T., Kelly, P.: Applying genetic programming to PSB2: the next generation program synthesis benchmark suite. Genet. Program Evolvable Mach. 23(3), 375–404 (2022)

  18. Holm, S.: A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6(2), 65–70 (1979)

  19. Karpuzcu, U.R.: Automatic Verilog code generation through grammatical evolution. In: Proceedings of the 7th Annual Workshop on Genetic and Evolutionary Computation, pp. 394–397 (2005)

  20. Koza, J.R.: Genetic programming as a means for programming computers by natural selection. Stat. Comput. 4, 87–112 (1994)

  21. Kruskal, W.H., Wallis, W.A.: Use of ranks in one-criterion variance analysis. J. Am. Stat. Assoc. 47(260), 583–621 (1952)

  22. Langdon, W.B.: Genetic improvement of programs. In: 2014 16th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, pp. 14–19. IEEE (2014)

  23. Liu, J., Xia, C.S., Wang, Y., Zhang, L.: Is your code generated by ChatGPT really correct? Rigorous evaluation of large language models for code generation. arXiv preprint arXiv:2305.01210 (2023)

  24. Liu, Z., Tang, Y., Luo, X., Zhou, Y., Zhang, L.F.: No need to lift a finger anymore? Assessing the quality of code generation by ChatGPT. arXiv preprint arXiv:2308.04838 (2023)

  25. Liu, Z., Dou, Y., Jiang, J., Xu, J.: Automatic code generation of convolutional neural networks in FPGA implementation. In: 2016 International Conference on Field-Programmable Technology (FPT), pp. 61–68. IEEE (2016)

  26. Liventsev, V., Grishina, A., Härmä, A., Moonen, L.: Fully autonomous programming with large language models. In: Proceedings of the Genetic and Evolutionary Computation Conference, GECCO 2023, pp. 1146–1155. Association for Computing Machinery, New York (2023)

  27. Löppenberg, M., Schwung, A.: Self optimisation and automatic code generation by evolutionary algorithms in PLC based controlling processes. arXiv preprint arXiv:2304.05638 (2023)

  28. Mann, H.B., Whitney, D.R.: On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Stat. 18(1), 50–60 (1947)

  29. Manna, Z., Waldinger, R.: Knowledge and reasoning in program synthesis. Artif. Intell. 6(2), 175–208 (1975)

  30. Manna, Z., Waldinger, R.J.: Toward automatic program synthesis. Commun. ACM 14(3), 151–165 (1971)

  31. Marino, F., Squillero, G., Tonda, A.: A general-purpose framework for genetic improvement. In: Handl, J., Hart, E., Lewis, P.R., López-Ibáñez, M., Ochoa, G., Paechter, B. (eds.) PPSN 2016. LNCS, vol. 9921, pp. 345–352. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45823-6_32

  32. Menabrea, L.F.: Sketch of the analytical engine invented by Charles Babbage, ESQ. In: Ada’s Legacy: Cultures of Computing from the Victorian to the Digital Age (1843)

  33. Méry, D., Singh, N.K.: Automatic code generation from Event-B models. In: Proceedings of the 2nd Symposium on Information and Communication Technology, pp. 179–188 (2011)

  34. Miller, J.F., Harding, S.L.: Cartesian genetic programming. In: Proceedings of the 10th Annual Conference Companion on Genetic and Evolutionary Computation, pp. 2701–2726 (2008)

  35. Moreira, T.G., Wehrmeister, M.A., Pereira, C.E., Petin, J.F., Levrat, E.: Automatic code generation for embedded systems: from UML specifications to VHDL code. In: 2010 8th IEEE International Conference on Industrial Informatics, pp. 1085–1090. IEEE (2010)

  36. O’Neill, M., Ryan, C.: Grammatical evolution. IEEE Trans. Evol. Comput. 5(4), 349–358 (2001)

  37. OpenAI: GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023)

  38. Ouyang, S., Zhang, J.M., Harman, M., Wang, M.: LLM is like a box of chocolates: the non-determinism of ChatGPT in code generation. arXiv preprint arXiv:2308.02828 (2023)

  39. Paolone, G., Marinelli, M., Paesani, R., Di Felice, P.: Automatic code generation of MVC web applications. Computers 9(3), 56 (2020)

  40. Petke, J., Harman, M., Langdon, W.B., Weimer, W.: Using genetic improvement and code transplants to specialise a C++ program to a problem class. In: Nicolau, M., et al. (eds.) EuroGP 2014. LNCS, vol. 8599, pp. 137–149. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44303-3_12

  41. Pluhacek, M., Kazikova, A., Kadavy, T., Viktorin, A., Senkerik, R.: Leveraging large language models for the generation of novel metaheuristic optimization algorithms. In: Proceedings of the Companion Conference on Genetic and Evolutionary Computation, pp. 1812–1820 (2023)

  42. Rugina, A.E., Thomas, D., Olive, X., Veran, G.: Gene-auto: automatic software code generation for real-time embedded systems. DASIA 2008-Data Syst. Aerosp. 665, 28 (2008)

  43. Ryan, C., Collins, J.J., Neill, M.O.: Grammatical evolution: evolving programs for an arbitrary language. In: Banzhaf, W., Poli, R., Schoenauer, M., Fogarty, T.C. (eds.) EuroGP 1998. LNCS, vol. 1391, pp. 83–96. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0055930

  44. Sandnes, F.E., Megson, G.M.: A hybrid genetic algorithm applied to automatic parallel controller code generation. In: Proceedings of the Eighth Euromicro Workshop on Real-Time Systems, pp. 70–75. IEEE (1996)

  45. Serruto, W.F., Casas, L.A.: Automatic code generation for microcontroller-based system using multi-objective linear genetic programming. In: 2017 International Conference on Computational Science and Computational Intelligence (CSCI), pp. 279–285. IEEE (2017)

  46. Sobania, D., Briesch, M., Rothlauf, F.: Choose your programming copilot: a comparison of the program synthesis performance of github copilot and genetic programming. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1019–1027 (2022)

  47. Sun, H., Nie, Y., Li, X., Huang, M., Tian, J., Kong, W.: An automatic code generation method based on sequence generative adversarial network. In: 2022 7th IEEE International Conference on Data Science in Cyberspace (DSC), pp. 383–390. IEEE (2022)

  48. Taori, R., et al.: Alpaca: a strong, replicable instruction-following model. Stanford Center for Research on Foundation Models 3(6), 7 (2023). https://crfm.stanford.edu/2023/03/13/alpaca.html

  49. Touvron, H., et al.: LLaMA: open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023)

  50. Touvron, H., et al.: LLaMA 2: open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288 (2023)

  51. Vaithilingam, P., Zhang, T., Glassman, E.L.: Expectation vs experience: evaluating the usability of code generation tools powered by large language models. In: Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems, CHI EA 2022. Association for Computing Machinery, New York (2022)

  52. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30 (2017)

  53. Walker, J.A., Liu, Y., Tempesti, G., Tyrrell, A.M.: Automatic code generation on a MOVE processor using cartesian genetic programming. In: Tempesti, G., Tyrrell, A.M., Miller, J.F. (eds.) ICES 2010. LNCS, vol. 6274, pp. 238–249. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15323-5_21

  54. Wang, Y., et al.: Self-instruct: aligning language models with self-generated instructions. arXiv preprint arXiv:2212.10560 (2023)

  55. Ward, M.: Proving program refinements and transformations. Ph.D. thesis, University of Oxford (1989)

  56. Zhang, Y., Li, Y., Wang, X.: An optimized hybrid evolutionary algorithm for accelerating automatic code optimization. In: Third International Seminar on Artificial Intelligence, Networking, and Information Technology (AINIT 2022), vol. 12587, pp. 488–496. SPIE (2023)

  57. Zheng, L., et al.: Ansor: generating high-performance tensor programs for deep learning. In: 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2020), pp. 863–879 (2020)

Author information

Corresponding author

Correspondence to Luigi Rovito.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Pinna, G., Ravalico, D., Rovito, L., Manzoni, L., De Lorenzo, A. (2024). Enhancing Large Language Models-Based Code Generation by Leveraging Genetic Improvement. In: Giacobini, M., Xue, B., Manzoni, L. (eds) Genetic Programming. EuroGP 2024. Lecture Notes in Computer Science, vol 14631. Springer, Cham. https://doi.org/10.1007/978-3-031-56957-9_7

  • DOI: https://doi.org/10.1007/978-3-031-56957-9_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56956-2

  • Online ISBN: 978-3-031-56957-9

  • eBook Packages: Computer Science, Computer Science (R0)
