Enhancing Large Language Models-Based Code Generation by Leveraging Genetic Improvement

  • Conference paper
  • First Online:
Genetic Programming (EuroGP 2024)

Abstract

In recent years, rapid advances in neural networks for Natural Language Processing (NLP) have led to the development of Large Language Models (LLMs), which have substantially improved the state of the art in many NLP tasks, such as question answering and text summarization. One particularly interesting application is automatic code generation based only on a problem description. However, it has been shown that even the most effective LLMs available often fail to produce correct code. To address this issue, we propose an evolutionary approach that uses Genetic Improvement (GI) to improve the code generated by an LLM using a collection of user-provided test cases. Specifically, we employ Grammatical Evolution (GE) with a grammar that we automatically specialize, starting from a general one, for the output of the LLM. We evaluate the method on 25 problems and 5 LLMs, showing that it improves the code generated by the LLMs in a statistically significant way. This is a first step in showing that the combination of LLMs and evolutionary techniques can be a fruitful avenue of research.
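
To make the role of the user-provided test cases concrete, the sketch below shows how a candidate program can be scored during Genetic Improvement as the fraction of test cases it passes. This is a minimal illustration, not the authors' implementation (their code is linked in Note 4); the "solve" entry-point name and the tuple-based test format are assumptions made for the example.

```python
# Minimal sketch of a test-based fitness function for Genetic Improvement of
# LLM-generated code: a candidate is scored by the fraction of user-provided
# test cases it passes. The entry-point name and test format are assumptions.

def fitness(program_src: str, entry_point: str, test_cases) -> float:
    """Return the fraction of test cases passed (0.0 if the code fails to load)."""
    namespace = {}
    try:
        exec(program_src, namespace)   # load the candidate program
        func = namespace[entry_point]
    except Exception:
        return 0.0                     # non-executable candidates score worst
    passed = 0
    for args, expected in test_cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass                       # a runtime error counts as a failed test
    return passed / len(test_cases)


# Example: a buggy LLM-generated candidate for "return the sum of two numbers".
candidate = "def solve(a, b):\n    return a - b\n"   # '-' should be '+'
tests = [((1, 2), 3), ((0, 0), 0)]
print(fitness(candidate, "solve", tests))            # 1 of 2 tests pass -> 0.5
```

Grammatical Evolution would then search, over a grammar specialized to the LLM output, for a variant of the candidate that maximizes this score.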

Notes

  1. The PSB2 paper also includes an external hyperlink to a file that contains the same problems but with different descriptions (e.g., the FB description in the external table states that the output is printed, while the description in the paper itself states that the output is returned, which is more coherent with the original purpose of PSB2).

  2. The BW problem has an initial maximum depth of 25 and a maximum depth of 40, since the initial solution requires a depth greater than 15 (see the parameter sketch after these notes).

  3. The number of repetitions is constrained by our available budget.

  4. https://github.com/dravalico/LLMGIpy.
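
For context, the depth budget mentioned in Note 2 could be written as follows. This is a minimal sketch using PonyGE2-style parameter names (Ref. 11); the key names and the per-problem override are illustrative assumptions, not the authors' actual configuration (which is in the repository linked in Note 4).

```python
# Hypothetical, PonyGE2-style depth parameters for the BW problem (Note 2).
BW_PARAMS = {
    "MAX_INIT_TREE_DEPTH": 25,  # raised because the initial (LLM-seeded)
                                # solution needs a derivation deeper than 15
    "MAX_TREE_DEPTH": 40,       # hard cap on tree depth during evolution
}
```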

References

  1. An, G., Blot, A., Petke, J., Yoo, S.: PyGGI 2.0: language independent genetic improvement framework. In: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 1100–1104 (2019)

  2. Austin, J., et al.: Program synthesis with large language models. arXiv preprint arXiv:2108.07732 (2021)

  3. Bahrini, A., et al.: ChatGPT: applications, opportunities, and threats. In: 2023 Systems and Information Engineering Design Symposium (SIEDS), pp. 274–279 (2023)

  4. Bibel, W.: Syntax-directed, semantics-supported program synthesis. Artif. Intell. 14(3), 243–261 (1980)

  5. Blot, A., Petke, J.: MAGPIE: machine automated general performance improvement via evolution of software. arXiv preprint arXiv:2208.02811 (2022)

  6. Brown, T.B., et al.: Language models are few-shot learners. arXiv preprint arXiv:2005.14165 (2020)

  7. Budinsky, F.J., Finnie, M.A., Vlissides, J.M., Yu, P.S.: Automatic code generation from design patterns. IBM Syst. J. 35(2), 151–171 (1996)

  8. Chen, M., et al.: Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374 (2021)

  9. Chen, T., et al.: TVM: an automated end-to-end optimizing compiler for deep learning. In: 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2018), pp. 578–594 (2018)

  10. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2019)

  11. Fenton, M., McDermott, J., Fagan, D., Forstenlechner, S., Hemberg, E., O’Neill, M.: PonyGE2: grammatical evolution in Python. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 1194–1201 (2017)

  12. Fernando, C., Banarse, D., Michalewski, H., Osindero, S., Rocktäschel, T.: Promptbreeder: self-referential self-improvement via prompt evolution. arXiv preprint arXiv:2309.16797 (2023)

  13. Grootendorst, M.: KeyBERT: minimal keyword extraction with BERT. Zenodo (2020)

  14. Gulwani, S., Polozov, O., Singh, R., et al.: Program synthesis. Found. Trends® Program. Lang. 4(1–2), 1–119 (2017)

  15. Guo, Q., et al.: Connecting large language models with evolutionary algorithms yields powerful prompt optimizers. arXiv preprint arXiv:2309.08532 (2023)

  16. Helmuth, T., Kelly, P.: PSB2: the second program synthesis benchmark suite. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 785–794 (2021)

  17. Helmuth, T., Kelly, P.: Applying genetic programming to PSB2: the next generation program synthesis benchmark suite. Genet. Program Evolvable Mach. 23(3), 375–404 (2022)

  18. Holm, S.: A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6(2), 65–70 (1979)

  19. Karpuzcu, U.R.: Automatic Verilog code generation through grammatical evolution. In: Proceedings of the 7th Annual Workshop on Genetic and Evolutionary Computation, pp. 394–397 (2005)

  20. Koza, J.R.: Genetic programming as a means for programming computers by natural selection. Stat. Comput. 4, 87–112 (1994)

  21. Kruskal, W.H., Wallis, W.A.: Use of ranks in one-criterion variance analysis. J. Am. Stat. Assoc. 47(260), 583–621 (1952)

  22. Langdon, W.B.: Genetic improvement of programs. In: 2014 16th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, pp. 14–19. IEEE (2014)

  23. Liu, J., Xia, C.S., Wang, Y., Zhang, L.: Is your code generated by ChatGPT really correct? Rigorous evaluation of large language models for code generation. arXiv preprint arXiv:2305.01210 (2023)

  24. Liu, Z., Tang, Y., Luo, X., Zhou, Y., Zhang, L.F.: No need to lift a finger anymore? Assessing the quality of code generation by ChatGPT. arXiv preprint arXiv:2308.04838 (2023)

  25. Liu, Z., Dou, Y., Jiang, J., Xu, J.: Automatic code generation of convolutional neural networks in FPGA implementation. In: 2016 International Conference on Field-Programmable Technology (FPT), pp. 61–68. IEEE (2016)

  26. Liventsev, V., Grishina, A., Härmä, A., Moonen, L.: Fully autonomous programming with large language models. In: Proceedings of the Genetic and Evolutionary Computation Conference, GECCO 2023, pp. 1146–1155. Association for Computing Machinery, New York (2023)

  27. Löppenberg, M., Schwung, A.: Self optimisation and automatic code generation by evolutionary algorithms in PLC based controlling processes. arXiv preprint arXiv:2304.05638 (2023)

  28. Mann, H.B., Whitney, D.R.: On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Stat. 18(1), 50–60 (1947)

  29. Manna, Z., Waldinger, R.: Knowledge and reasoning in program synthesis. Artif. Intell. 6(2), 175–208 (1975)

  30. Manna, Z., Waldinger, R.J.: Toward automatic program synthesis. Commun. ACM 14(3), 151–165 (1971)

  31. Marino, F., Squillero, G., Tonda, A.: A general-purpose framework for genetic improvement. In: Handl, J., Hart, E., Lewis, P.R., López-Ibáñez, M., Ochoa, G., Paechter, B. (eds.) PPSN 2016. LNCS, vol. 9921, pp. 345–352. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45823-6_32

  32. Menabrea, L.F.: Sketch of the analytical engine invented by Charles Babbage, ESQ. In: Ada’s Legacy: Cultures of Computing from the Victorian to the Digital Age (1843)

  33. Méry, D., Singh, N.K.: Automatic code generation from Event-B models. In: Proceedings of the 2nd Symposium on Information and Communication Technology, pp. 179–188 (2011)

  34. Miller, J.F., Harding, S.L.: Cartesian genetic programming. In: Proceedings of the 10th Annual Conference Companion on Genetic and Evolutionary Computation, pp. 2701–2726 (2008)

  35. Moreira, T.G., Wehrmeister, M.A., Pereira, C.E., Petin, J.F., Levrat, E.: Automatic code generation for embedded systems: from UML specifications to VHDL code. In: 2010 8th IEEE International Conference on Industrial Informatics, pp. 1085–1090. IEEE (2010)

  36. O’Neill, M., Ryan, C.: Grammatical evolution. IEEE Trans. Evol. Comput. 5(4), 349–358 (2001)

  37. OpenAI: GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023)

  38. Ouyang, S., Zhang, J.M., Harman, M., Wang, M.: LLM is like a box of chocolates: the non-determinism of ChatGPT in code generation. arXiv preprint arXiv:2308.02828 (2023)

  39. Paolone, G., Marinelli, M., Paesani, R., Di Felice, P.: Automatic code generation of MVC web applications. Computers 9(3), 56 (2020)

  40. Petke, J., Harman, M., Langdon, W.B., Weimer, W.: Using genetic improvement and code transplants to specialise a C++ program to a problem class. In: Nicolau, M., et al. (eds.) EuroGP 2014. LNCS, vol. 8599, pp. 137–149. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44303-3_12

  41. Pluhacek, M., Kazikova, A., Kadavy, T., Viktorin, A., Senkerik, R.: Leveraging large language models for the generation of novel metaheuristic optimization algorithms. In: Proceedings of the Companion Conference on Genetic and Evolutionary Computation, pp. 1812–1820 (2023)

  42. Rugina, A.E., Thomas, D., Olive, X., Veran, G.: Gene-auto: automatic software code generation for real-time embedded systems. DASIA 2008-Data Syst. Aerosp. 665, 28 (2008)

  43. Ryan, C., Collins, J.J., Neill, M.O.: Grammatical evolution: evolving programs for an arbitrary language. In: Banzhaf, W., Poli, R., Schoenauer, M., Fogarty, T.C. (eds.) EuroGP 1998. LNCS, vol. 1391, pp. 83–96. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0055930

  44. Sandnes, F.E., Megson, G.M.: A hybrid genetic algorithm applied to automatic parallel controller code generation. In: Proceedings of the Eighth Euromicro Workshop on Real-Time Systems, pp. 70–75. IEEE (1996)

  45. Serruto, W.F., Casas, L.A.: Automatic code generation for microcontroller-based system using multi-objective linear genetic programming. In: 2017 International Conference on Computational Science and Computational Intelligence (CSCI), pp. 279–285. IEEE (2017)

  46. Sobania, D., Briesch, M., Rothlauf, F.: Choose your programming copilot: a comparison of the program synthesis performance of github copilot and genetic programming. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1019–1027 (2022)

  47. Sun, H., Nie, Y., Li, X., Huang, M., Tian, J., Kong, W.: An automatic code generation method based on sequence generative adversarial network. In: 2022 7th IEEE International Conference on Data Science in Cyberspace (DSC), pp. 383–390. IEEE (2022)

  48. Taori, R., et al.: Alpaca: a strong, replicable instruction-following model. Stanford Center for Research on Foundation Models 3(6), 7 (2023). https://crfm.stanford.edu/2023/03/13/alpaca.html

  49. Touvron, H., et al.: LLaMA: open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023)

  50. Touvron, H., et al.: LLaMA 2: open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288 (2023)

  51. Vaithilingam, P., Zhang, T., Glassman, E.L.: Expectation vs experience: evaluating the usability of code generation tools powered by large language models. In: Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems, CHI EA 2022. Association for Computing Machinery, New York (2022)

  52. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30 (2017)

  53. Walker, J.A., Liu, Y., Tempesti, G., Tyrrell, A.M.: Automatic code generation on a MOVE processor using cartesian genetic programming. In: Tempesti, G., Tyrrell, A.M., Miller, J.F. (eds.) ICES 2010. LNCS, vol. 6274, pp. 238–249. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15323-5_21

  54. Wang, Y., et al.: Self-instruct: aligning language models with self-generated instructions. arXiv preprint arXiv:2212.10560 (2023)

  55. Ward, M.: Proving program refinements and transformations. Ph.D. thesis, University of Oxford (1989)

  56. Zhang, Y., Li, Y., Wang, X.: An optimized hybrid evolutionary algorithm for accelerating automatic code optimization. In: Third International Seminar on Artificial Intelligence, Networking, and Information Technology (AINIT 2022), vol. 12587, pp. 488–496. SPIE (2023)

  57. Zheng, L., et al.: Ansor: generating high-performance tensor programs for deep learning. In: 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2020), pp. 863–879 (2020)

Author information

Corresponding author

Correspondence to Luigi Rovito.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Pinna, G., Ravalico, D., Rovito, L., Manzoni, L., De Lorenzo, A. (2024). Enhancing Large Language Models-Based Code Generation by Leveraging Genetic Improvement. In: Giacobini, M., Xue, B., Manzoni, L. (eds) Genetic Programming. EuroGP 2024. Lecture Notes in Computer Science, vol 14631. Springer, Cham. https://doi.org/10.1007/978-3-031-56957-9_7

  • DOI: https://doi.org/10.1007/978-3-031-56957-9_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56956-2

  • Online ISBN: 978-3-031-56957-9

  • eBook Packages: Computer Science, Computer Science (R0)
