
TacTok: semantics-aware proof synthesis

Published: 13 November 2020

Abstract

Formally verifying software correctness is a highly manual process. However, because verification proof scripts often share structure, it is possible to learn from existing proof scripts to fully automate some formal verification. The goal of this paper is to improve proof script synthesis and enable fully automating more verification. Interactive theorem provers, such as the Coq proof assistant, allow programmers to write partial proof scripts, observe the semantics of the proof state thus far, and then attempt more progress. Knowing the proof state semantics is a significant aid. Recent research has shown that the proof state can help predict the next step. In this paper, we present TacTok, the first technique that attempts to fully automate proof script synthesis by modeling proof scripts using both the partial proof script written thus far and the semantics of the proof state. Thus, TacTok more completely models the information the programmer has access to when writing proof scripts manually. We evaluate TacTok on a benchmark of 26 software projects in Coq, consisting of over 10 thousand theorems. We compare our approach to five tools: two prior techniques, CoqHammer, the state-of-the-art proof synthesis technique, and ASTactic, a proof script synthesis technique that models proof state; and three new proof script synthesis techniques we create ourselves, SeqOnly, which models only the partial proof script and the initial theorem being proven, and WeightedRandom and WeightedGreedy, which use metaheuristic search biased by frequencies of proof tactics in existing, successful proof scripts. We find that TacTok outperforms WeightedRandom and WeightedGreedy, and is complementary to CoqHammer and ASTactic: for 24 out of the 26 projects, TacTok can synthesize proof scripts for some theorems the prior tools cannot. Together with TacTok, 11.5% more theorems can be proven automatically than by CoqHammer alone, and 20.0% more than by ASTactic alone. Compared to a combination of CoqHammer and ASTactic, TacTok can prove 3.6% more theorems, proving 115 theorems no tool could previously prove. Overall, our experiments provide evidence that partial proof script and proof state semantics, together, provide useful information for proof script modeling, and that metaheuristic search is a promising direction for proof script synthesis. TacTok is open-source, and we make public all our data and a replication package of our experiments.
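As a rough illustration of what "biased by frequencies of proof tactics" could mean, the sketch below counts tactics in a toy corpus of proof scripts and then either samples a tactic in proportion to its frequency or ranks tactics from most to least frequent. It is not the implementation of WeightedRandom or WeightedGreedy: the abstract does not describe their search procedures, and the corpus and the function names `tactic_frequencies`, `sample_tactic`, and `rank_tactics` are hypothetical.

```python
import random
from collections import Counter


def tactic_frequencies(proof_scripts):
    """Count how often each tactic appears across a corpus of proof scripts."""
    counts = Counter()
    for script in proof_scripts:
        counts.update(script)
    return counts


def sample_tactic(counts, rng):
    """Pick one tactic with probability proportional to its corpus frequency."""
    tactics = sorted(counts)
    weights = [counts[t] for t in tactics]
    return rng.choices(tactics, weights=weights, k=1)[0]


def rank_tactics(counts):
    """Order tactics from most to least frequent; ties are broken alphabetically."""
    return sorted(counts, key=lambda t: (-counts[t], t))


# Hypothetical toy corpus: each proof script is a list of tactic names.
corpus = [
    ["intros", "induction", "simpl", "auto"],
    ["intros", "simpl", "reflexivity"],
    ["intros", "auto"],
]
counts = tactic_frequencies(corpus)

if __name__ == "__main__":
    print(rank_tactics(counts))
    print(sample_tactic(counts, random.Random(0)))
```

A real search would also have to check each candidate tactic against the proof assistant and backtrack on failure; that part is omitted here because it depends on tooling the abstract does not specify.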


