Diversity-Driven Automated Formal Verification
Created by W.Langdon from
gp-bibliography.bib Revision:1.8051
- @InProceedings{First:2022:ICSE,
-
author = "Emily First and Yuriy Brun",
-
title = "Diversity-Driven Automated Formal Verification",
-
booktitle = "Proceedings of the 44th International Conference on
Software Engineering (ICSE)",
-
year = "2022",
-
address = "Pittsburgh, PA, USA",
-
month = may # " 21-29",
-
publisher = "ACM",
-
note = "ACM SIGSOFT Distinguished Paper Award",
-
keywords = "genetic algorithms, genetic programming, Automated
formal verification, language models, Coq, interactive
proof assistants, proof synthesis, ANN, LSTM, AST",
-
isbn13 = "978-1-4503-9221-1/22/05",
-
URL = "https://people.cs.umass.edu/~brun/pubs/pubs/First22icse.pdf",
-
DOI = "doi:10.1145/3510003.3510138",
-
size = "13 pages",
-
abstract = "Formally verified correctness is one of the most
desirable properties of software systems. But despite
great progress made via interactive theorem provers,
such as Coq, writing proof scripts for verification
remains one of the most effort-intensive (and often
prohibitively difficult) software development
activities. Recent work has created tools that
automatically synthesize proofs or proof scripts. For
example, CoqHammer can prove 26.6percent of theorems
completely automatically by reasoning using precomputed
facts, while TacTok and ASTactic, which use machine
learning to model proof scripts and then perform biased
search through the proof-script space, can prove
12.9percent and 12.3percent of the theorems,
respectively. Further, these three tools are highly
complementary; together, they can prove 30.4percent of
the theorems fully automatically. Our key insight is
that control over the learning process can produce a
diverse set of models, and that, due to the unique
nature of proof synthesis (the existence of the theorem
prover, an oracle that infallibly judges a proof's
correctness), this diversity can significantly improve
these tools' proving power. Accordingly, we develop
Diva, which uses a diverse set of models with TacTok's
and ASTactic's search mechanism to prove 21.7percent of
the theorems. That is, Diva proves 68percent more
theorems than TacTok and 77percent more than ASTactic.
Complementary to CoqHammer, Diva proves 781 theorems
(27percent added value) that Coq-Hammer does not, and
364 theorems no existing tool has proved automatically.
Together with CoqHammer, Diva proves 33.8percent of the
theorems, the largest fraction to date. We explore nine
dimensions for learning diverse models, and identify
which dimensions lead to the most useful diversity.
Further, we develop an optimization to speed up Diva's
execution by 40X. Our study introduces a completely new
idea for using diversity in machine learning to improve
the power of state-of-the-art proof-script synthesis
techniques, and empirically demonstrates that the
improvement is significant on a dataset of 68K theorems
from 122 open-source software projects.",
- }
Genetic Programming entries for
Emily First
Yuriy Brun
Citations