Quality of Automated Program Repair on Real-World Defects
Created by W.Langdon from
gp-bibliography.bib Revision:1.7964
@Article{Motwani:2022:TSE,
author = "Manish Motwani and Mauricio Soto and Yuriy Brun and
Rene Just and Claire {Le Goues}",
title = "Quality of Automated Program Repair on Real-World
Defects",
journal = "IEEE Transactions on Software Engineering",
year = "2022",
volume = "48",
number = "2",
pages = "637--661",
keywords = "genetic algorithms, genetic programming, genetic
improvement, APR, GenProg",
ISSN = "0098-5589",
URL = "https://doi.org/10.1109/TSE.2020.2998785",
DOI = "10.1109/TSE.2020.2998785",
abstract = "Automated program repair is a promising approach to
reducing the costs of manual debugging and increasing
software quality. However, recent studies have shown
that automated program repair techniques can be prone
to producing patches of low quality, overfitting to the
set of tests provided to the repair technique, and
failing to generalise to the intended specification. We
rigorously explore this phenomenon on real-world Java
programs, analyzing the effectiveness of four
well-known repair techniques, GenProg, Par, SimFix, and
TrpAutoRepair, on defects made by the projects'
developers during their regular development process. We
find that: (1) When applied to real-world Java code,
automated program repair techniques produce patches for
between 10.6\% and 19.0\% of the defects,
which is less frequent than when applied to C code. (2)
The produced patches often overfit to the provided
test suite, with only between 13.8\% and
46.1\% of the patches passing an independent set
of tests. (3) Test suite size has an extremely small
but significant effect on the quality of the patches,
with larger test suites producing higher-quality
patches, though, surprisingly, higher-coverage test
suites correlate with lower-quality patches. (4) The
number of tests that a buggy program fails has a small
but statistically significant positive effect on the
quality of the produced patches. (5) Test suite
provenance, whether the test suite is written by a
human or automatically generated, has a significant
effect on the quality of the patches, with
developer-written tests typically producing
higher-quality patches. And (6) the patches exhibit
insufficient diversity to improve quality through some
method of combining multiple patches. We develop
JaRFly, an open-source framework for implementing
techniques for automatic search-based improvement of
Java programs. Our study uses JaRFly to faithfully
reimplement GenProg and TrpAutoRepair to work on Java
code, and makes the first public release of an
implementation of Par. Unlike prior work, our study
carefully controls for confounding factors and produces
a methodology, as well as a dataset of
automatically-generated test suites, for objectively
evaluating the quality of Java repair techniques on
real-world defects.",
notes = "also known as \cite{9104918}
Manning College of Information and Computer Sciences,
University of Massachusetts Amherst, Amherst, MA, USA",
}