abstract = "Automated program repair techniques aim to aid
software developers with the challenging task of fixing
bugs. In heuristic-based program repair, a search space
of program variants is created by applying mutation
operations on the source code to find potential patches
for bugs. Most commonly, every selection of a mutation
operator during search is performed uniformly at
random. The inefficiency of this critical step in the
search creates many variants that do not compile or
break intended functionality, wasting considerable
resources as a result. we address this issue and
propose a reinforcement learning-based approach to
optimise the selection of mutation operators in
heuristic-based program repair. Our solution is
programming language, granularity-level, and search
strategy agnostic and allows for easy augmentation into
existing heuristic-based repair tools. We conduct
extensive experimentation on four operator selection
techniques, two reward types, two credit assignment
strategies, two integration methods, and three sets of
mutation operators using 22300 independent repair
attempts. We evaluate our approach on 353 real-world
bugs from the Defects4J benchmark. Results show that
the epsilon-greedy multi-armed bandit algorithm with
average credit assignment is best for mutation operator
selection. Our approach exhibits a 17.3percent
improvement upon the baseline, by generating patches
for 9 additional bugs for a total of 61 patched bugs in
the Defects4J benchmark.",