abstract = "Automated program repair techniques aim to aid
software developers with the challenging task of fixing
bugs. In heuristic-based program repair, a search space
of mutated program variants is explored to find
potential patches for bugs. Most commonly, every
selection of a mutation operator during search is
performed uniformly at random, which can generate many
buggy, even uncompilable programs. Our goal is to
reduce the generation of variants that fail to compile
or break intended functionality, since such variants
waste considerable resources. We investigate the feasibility
of a reinforcement learning-based approach for the
selection of mutation operators in heuristic-based
program repair. Our proposed approach is programming
language, granularity-level, and search strategy
agnostic and allows for easy augmentation into existing
heuristic-based repair tools. We conducted an extensive
empirical evaluation of four operator selection
techniques, two reward types, two credit assignment
strategies, two integration methods, and three sets of
mutation operators using 30,080 independent repair
attempts. We evaluated our approach on 353 real-world
bugs from the Defects4J benchmark. The reinforcement
learning-based mutation operator selection results in a
higher number of test-passing variants, but does not
exhibit a noticeable improvement in the number of bugs
patched in comparison with the baseline, uniform random
selection. While reinforcement learning has previously
been shown to improve the search of evolutionary
algorithms, which are often used in heuristic-based
program repair, it has yet to yield such improvements
in this area of research.",