abstract = "Software synthesis research has historically relied on
tools such as step limits to handle undesired behavior
like infinite loops. Here we explore the impact of
different step limits on several benchmark problems,
and see that these limits do affect the evolved
behaviors both in terms of generalization and
stability. To assess stability, we ran evolved programs
with a range of step limits, and found several cases
where programs failed to generalize with the step limit
used during evolution, but generalized at other step
limits. Two of our test problems evolved stable
solutions, in the sense that they correctly handled
unseen test cases (i.e., computed the correct answer)
for all step limits above a certain point. Our other
two test problems, however, sometimes evolved unstable
solutions which only generalized (i.e., correctly
handled unseen test cases) for specific step limits.
These programs relied on the step limit to terminate,
and would no longer generalize if the step limit was
modified slightly. This indicates that step limits can
have a substantial impact on evolutionary performance,
and suggests we need to revisit our notions of
generalization in the context of evolutionary software
synthesis.",