abstract = "Generating a random sampling of program trees with
specified function and terminal sets is the initial
step of many program evolution systems. I present a
theoretical and experimental analysis of the expected
distribution of uniformly sampled programs, guided by
algorithmic information theory. This analysis
demonstrates that increasing the sample size is often
an inefficient means of increasing the overall
diversity of program behaviours (outputs). A novel
sampling scheme (semantic sampling) is proposed that
exploits semantics to heuristically increase behavioral
diversity. An important property of the scheme is that
no calls of the problem-specific fitness function are
required. Its effectiveness at increasing behavioural
diversity is demonstrated empirically for Boolean
formulae. Furthermore, it is found to lead to
statistically significant improvements in performance
for genetic programming on parity and multiplexer
problems.",
notes = "GECCO-2007 A joint meeting of the sixteenth
international conference on genetic algorithms
(ICGA-2007) and the twelfth annual genetic programming
conference (GP-2007).