Algorithm Induction, Modularity and Complexity

author = "John R. Woodward",

title = "Algorithm Induction, Modularity and Complexity",

school = "School of Computer Science, The University of Birmingham",

year = "2004",

address = "UK",

month = sep,

email = "jrw@cs.bham.ac.uk",

keywords = "genetic algorithms, genetic programming",

URL = "

http://www.cs.stir.ac.uk/~jrw/publications/thesis.pdf",

size = "149 pages",

abstract = "We are concerned with the induction of a rule from a set of observations, the goal being to succinctly describe the observed data, but more importantly to place us in a position to make predictions about future data which we have not previously seen. One approach to rule induction is to form a hypothesis space which consists of potential rules or hypotheses, and then search this space for a rule which is consistent with the observed data. One formulation of this approach is Genetic Programming, where the hypothesis spaces consists of computer programs and the search of this space is conducted using biologically inspired search operators and a fitness function.

The well known No Free Lunch theorems are central to search and essentially says all search algorithms perform equally, under a number of assumptions. We examine these assumptions and show that they are invalidated when the hypothesis space contains hypotheses which represent functions with different frequencies, as is the case with Genetic Programming and a number of other Machine Learning paradigms. The Conservation of Generalisation, which is related to the No Free Lunch theorems, implies that generalisation is impossible. This is contrary to Occam's razor. The Conservation of Generalisation theorems and Occam razor are consistent if we restrict ourselves to representations which do not compress the observed data.

We define the representational complexity of a function to be the minimum size of a given representation which can express the function. Given a primitive set, and the operation of composition, new functions can be constructed, which is the representation standard Genetic Programming uses to express functions. However the complexity of a function under this type of representation will depend on the primitive set. We prove that if a representation is capable of expressing modules (i.e. reuse of component parts of the representation), then the complexity of a function is independent of the primitive set (up to a constant which depends on the primitive set but not on the function being represented).

We then conduct a number of experiments related to the evolution of Turing Complete representations. We argue that, if a representation can address the general case of a variable sized problem then the average number of evaluations to find a solution is independent of the problem size. In Genetic Programming, a fitness function is used to drive the evolutionary process and promote programs which are more likely to lead to solutions. We experiment with a fitness function which includes information about whether or not a program had to be aborted during it's evaluation and demonstrate that a 10 fold reduction in the number of evaluations can be achieved on an arithmetic problem. Finally, we experiment with a crossover operator which is inspired by the theory of recursive functions and reduced the probability of producing a program which has to be terminated during its evaluation.",