Coevolving functions in genetic programming

https://doi.org/10.1016/S1383-7621(01)00016-9

Abstract

In this paper we introduce a new approach to the use of automatically defined functions (ADFs) within genetic programming. The technique consists of evolving a number of separate sub-populations of functions which can be used by a population of evolving main programs. We present and refine a set of mechanisms by which the number and constitution of the function sub-populations can be defined and compare their performance on two well-known classification tasks. A final version of the general approach, for use explicitly on classification tasks, is then presented. It is shown that in all cases the coevolutionary approach performs better than traditional genetic programming with and without ADFs.

Introduction

The use of function sub-routines is ubiquitous in high-level computer programming languages. Functions provide the ability to reuse code efficiently, giving programs a modular and hierarchical structure. Since its formalization, genetic programming (GP) [8] has included the ability to exploit functional sub-routines, termed “automatically defined functions” (ADFs) [8, p. 534]: modular functions that can form part of the genetic make-up of an evolving program.

In this paper we present a coevolutionary approach to the use of ADFs in genetic programming. Here each identified ADF is assigned its own independent sub-population which coevolves with other ADF sub-populations and a population of main program trees or “result-producing branches” (RPBs). For each evaluation, ADFs from each sub-population are selected randomly to be used by a program, where fitness can be assigned globally or locally. In the global case all ADFs and the main tree receive the same fitness. In the local case the main tree receives the global fitness, but each ADF receives the fitness available for that aspect of the task with which it is concerned. In either case selection and reproduction are done independently within each sub-population. In this paper we show that the coevolutionary approach performs better than traditional GP and traditional GP with ADFs on two well-known classification tasks.
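The global fitness case described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `evaluate_global`, the `score` callback and the handling of ADFs that are sampled several times (mean score) or never sampled (0.0) are assumptions made only for illustration.

```python
import random

def evaluate_global(rpbs, adf_subpops, score, rng):
    """Evaluate each main program (RPB) with one randomly chosen ADF
    per sub-population and share the resulting fitness globally."""
    rpb_fitness = []
    # One list of received scores per ADF, per sub-population.
    adf_scores = [[[] for _ in pop] for pop in adf_subpops]
    for rpb in rpbs:
        # Pick one ADF at random from every sub-population.
        picks = [rng.randrange(len(pop)) for pop in adf_subpops]
        adfs = [pop[i] for pop, i in zip(adf_subpops, picks)]
        fitness = score(rpb, adfs)
        rpb_fitness.append(fitness)
        # Global case: every participating ADF receives the same fitness.
        for k, i in enumerate(picks):
            adf_scores[k][i].append(fitness)
    # Assumption: an ADF used several times keeps its mean score;
    # an ADF never sampled gets 0.0.
    adf_fitness = [[sum(s) / len(s) if s else 0.0 for s in pop_scores]
                   for pop_scores in adf_scores]
    return rpb_fitness, adf_fitness

# Toy usage: individuals are plain numbers and the score is their sum.
rng = random.Random(0)
rpb_fit, adf_fit = evaluate_global(
    rpbs=[1, 2, 3],
    adf_subpops=[[10, 20], [100, 200]],
    score=lambda rpb, adfs: rpb + sum(adfs),
    rng=rng,
)
```

Selection and reproduction would then be run separately on `rpb_fit` and on each list in `adf_fit`, in keeping with the independent sub-populations described in the text.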

We then introduce a version of the approach termed evolution defined functions (EDFs) which uses the coevolutionary strategy in conjunction with the two mutation operators, compression and expansion, of the genetic library builder (GLiB) [2]. We use the same two classification tasks to show that the automatic specification of EDF sub-populations via compression is beneficial when the existence of a particular function is determined by a measure of its worth/recent usage. We then extend the approach further to allow any number of functions to be created during evolution, rather than having an a priori fixed number of EDFs, again using the measure of existing function worth. It is again shown that improvements can be achieved over the previous approaches.

Finally, we introduce a version of our coevolutionary technique specifically for classification tasks, based on the work of [10], in which ADFs are feature preprocessors/extractors for a classification algorithm.

The paper is arranged as follows: the following section details the basic ADF strategy of GP and our new approach. Section 3 describes the problems used to do the comparisons and Section 4 presents results from using our coevolutionary approach. Section 5 introduces the EDF approach with dynamic function creation and Section 6 presents the results of its use. Section 7 shows the results obtained from allowing the number of EDFs available to evolve over time and Section 8 presents a version of the coevolutionary approach designed specifically for classification tasks. Finally, all findings are discussed.

Section snippets

Automatically defined functions

Koza [8] presented ADFs as a refinement to genetic programming with the aim of enabling the composite evolution of larger programs. Here each identified ADF is genetically joined to the main program tree such that a child's ADFs are a mix of its parents'; each joined ADF recombines with the corresponding ADF of the other parent. Whenever a call to a particular type of ADF is made, the joined example individual is used. The number of ADFs available during evolution is fixed a priori and the ADFs

The classification tasks

In this paper we use two well-known classification problems to compare the different strategies (see [4] for an example of the comparative performance and potential benefits of GP for classification tasks).

Australian credit card

For this problem we use populations of 1410 individuals for the traditional approaches. In the COADF approach each population therefore contains 470 individuals, as there are two function sub-populations and one population of main programs (3 × 470 = 1410).

It has been found that the best results are obtained when crossover (for each population) is performed on a small percentage (20%) of the populations. A mutation rate of 0.02 and roulette-wheel selection are used.
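For readers unfamiliar with roulette-wheel selection, the following is a minimal sketch of the standard fitness-proportionate scheme. The function name `roulette_select` is hypothetical, and the sketch assumes non-negative fitness values with larger meaning better; the snippet above does not state how fitness is scaled.

```python
import random

def roulette_select(population, fitnesses, rng):
    """Return one individual, chosen with probability proportional
    to its (non-negative) fitness."""
    total = sum(fitnesses)
    if total <= 0:
        # Assumption: fall back to a uniform choice when no individual
        # has positive fitness.
        return rng.choice(population)
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, fitness in zip(population, fitnesses):
        running += fitness
        if pick < running:
            return individual
    # Guard against floating-point rounding at the upper end.
    return population[-1]
```

In the coevolutionary setting this selection would be applied within each sub-population independently, using that sub-population's own fitness values.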

The results shown for this task (always) the

Dynamic function creation

Angeline and Pollack [2] have demonstrated the use of a GLiB to alter the structure of the genotype. The GLiB uses two novel mutation operators (“compression” and “expansion”) to compress and expand modules (sub-trees) during evolution. A randomly selected sub-tree from the genotype is compressed as a module and added to the GLiB. The library keeps the definition of a module and the usage of the module by the subsequent generations which indicates its worth. The idea here is that compression
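The compression and expansion operators can be sketched on a toy tree representation. This is only an illustration under stated assumptions, not the GLiB implementation: trees are nested tuples `(operator, child, ...)` with string leaves, module names such as `M0` are hypothetical, the root is never compressed, and expansion here inlines every module reference at once.

```python
import random

def subtree_paths(tree, path=()):
    """Yield the path (tuple of child indices) to every node in the tree."""
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtree_paths(child, path + (i,))

def get_at(tree, path):
    """Return the sub-tree found by following the child indices in path."""
    for i in path:
        tree = tree[i]
    return tree

def replace_at(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new),) + tree[i + 1:]

def compress(tree, library, rng):
    """Move a randomly chosen sub-tree into the library and leave a
    module reference in its place."""
    paths = [p for p in subtree_paths(tree) if p]  # assumption: skip the root
    if not paths:
        return tree
    path = rng.choice(paths)
    name = f"M{len(library)}"  # hypothetical naming scheme
    # The library stores the definition and a usage counter for the module.
    library[name] = {"definition": get_at(tree, path), "usage": 0}
    return replace_at(tree, path, name)

def expand(tree, library):
    """Replace every module reference by its stored definition."""
    if isinstance(tree, tuple):
        return (tree[0],) + tuple(expand(c, library) for c in tree[1:])
    if tree in library:
        return expand(library[tree]["definition"], library)
    return tree
```

Compressing and then expanding a tree returns the original program, so the two operators change the genotype's structure without changing what it computes. Updating the usage counter across generations is left out of this sketch.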

Australian credit card

We use the same parameters as in Section 4.1, with the new sub-trees taken from the main program mutated with probability 0.5 per node. The usage counters are checked (compression/expansion) at the end of every five generations here.

We also compare the performance of the standard GLiB approach to the use of EDFs in this section. We implement the GLiB mechanism such that compression/expansion events occur at the same frequency as in the EDF system, with the same conditions for

Fully dynamic coevolutionary functions

In the previous section it was found that, for the two classification tasks, more functions than were necessary to solve the tasks with EDFs had been defined a priori: two functions existed where only one proved beneficial to the search process in the Australian credit task, and three were defined where only two proved significantly useful in the letter recognition task. For more complex tasks specifying the correct number of functions is, potentially, both more difficult and critical. To cope with this

Coevolving functions specifically for classification tasks

So far we have used two classification tasks to demonstrate our new approach to the use of functions in genetic programming; hierarchical, modular classification programs have been coevolved. This technique can of course be used in genetic programming for many types of task. In this section we introduce a final version of our coevolutionary approach, based on the work of Raymer et al. [10], specifically for classification tasks.

Raymer et al. [10] have presented an alternative approach to the

Conclusions

In this paper we have presented a coevolutionary approach to the use of ADFs in genetic programming. Here each identified ADF is assigned its own independent sub-population which coevolves with other ADF sub-populations and a population of main program trees or “result-producing branches” (RPBs). For each evaluation, random ADFs from each sub-population are selected to be used by a program (see [1] for a comparison of different ADF selection strategies). This is in contrast to Koza's original

Manu Ahluwalia received the Ph.D. degree in Computer Science from the University of the West of England, U.K., in 2000; the subject was the use of Genetic Programming with functional decomposition for data mining. He is currently a Research Scientist for Applied Predictive Technologies, Inc.

References (16)

  • M. Ahluwalia, L. Bull, T.C. Fogarty, Coevolving functions in genetic programming: a comparison in ADF selection...
  • P.J. Angeline et al., Coevolving high-level representations
  • L. Bull et al., Evolutionary computing in multi-agent environments: speciation and symbiogenesis
  • A.E. Eiben, T.J. Euverman, W. Kowalczyk, F. Slisser, Modelling customer retention with statistical techniques, rough...
  • P.W. Frey et al., Letter recognition using Holland-style adaptive classifier systems, Machine Learning (1991)
  • P. Husbands, F. Mill, Simulated coevolution as the mechanism for emergent planning and scheduling, in: R.L. Belew, L.B....
There are more references available in the full text version of this article.

Larry Bull received the B.Sc. and Ph.D. degrees in Computer Science from the University of the West of England, U.K., in 1992 and 1995 respectively. He is currently a Senior Research Fellow at the University and award leader for an M.Sc. Machine Learning & Adaptive Computing.
