Competent Algorithms for Geometric Semantic Genetic Programming

Tomasz P. Pawlak

Abstract

Genetic Programming (GP) is a machine learning technique for automatic induction of computer programs from examples. The examples typically consist of two parts: program arguments – the input and a target program output. Both input and output may be expressed in terms of numeric or textual variables or even a conglomerate of the above. This problem formulation enables formalizing
semantics of a program as a tuple of outputs it returns in effect of execution on the sample inputs. Use of semantics in GP gains an interest in research community, since the semantic methods developed so far in GP proved capable of achieving lower error, better generalization and smaller programs and quicker convergence to an optimum than the contemporary methods.
We embrace existing notions of semantics of program, semantic distance, semantic neutrality and effectiveness of genetic operators under the umbrella of common conceptual framework for Semantic Genetic Programming (SGP).
Then, we show that if a fitness function is a metric, the fitness landscape spanned over a space of all programs proxied by semantics becomes a cone with the optimal semantics in the apex. This provides justification for use of the recently developed Geometric Semantic Genetic Programming (GSGP), where geometric genetic operators utilize the conic shape of the landscape. We derive
properties of progress and progress bounds of geometric operators for different combinations of fitness functions and semantic distances.
We present a comprehensive literature review of existing semantic methods, discuss their advantages and disadvantages and for each show how the defined properties apply to it.
Next, we propose an algorithm for backpropagating semantics trough program structure and competent algorithms for operators: population initialization, parent selection, mutation and crossover that are approximately geometric, effective and free of certain drawbacks of the existing geometric methods.
Then, we experimentally assess the proposed algorithms and compare them with the existing methods in terms of training set fitness, generalization on test set, probability of performing geometric and effective application, size of produced programs and computational costs. We use a suite of nine symbolic regression and nine Boolean program synthesis benchmarks. The analysis shows that the proposed algorithms achieve performance that is consistently better than that offered by other semantic GP methods for symbolic regression domain and not worse than the best other methods for Boolean domain.
Finally, we experimentally find the proportions of competent mutation and competent crossover that lead to the optimal results in the above range of benchmarks.

Download full text.

BibTeX

@phdthesis{PawlakPhD2015,
title = {Competent Algorithms for Geometric Semantic Genetic Programming},
author = {Tomasz P. Pawlak},
school = {Poznan University of Technology},
address = {Pozna'{n}, Poland},
year = 2015
}

Suplementary material

The Java source code of Competent Algorithms for Geometric Semantic Genetic Programming and other parts of the experimental suite presented in the thesis, together with the compiled Jar are available under conditions of Academic Free License. Java 1.7+ is required to compile the Jar from source. Instructions how to create configuration files and run the experiment are available in ECJ manual. The classes under interest of this study can be found in the following namespaces:

  • ec.app.semanticGP.* (problem definitions)
  • ec.gp.semantic.* (operators and basic semantic stuff)
  • library.* (semantic library helper classes)