Created by W.Langdon from gp-bibliography.bib Revision:1.5356
This thesis studies grammar-based approaches in the application of Estimation of Distribution Algorithms (EDA) to the tree representation widely used in Genetic Programming (GP).
Evolutionary Computation (EC), motivated by evolution in the real world, has become one of the most widely used Machine Learning techniques, because of its effectiveness and versatility. Partially due to the disruptiveness of the genetic operators, there has been a growing interest in Estimation of Distribution Algorithms (EDA). The basic idea of EDA is to capture the interactions among genes, which represent the internal structure of problem solutions, and in this way to estimate the distribution of good solutions directly, rather than employing genetic operators. More specifically, in each generation a model, typically a probabilistic model, is inductively learnt from good individuals (training examples), and is sampled to create new individuals of the next generation. A population is not usually maintained between generations, and genetic operators are omitted from EDAs.
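The generational loop described above can be sketched with a toy univariate EDA (UMDA-style) on fixed-length bit strings: estimate a per-bit model from the best individuals, then sample the next generation from it. All names, parameters, and the OneMax fitness here are illustrative, not taken from the thesis.

```python
import random

def onemax(bits):
    """Toy fitness: number of 1-bits."""
    return sum(bits)

def eda(length=20, pop_size=100, elite=25, generations=50, seed=0):
    rng = random.Random(seed)
    # Initial model: each bit is 1 with probability 0.5.
    probs = [0.5] * length
    best = None
    for _ in range(generations):
        # Sample a whole new population from the current model
        # (no population carried over, no crossover or mutation).
        pop = [[int(rng.random() < p) for p in probs] for _ in range(pop_size)]
        pop.sort(key=onemax, reverse=True)
        if best is None or onemax(pop[0]) > onemax(best):
            best = pop[0]
        # Re-estimate the model from the good individuals (training examples).
        selected = pop[:elite]
        probs = [sum(ind[i] for ind in selected) / elite for i in range(length)]
    return best
```

Note that the only thing passed between generations is the probabilistic model itself; richer EDAs differ mainly in how much gene interaction the model can capture.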
Although EDA is becoming one of the most active fields in EC, the solution representation in most EDAs is a Genetic Algorithms (GA) style linear representation (a one-dimensional array, known as the chromosome in the GA literature). The more complex tree representations, resembling Genetic Programming (GP), have received only limited exploration, perhaps as a result of their intrinsic complexity. This is unfortunate, because tree representations provide a natural and expressive way of representing solutions for many problems. This is a significant gap in EDA and GP research, because EDA not only has the potential to improve GP performance, but also provides a new perspective for understanding GP. This thesis aims to help fill this gap, exploring grammar-based approaches to extending Estimation of Distribution Algorithms to GP-style tree representations.
Extending conventional EDA to tree representations is not trivial, because of the complexity of tree structures. Tree representations provide an alternative way of representing solutions, which is important to a large class of problems, but at the same time introduce important complications, including larger search spaces and more complex interactions among tree nodes. This thesis proposes a framework, called Program Distribution Estimation with Grammar Models (PRODIGY), which significantly extends previous applications of EDA to GP-style tree representations. The core of this framework is a Stochastic Context-Free Grammar (SCFG) model. In this research, we confirm both theoretically and empirically that PRODIGY is able to learn a wide range of the dependencies among nodes in tree-form individuals that are believed to be important in the current GP literature. Two different approaches under the PRODIGY framework are proposed and implemented. The first iteratively refines the grammar model (but requires additional constraints on the search), while the second builds a full grammar model, including both its probability and structure components, directly from the training samples. These approaches performed well on the problems studied, exhibiting search characteristics complementary to those of GP. Specifically, the contributions of the thesis are as follows.
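To make the SCFG idea concrete, the following minimal sketch samples GP-style expressions from a stochastic context-free grammar: each nonterminal has weighted productions, and a tree is generated by expanding the start symbol top-down. The expression grammar, rule weights, and depth cap below are purely illustrative, not a grammar from the thesis.

```python
import random

# Illustrative SCFG: each nonterminal maps to (probability, right-hand side) pairs.
GRAMMAR = {
    "Expr": [(0.4, ["Term"]),
             (0.6, ["(", "Expr", "Op", "Expr", ")"])],
    "Op":   [(0.5, ["+"]), (0.5, ["*"])],
    "Term": [(0.5, ["x"]), (0.5, ["1"])],
}

def sample(symbol, rng, depth=0, max_depth=6):
    """Expand `symbol` recursively; terminal symbols are emitted as-is."""
    if symbol not in GRAMMAR:
        return symbol  # terminal
    productions = GRAMMAR[symbol]
    if depth >= max_depth:
        rhs = productions[0][1]  # depth cap: take the first, non-recursive rule
    else:
        rhs = productions[-1][1]  # fallback guards against float round-off
        r, cum = rng.random(), 0.0
        for weight, candidate in productions:
            cum += weight
            if r < cum:
                rhs = candidate
                break
    return "".join(sample(s, rng, depth + 1, max_depth) for s in rhs)

rng = random.Random(1)
expression = sample("Expr", rng)  # e.g. a string such as "(x+1)" or "x"
```

An EDA over such a model adjusts the production probabilities (and, in the structure-learning case, the productions themselves) from selected individuals, then samples the next generation from the updated grammar.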
1. Providing a comprehensive survey of current research on EDA, with emphasis on EDA with GP-style tree representations. We attempt to clarify the relationship between EDAs with conventional linear representations and those with a GP-style tree representation, and to reveal the unique difficulties which face this research.
2. Identifying desirable properties of probabilistic models for EDA with GP-style tree representations, and deriving the PRODIGY framework as a consequence.
3. Proposing Program Evolution with Explicit Learning (PEEL) as one implementation of PRODIGY. PEEL's incremental general-to-specific grammar learning method balances the effectiveness and efficiency of the grammar learning.
4. Proposing Grammar Model-based Program Evolution (GMPE) as another implementation of PRODIGY. GMPE realises the PRODIGY framework by introducing elegant inference methods from the formal grammar field. It provides good performance on some problems, but also provides a means to better understand some aspects of conventional GP, especially the building block hypothesis.
5. Proposing Swift GMPE (sGMPE), which is an extension of GMPE, aiming at reducing the computational cost.
6. Deriving a more accurate Minimum Message Length metric for grammar learning in PRODIGY. This metric leads to improved performance in the GMPE system, but may also be useful in grammar learning in general. It is also relevant to the learning of other probabilistic graphical models.