abstract = "This thesis describes an exploration of the
application of Genetic Programming to classification
problems in the domain of functional genomics. In the
course of the investigation a GP system was developed
which includes several novel features designed
specifically to target the properties of the problem domain.
These features, which are described and investigated,
include: a graph-based program representation scheme; a
packaging system allowing for the construction of
programs in an arbitrary number of layers, designed to
facilitate multiple outputs; a gene pool communal to
the GP population, which, in conjunction with the
layering feature, increases the amount of effective
code present in the population; a generic ``genetic
engineering'' process for imposing constraints on the
programs of the system; and a backpropagation-inspired
feedback mechanism which increases the speed at which
the system learns.
These features are then tested on data sets exhibiting
the properties of genomic data sets -- in particular,
multiple non-disjoint classes, and hierarchical
classification schemes. These features are then
developed, along with fitness measures and the
evolutionary process, to best model data sets of this
type. An exploratory attempt to produce classifications
in conjunction with a graph-based classification scheme
is also performed.
The developed system is then applied to classifying
hard genomic data sets, and the thesis ultimately makes
some predictions concerning previously unclassified
genes.",