abstract = "Susceptibility to Alzheimer's disease is likely due to
complex interactions among many genetic and
environmental factors. Identifying complex genetic
effects in large data sets will require computational
methods that extend beyond what parametric statistical
methods such as logistic regression can provide. We
have previously introduced a computational evolution
system (CES) that uses genetic programming (GP) to
represent genetic models of disease and to search for
optimal models in a rugged fitness landscape that is
effectively infinite in size. The CES approach differs
from other GP approaches in that it is able to learn
how to solve the problem by generating its own
operators. A key feature is the ability of the
operators to use expert knowledge to guide the
stochastic search. We have previously shown that CES is
able to discover nonlinear genetic models of disease
susceptibility in both simulated and real data. The
goal of the present study was to introduce a measure of
interestingness into the modelling process. Here, we
define interestingness as a measure of non-additive
gene-gene interactions. That is, we are more interested
in those CES models that include attributes that
exhibit synergistic effects on disease risk. To
implement this new feature we first pre-processed the
data to measure all pairwise gene-gene interaction
effects using entropy-based methods. We then provided
these pre-computed measures to CES as expert knowledge
and as one of three fitness criteria in
three-dimensional Pareto optimisation. We applied this
new CES algorithm to an Alzheimer's disease data set
with approximately 520,000 genetic attributes. We show
that this approach discovers more interesting models
with the added benefit of improving classification
accuracy. This study demonstrates the applicability of
CES to genome-wide genetic analysis using expert
knowledge derived from measures of interestingness.",