abstract = "The problem of evolving binary classification models
under increasingly unbalanced data sets is approached
by proposing a strategy consisting of two components:
sub-sampling and 'robust' fitness function design. In
particular, recent work in the wider machine learning
literature has recognized that maintaining the original
distribution of exemplars during training is often not
appropriate for designing classifiers that are robust
to degenerate classifier behavior. To this end we
propose a 'Simple Active Learning Heuristic' (SALH) in
which a subset of exemplars is sampled with uniform
probability under a class-balance-enforcing rule for
fitness evaluation. In addition, an efficient estimator
for the Area Under the Curve (AUC) performance metric
is assumed in the form of a modified
Wilcoxon-Mann-Whitney (WMW) statistic. Performance is
evaluated in terms of six representative UCI data sets
and benchmarked against: canonical GP, SALH-based GP,
SALH and the modified WMW statistic, and deterministic
classifiers (Naive Bayes and C4.5). The resulting
SALH-WMW model is demonstrated to be both efficient and
effective at providing solutions maximizing performance
assessed in terms of AUC.",
notes = "John A. Doucette PhD 2016 University of Waterloo (not
GP?)
https://uwspace.uwaterloo.ca/items/1a44cde4-f734-460f-a65e-31871e22212c