A semi-supervised Genetic Programming method for dealing with noisy labels and hidden overfitting
Created by W.Langdon from
gp-bibliography.bib Revision:1.8051
@Article{SILVA:2018:SEC,
author = "Sara Silva and Leonardo Vanneschi and
Ana I. R. Cabral and Maria J. Vasconcelos",
title = "A semi-supervised Genetic Programming method for
dealing with noisy labels and hidden overfitting",
journal = "Swarm and Evolutionary Computation",
volume = "39",
pages = "323--338",
year = "2018",
keywords = "genetic algorithms, genetic programming, Data errors,
Noisy labels, Classification, Hidden overfitting,
Semi-supervised learning",
ISSN = "2210-6502",
DOI = "10.1016/j.swevo.2017.11.003",
URL = "http://www.sciencedirect.com/science/article/pii/S2210650217302730",
abstract = "Data gathered in the real world normally contains
noise, either stemming from inaccurate experimental
measurements or introduced by human errors. Our work
deals with classification data where the attribute
values were accurately measured, but the categories may
have been mislabeled by the human in several sample
points, resulting in unreliable training data. Genetic
Programming (GP) compares favorably with the
Classification and Regression Trees (CART) method, but
it is still highly affected by these errors. Despite
consistently achieving high accuracy in both training
and test sets, many classification errors are found in
a later validation phase, revealing a previously hidden
overfitting to the erroneous data. Furthermore, the
evolved models frequently output raw values that are
far from the expected range. To improve the behavior of
the evolved models, we extend the original training set
with additional sample points where the class label is
unknown, and devise a simple way for GP to use this
additional information and learn in a semi-supervised
manner. The results are surprisingly good. In the
presence of the exact same mislabeling errors, the
additional unlabeled data allowed GP to evolve models
that achieved high accuracy also in the validation
phase. This is a brand new approach to semi-supervised
learning that opens an array of possibilities for
making the most of the abundance of unlabeled data
available today, in a simple and inexpensive way.",
}