Balancing Learning and Overfitting in Genetic Programming with Interleaved Sampling of Training data
Created by W.Langdon from
gp-bibliography.bib Revision:1.8051
@InProceedings{goncalves:2013:EuroGP,
author = "Ivo Goncalves and Sara Silva",
title = "Balancing Learning and Overfitting in Genetic
Programming with Interleaved Sampling of Training
data",
booktitle = "Proceedings of the 16th European Conference on Genetic
Programming, EuroGP 2013",
year = "2013",
month = "3-5 " # apr,
editor = "Krzysztof Krawiec and Alberto Moraglio and Ting Hu and
A. Sima Uyar and Bin Hu",
series = "LNCS",
volume = "7831",
publisher = "Springer Verlag",
address = "Vienna, Austria",
pages = "73--84",
organisation = "EvoStar",
keywords = "genetic algorithms, genetic programming, Overfitting,
Generalisation, Pharmacokinetics, Drug Discovery",
isbn13 = "978-3-642-37206-3",
DOI = "doi:10.1007/978-3-642-37207-0_7",
abstract = "Generalisation is the ability of a model to perform
well on cases not seen during the training phase. In
Genetic Programming, generalisation has recently been
recognised as an important open issue, and increased
efforts are being made towards evolving models that do
not overfit. In this work we expand on recent
developments that showed that using a small and
frequently changing subset of the training data is
effective in reducing overfitting and improving
generalisation. Particularly, we build upon the idea of
randomly choosing a single training instance at each
generation and balance it with periodically using all
training data. The motivation for this approach is
based on trying to keep overfitting low (represented by
using a single training instance) and still presenting
enough information so that a general pattern can be
found (represented by using all training data). We
propose two approaches called interleaved sampling and
random interleaved sampling that respectively represent
doing this balancing in a deterministic or a
probabilistic way. Experiments are conducted on three
high-dimensional real-life datasets in the
pharmacokinetics domain. Results show that most of the
variants of the proposed approaches are able to
consistently improve generalisation and reduce
overfitting when compared to standard Genetic Programming.
The best variants are even capable of such improvements on
a dataset where a recent and representative
state-of-the-art method could not. Furthermore, the
resulting models are short and hence easier to
interpret, an important achievement from the
applications' point of view.",
notes = "Part of \cite{Krawiec:2013:GP} EuroGP'2013 held in
conjunction with EvoCOP2013, EvoBIO2013, EvoMusArt2013
and EvoApplications2013",
}