On the Importance of Data Balancing for Symbolic Regression
Created by W. Langdon from
gp-bibliography.bib, revision 1.8051
@Article{Vladislavleva:2010:ieeeTEC,
author = "Ekaterina Vladislavleva and Guido Smits and
Dick {den Hertog}",
title = "On the Importance of Data Balancing for Symbolic
Regression",
journal = "IEEE Transactions on Evolutionary Computation",
year = "2010",
volume = "14",
number = "2",
pages = "252--277",
month = apr,
keywords = "genetic algorithms, genetic programming, Compression,
data balancing, data scoring, data weighting, fitting,
information content, modeling, subset selection,
symbolic regression",
ISSN = "1089-778X",
DOI = "10.1109/TEVC.2009.2029697",
size = "26 pages",
abstract = "Symbolic regression of input-output data
conventionally treats data records equally. We suggest
a framework for automatic assignment of weights to data
samples, which takes into account the sample's relative
importance. In this paper, we study the possibilities
of improving symbolic regression on real-life data by
incorporating weights into the fitness function. We
introduce four weighting schemes defining the
importance of a point relative to proximity,
surrounding, remoteness, and nonlinear deviation from k
nearest-in-the-input-space neighbors. For enhanced
analysis and modeling of large imbalanced data sets we
introduce a simple multidimensional iterative technique
for subsampling. This technique allows a sensible
partitioning (and compression) of data to nested
subsets of an arbitrary size in such a way that the
subsets are balanced with respect to either of the
presented weighting schemes. For cases where a given
input-output data set contains some redundancy, we
suggest an approach to considerably improve the
effectiveness of regression by applying more modeling
effort to a smaller subset of the data set that has a
similar information content. Such improvement is
achieved due to better exploration of the search space
of potential solutions at the same number of function
evaluations. We compare different approaches to
regression on five benchmark problems with a fixed
budget allocation. We demonstrate that the significant
improvement in the quality of the regression models can
be obtained either with the weighted regression,
exploratory regression using a compressed subset with a
similar information content, or exploratory weighted
regression on the compressed subset, which is weighted
with one of the proposed weighting schemes.",
notes = "also known as \cite{5325864}",
}