Data Aggregation for Reducing Training Data in Symbolic Regression
Created by W.Langdon from
gp-bibliography.bib Revision:1.8051
@InProceedings{Kammerer:2019:EUROCAST,
author = "Lukas Kammerer and Gabriel Kronberger and
Michael Kommenda",
title = "Data Aggregation for Reducing Training Data in
Symbolic Regression",
booktitle = "International Conference on Computer Aided Systems
Theory, EUROCAST 2019",
year = "2019",
editor = "Roberto Moreno-Diaz and Franz Pichler and
Alexis Quesada-Arencibia",
volume = "12013",
series = "Lecture Notes in Computer Science",
pages = "378--386",
address = "Las Palmas de Gran Canaria, Spain",
month = "17-22 " # feb,
publisher = "Springer",
keywords = "genetic algorithms, genetic programming, Symbolic
regression, Machine learning, Sampling",
isbn13 = "978-3-030-45092-2",
DOI = "doi:10.1007/978-3-030-45093-9_46",
abstract = "The growing volume of data makes the use of
computationally intense machine learning techniques
such as symbolic regression with genetic programming
more and more impractical. This work discusses methods
to reduce the training data and thereby also the
runtime of genetic programming. The data is aggregated
in a preprocessing step before running the actual
machine learning algorithm. K-means clustering and data
binning are used for data aggregation and compared with
random sampling as the simplest data reduction method.
We analyze the achieved speed-up in training and the
effects on the trained models' test accuracy for every
method on four real-world data sets. The performance of
genetic programming is compared with random forests and
linear regression. It is shown that k-means and random
sampling lead to very small loss in test accuracy when
the data is reduced down to only 30 percent of the
original data, while the speed-up is proportional to
the size of the data set. Binning, on the contrary,
leads to models with very high test error.",
}