Automatic feature engineering for regression models with machine learning: An evolutionary computation and statistics hybrid
Created by W.Langdon from
gp-bibliography.bib Revision:1.8051
@Article{DEMELO:2018:IS,
author = "Vinicius Veloso {de Melo} and Wolfgang Banzhaf",
title = "Automatic feature engineering for regression models
with machine learning: An evolutionary computation and
statistics hybrid",
journal = "Information Sciences",
volume = "430--431",
pages = "287--313",
year = "2018",
keywords = "genetic algorithms, genetic programming, Feature
engineering, Machine learning, Symbolic regression,
Kaizen programming, Linear regression, Hybrid",
ISSN = "0020-0255",
DOI = "10.1016/j.ins.2017.11.041",
URL = "http://www.sciencedirect.com/science/article/pii/S0020025517311040",
abstract = "Symbolic Regression (SR) is a well-studied task in
Evolutionary Computation (EC), where adequate free-form
mathematical models must be automatically discovered
from observed data. Statisticians, engineers, and
general data scientists still prefer traditional
regression methods over EC methods because of the solid
mathematical foundations, the interpretability of the
models, and the lack of randomness, even though such
deterministic methods tend to provide lower quality
prediction than stochastic EC methods. On the other
hand, while EC solutions can be big and
uninterpretable, they can be created with less bias,
finding high-quality solutions that would be avoided by
human researchers. Another interesting possibility is
using EC methods to perform automatic feature
engineering for a deterministic regression method
instead of evolving a single model; this may lead to
smaller solutions that can be easy to understand. In
this contribution, we evaluate an approach called
Kaizen Programming (KP) to develop a hybrid method
employing EC and Statistics. While the EC method builds
the features, the statistical method efficiently builds
the models, which are also used to provide the
importance of the features; thus, features are improved
over the iterations resulting in better models. Here we
examine a large set of benchmark SR problems known from
the EC literature. Our experiments show that KP
outperforms traditional Genetic Programming - a popular
EC method for SR - and also shows improvements over
other methods, including other hybrids and well-known
statistical and Machine Learning (ML) ones. More in
line with ML than EC approaches, KP is able to provide
high-quality solutions while requiring only a small
number of function evaluations.",
}