Measuring Structural Complexity of GP Models for Feature Engineering over the Generations
Created by W.Langdon from gp-bibliography.bib Revision:1.8051
@InProceedings{batista:2024:CEC2,
  author =       "Joao Eduardo Batista and Adam Kotaro Pindur and
                 Hitoshi Iba and Sara Silva",
  title =        "Measuring Structural Complexity of {GP} Models for
                 Feature Engineering over the Generations",
  booktitle =    "2024 IEEE Congress on Evolutionary Computation (CEC)",
  year =         "2024",
  editor =       "Bing Xue",
  address =      "Yokohama, Japan",
  month =        "30 " # jun # " - 5 " # jul,
  publisher =    "IEEE",
  keywords =     "genetic algorithms, genetic programming, Measurement,
                 Analytical models, Computational modeling, Pipelines,
                 Predictive models, Prediction algorithms, Complexity
                 theory, Model Complexity, Feature Engineering, Model
                 Interpretability, Classification",
  isbn13 =       "979-8-3503-0837-2",
  doi =          "10.1109/CEC60901.2024.10611989",
  abstract =     "Feature engineering is a necessary step in the machine
                 learning pipeline. Together with other preprocessing
                 methods, it allows the conversion of raw data into a
                 dataset containing only the necessary features to solve
                 the task at hand, reducing the computational complexity
                 of inducing models and creating models that are
                 potentially simpler, more robust, and more
                 interpretable. We use M3GP, a wrapper-based feature
                 engineering algorithm, to induce a set of features that
                 are adapted in number and in shape to several
                 classifiers with different levels of predictive power,
                 from decision trees with depth 3 to random forests with
                 100 estimators and no depth limit. Intuition tells us
                 that classifiers that are restricted in the number of
                 features should compensate for this restriction by
                 using features with a high degree of correlation with
                 the target objective. In contrast, the principle
                 behind the boosting algorithm tells us that we can
                 create a strong classifier using a large set of weak
                 features. This indicates that classifiers with no
                 restrictions should prefer many but weaker features.
                 Our results confirm this hypothesis while also
                 revealing that M3GP induces unnecessarily complex
                 features. We measure complexity using several
                 structural complexity metrics found in the literature
                 and show that, although our pipeline consistently
                 obtains good results, the structural complexity of the
                 induced models varies drastically across runs.
                 Additionally, while the test performance peaks in the
                 early stages of the evolution, the complexity of the
                 feature engineering models continues to grow, with
                 little to no return in test performance. This work
                 promotes using several complexity metrics to measure
                 model interpretability and identifies issues related to
                 model complexity in M3GP, proposing solutions to
                 improve the computational cost of inducing models and
                 the complexity of the final models.",
  notes =        "also known as \cite{10611989}
                 WCCI 2024",
}