Genetic Programming for Feature Selection Based on Feature Removal Impact in High-Dimensional Symbolic Regression
Created by W.Langdon from gp-bibliography.bib Revision:1.8051
@Article{Al-Helali:ETCI,
author = "Baligh Al-Helali and Qi Chen and Bing Xue and
Mengjie Zhang",
journal = "IEEE Transactions on Emerging Topics in Computational
Intelligence",
title = "Genetic Programming for Feature Selection Based on
Feature Removal Impact in High-Dimensional Symbolic
Regression",
note = "Early access",
abstract = "Symbolic regression (SR) is increasingly important for
discovering mathematical models for various prediction
tasks. It works by searching for the arithmetic
expressions that best represent a target variable using
a set of input features. However, as the number of
features increases, the search process becomes more
complex. To address high-dimensional symbolic
regression, this work proposes a genetic
programming-based feature selection method that uses
the impact of feature removal on the performance of SR
models. Unlike existing Shapley value methods that
simulate feature absence at the data level, the
proposed approach removes features at the model level.
This avoids producing unrealistic data instances,
which is a major limitation of Shapley value and
permutation-based methods. Moreover, after the
importance of the features has been calculated, a
cut-off strategy is proposed for selecting important
features: it injects a number of random features and
uses their importance to set a threshold
automatically. The experimental results on artificial
and real-world high-dimensional data sets show that,
compared with state-of-the-art feature selection
methods based on permutation importance and Shapley
values, the proposed method not only improves SR
accuracy but also selects smaller sets of features.",
keywords = "genetic algorithms, genetic programming, Feature
extraction, Data models, Computational modelling, Task
analysis, Predictive models, Machine learning, Feature
selection, high dimensionality, symbolic regression",
DOI = "doi:10.1109/TETCI.2024.3369407",
ISSN = "2471-285X",
notes = "Also known as \cite{10466603}",
}
Genetic Programming entries for
Baligh Al-Helali
Qi Chen
Bing Xue
Mengjie Zhang