Firm failure prediction using genetic programming generated features
Created by W.Langdon from
gp-bibliography.bib Revision:1.8129
@Article{ZELENKOV:2024:eswa,
author = "Yuri Zelenkov",
title = "Firm failure prediction using genetic programming
generated features",
journal = "Expert Systems with Applications",
volume = "249",
pages = "123839",
year = "2024",
ISSN = "0957-4174",
DOI = "10.1016/j.eswa.2024.123839",
URL = "https://www.sciencedirect.com/science/article/pii/S095741742400705X",
keywords = "genetic algorithms, genetic programming, Firm failure
prediction, Genetic programming generated feature,
Fitness function, Score of generated features,
Unbalanced data",
abstract = "Many studies on predicting firm failure have focused
on finding new features that improve the accuracy of
the models. In this paper, genetic programming (GP) is
used for this purpose. The main problem in GP is to
specify a function that evaluates the fitness of the
feature. Direct optimization of a machine learning (ML)
model that uses a generated feature in most cases leads
to high computational costs since evolving a population
of N programs over G generations while evaluating each
model using K-fold cross-validation requires N*G*K
model learning cycles. Thus, many researchers use
scores that measure the relationship of the generated
features to the class label. However, our empirical
analysis shows that most such scores correlate poorly
with ML model performance. The novelty of our work is
that we introduce several ways of combining different
scores into a single measure of expected model
performance. Experimental results on data from
Hungarian firms (7167 observations, class imbalance
9.37) using five ML models (Logistic Regression, Random
Forest, Gradient Boosting, Histogram Boosting, and
AdaBoost) prove that the proposed way of setting the
fitness function increases the ROC AUC of the listed
models by 6.6\%, 5.2\%, 6.8\%,
5.5\%, and 5.2\%, respectively. Moreover, by
applying the found formula to the data from Czech firms
(3872 observations, class imbalance of 74.92), which
were not used for the feature search, we obtained
increases in ROC AUC by 13.1\%, 11.8\%,
14.9\%, 9.9\%, and 8.2\%, respectively.
This indicates that the proposed method allows to find
universal features, which opens the way to build
effective models in case of insufficient data (small
number of observations, extreme imbalance, etc.)",
}