Reducing overfitting in genetic programming models for software quality classification
Created by W.Langdon from
gp-bibliography.bib Revision:1.7906
@InProceedings{liu:2004:rogp,
author = "Yi Liu and Taghi Khoshgoftaar",
title = "Reducing overfitting in genetic programming models for
software quality classification",
booktitle = "Proceedings of the Eighth IEEE International
Symposium on High Assurance Systems Engineering",
year = "2004",
month = "25-26 " # mar,
pages = "56--65",
address = "Tampa, Florida, USA",
keywords = "genetic algorithms, genetic programming",
ISSN = "1530-2059",
DOI = "doi:10.1109/HASE.2004.1281730",
size = "10 pages",
abstract = "A high-assurance system is largely dependent on the
quality of its underlying software. Software quality
models can provide timely estimations of software
quality, allowing the detection and correction of
faults prior to operations. A software metrics-based
quality prediction model may depict overfitting, which
occurs when a prediction model has good accuracy on the
training data but relatively poor accuracy on the test
data. In this paper, we present an approach to address
the overfitting problem in the context of software
quality classification models based on genetic
programming (GP). The overfitting problem has not been
addressed in depth for GP-based models. The general aim
of classifying software modules as fault-prone (fp) and
not fault-prone (nfp) is to aid software management in
expending its limited resources toward improving only
the fp modules. The presence of overfitting in such a
software quality model affects its practical
usefulness, because management is interested in good
performance of the model when applied to unseen data,
i.e., generalization performance. In the process of
building GP-based software quality classification
models for a high-assurance telecommunications system,
we observed that the GP models were prone to
overfitting. We use a random sampling technique to
reduce overfitting in our GP models. The approach has
been found by many researchers as an effective method
for reducing the time of a GP run. However, in our
study we use random sampling to reduce overfitting with
the aim of improving the generalization capability of
our GP models. A case study of an industrial
high-assurance software system is used to demonstrate
the effectiveness of the random sampling technique.",
notes = "HASE 2004",
}