Unbalanced breast cancer data classification using novel fitness functions in genetic programming
Created by W.Langdon from
gp-bibliography.bib Revision:1.8051
@Article{DEVARRIYA:2020:ESA,
author = "Divyaansh Devarriya and Cairo Gulati and
Vidhi Mansharamani and Aditi Sakalle and Arpit Bhardwaj",
title = "Unbalanced breast cancer data classification using
novel fitness functions in genetic programming",
journal = "Expert Systems with Applications",
year = "2020",
volume = "140",
pages = "112866",
keywords = "genetic algorithms, genetic programming, Breast
cancer, Unbalanced data, Fitness function",
ISSN = "0957-4174",
URL = "http://www.sciencedirect.com/science/article/pii/S0957417419305767",
DOI = "doi:10.1016/j.eswa.2019.112866",
size = "11 pages",
abstract = "Breast Cancer is a common disease and to prevent it,
the disease must be identified at earlier stages.
Available breast cancer datasets are unbalanced in
nature, i.e. there are more instances of benign
(non-cancerous) cases than malignant (cancerous) ones.
Therefore, it is a challenging task for most machine
learning (ML) models to classify between benign and
malignant cases properly, even though they have high
accuracy. Accuracy is not a good metric to assess the
results of ML models on breast cancer datasets because
of biased results. To address this issue, we use
Genetic Programming (GP) and propose two fitness
functions. The first is the F2 score, which focuses on
learning more about the minority class, which contains
more relevant information; the second is a novel
fitness function known as Distance score (D score),
which learns about both classes by giving them
equal importance and being unbiased. The GP framework
in which we implemented D score is named as D-score GP
(DGP) and the framework implemented with F2 score is
named as F2GP. The proposed F2GP achieved a maximum
accuracy of 99.63\%, 99.51\% and 100\%
for the 60-40 and 70-30 partition schemes and the 10-fold
cross-validation scheme respectively, and DGP achieved a
maximum accuracy of 99.63\%, 98.5\% and
100\% for the 60-40 and 70-30 partition schemes and the
10-fold cross-validation scheme respectively. The proposed
models also achieve a recall of 100\% for all the
test cases. This shows that using a new fitness
function for unbalanced data classification improves
the performance of a classifier",
}
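
The abstract names the F2 score as a fitness function but this record gives no formula for it. The following is a minimal sketch, assuming the standard F-beta definition with beta = 2; the paper's exact formulation may differ, and the D score is not defined here so it is not reproduced. The function name `f_beta` and all counts are hypothetical, chosen only to illustrate why accuracy can mislead on unbalanced data.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """Standard F-beta score from confusion-matrix counts (assumed definition).

    beta > 1 weights recall more heavily than precision; beta = 2 gives F2.
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    # If nothing is predicted positive and nothing is positive, return 0.
    return (1 + b2) * tp / denom if denom else 0.0


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical unbalanced set: 90 benign, 10 malignant (malignant = positive).
# A classifier that labels every case benign misses every malignant case.
all_benign = dict(tp=0, fp=0, fn=10, tn=90)
print("accuracy:", accuracy(**all_benign))
print("F2:", f_beta(all_benign["tp"], all_benign["fp"], all_benign["fn"]))

# A classifier that finds 8 of 10 malignant cases with 4 false alarms.
useful = dict(tp=8, fp=4, fn=2, tn=86)
print("accuracy:", accuracy(**useful))
print("F2:", f_beta(useful["tp"], useful["fp"], useful["fn"]))
```

Under these assumed counts the all-benign classifier scores high on accuracy while its F2 is zero, which is the kind of bias the abstract attributes to accuracy on unbalanced data.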