Breast cancer diagnosis using Genetically Optimized Neural Network model

doi:10.1016/j.eswa.2015.01.065

Expert Systems with Applications

Volume 42, Issue 10, 15 June 2015, Pages 4611-4620

Highlights

• A Genetically Optimized Neural Network is proposed for breast cancer diagnosis.
• The mapping of GONN to its equivalent Feed Forward Neural Network is shown.
• GONN produces the highest classification accuracy among the compared classifiers.

Abstract

One in every eight women is susceptible to breast cancer at some point in her life. Early detection and effective treatment are the only means of reducing breast cancer mortality. Accurate classification of a breast cancer tumor is an important task in medical diagnosis. Machine learning techniques are gaining importance in medical diagnosis because of their classification capability. In this paper, we propose a new Genetically Optimized Neural Network (GONN) algorithm for solving classification problems. We evolve a neural network genetically to optimize its architecture (structure and weights) for classification. We introduce new crossover and mutation operators, which differ from the standard operators, to reduce the destructive nature of crossover and mutation. We use the GONN algorithm to classify breast cancer tumors as benign or malignant. To demonstrate our results, we took the WBCD database from the UCI Machine Learning Repository and compared the classification accuracy, sensitivity, specificity, confusion matrix, ROC curves and area under the ROC curve (AUC) of GONN with those of the classical model and the classical back propagation model. Our algorithm gives a classification accuracy of 98.24%, 99.63% and 100% for the 50–50, 60–40 and 70–30 training–testing partitions respectively, and 100% for 10-fold cross validation. The results show that our approach works well with the breast cancer database and can be a good alternative to the well-known machine learning methods.

Introduction

The cell is the basic biological unit of all living organisms, including humans. During its life cycle, a normal cell collects nutrients, grows in size, and then divides to form two new daughter cells. Cells become cancerous when they lose their ability to stop dividing, to stay where they belong, and to die at the proper time. Such extra cells together form a mass called a tumor. A tumor can be identified as either benign or malignant. Breast cancer is a malignant tumor originating from the breast tissue (Muto, Bussey, & Morson, 1975).

Every woman is at risk for breast cancer. If she lives to be 85, there is a one in eight chance (12%) that she will develop breast cancer sometime during her life. As a woman ages, her risk of developing breast cancer rises dramatically regardless of her family history. The causes of and treatments for breast cancer are still under research, and since there is no widely available preventive measure yet (Christoyianni et al., 2000; Rodrigues et al., 2006), early detection and effective treatment are the only means of reducing breast cancer mortality. Fortunately, if breast cancer is detected accurately at an early stage, localized tumors can be treated successfully before the cancer spreads. Thus, accurate diagnosis of breast cancer is an essential and urgent problem in the medical science community.

One of the major tasks in accurate diagnosis is the extraction of useful knowledge from past diagnosis data. Machine learning techniques enable computers to learn from experience, past patterns and examples (Carbonell, 1983; Witten and Frank, 2005). Thus, the use of machine learning tools in medical diagnosis is gradually increasing. Data mining and soft computing techniques have been applied to extract rules and patterns from various datasets (Maimon and Rokach, 2005; Mitra and Hayashi, 2000; Mitra and Acharya, 2005). Some of these techniques (Bellazzi and Zupan, 2008; Marcano-Cedeño et al., 2011; Malmir et al., 2013) have shown very good results in classification problems, which can help medical experts in recognizing diseases. In breast cancer diagnosis, a tumor needs to be identified as benign or malignant based on the sample properties. In machine learning terms, this can be approached as a 2-class classification problem based on a set of sample attributes.

A wide range of methods has been proposed in the literature for breast cancer diagnosis on WBCD. Quinlan (1996) used 10-fold cross-validation with the C4.5 decision tree method and achieved a classification accuracy of 94.74%. Hamilton, Shan, and Cercone (1996) used the RIAC method to achieve an accuracy of 94.99%. Nauck and Kruse (1999) used neuro-fuzzy techniques to obtain an accuracy of 95.06%. Pena-Reyes and Sipper (1999) used the fuzzy-GA method and reached a classification accuracy of 97.36%. Albrecht, Lappas, Vinterbo, Wong, and Ohno-Machado (2002) combined the perceptron algorithm with simulated annealing and reported an accuracy of 98.8%. Abonyi and Szeifert (2003) used a supervised fuzzy clustering technique to obtain an accuracy of 95.57%. Polat and Güneş (2007) used an artificial immune recognition system (AIRS) with a fuzzy resource allocation mechanism to obtain an accuracy of 98.51%. Übeyli (2007) applied five different classifiers (support vector machine, probabilistic neural network, recurrent neural network, combined neural network and Multilayer Perceptron neural network) and obtained respective accuracies of 99.54%, 98.61%, 98.15%, 97.40% and 91.92%. Least Square SVM (LS-SVM) was used by Polat and Güneş (2007) to obtain 98.53% accuracy. Peng, Wu, and Jiang (2010) integrated wrapper and filter approaches to obtain an accuracy of 99.5%. Örkcü and Bal (2011) compared the performance of a Back Propagation Neural Network (BPNN), a Binary Coded Genetic Algorithm and a Real Coded Genetic Algorithm on the breast cancer database and achieved accuracies of 93.1%, 94% and 96.5% respectively. Marcano-Cedeño et al. (2011) presented a new Artificial Metaplasticity Multilayer Perceptron algorithm which performed better than BPNN on the same breast cancer database, giving a classification accuracy of 99.26% compared with 94.51% for BPNN on 60/40 training–testing samples. Lavanya and Rani (2011) used decision tree algorithms for the same task and achieved 92.97% classification accuracy. Malmir et al. (2013) achieved accuracies of 97.75% and 97.63% by training a Multilayer Perceptron (MLP) network for 40 iterations using the Imperialist Competitive Algorithm (ICA) and Particle Swarm Optimization (PSO) respectively. Koyuncu and Ceylan (2013) achieved a higher classification accuracy of 98.05% by using 9 classifiers in a Rotation Forest-Artificial Neural Network (RF-ANN). Xue, Zhang, and Browne (2014) presented a particle swarm optimization technique for feature selection with novel initialization and updating mechanisms, PSO (4–2), and obtained an accuracy of 94.74%.

In this study, a Genetically Optimized Neural Network (GONN) model is proposed which simultaneously evolves the structure and weights of a neural network for classifying the WBCD breast cancer database as benign or malignant. Our algorithm is inspired by Koza and Rice (1991), who optimized the structure of a neural network by genetic evolution. GONN implements new crossover and mutation operators in GP to eliminate the destructive nature of the crossover and mutation operations in a standard GP life cycle (Koza, 1992). To measure the performance of the proposed algorithm, we used the Wisconsin breast cancer dataset from the UCI Machine Learning Repository (Bache & Lichman, 2013). The proposed algorithm yielded an accuracy of 98.24%, 99.63% and 100% for the 50–50, 60–40 and 70–30 training–testing partitions respectively, and a classification accuracy of 100% for the 10-fold cross validation scheme. Measures such as sensitivity, specificity, ROC curves, area under the ROC curve (AUC) and the Mann–Whitney two-tailed test are used to validate the performance. To show the dominance of our approach, we compared our method with the classical Koza and Rice (1991) model, the classical Back Propagation Neural Network (BPNN) (Hagan, Martin, & Beale, 1996) and also with recently proposed algorithms applied to the WBCD database. The results show that our approach works well with the breast cancer database and can be a good alternative to the well-known machine learning methods.
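The validation measures named above follow directly from the counts in a two-class confusion matrix. The sketch below is illustrative only and is not the authors' code: it assumes malignant is treated as the positive class, and the class and function names (ConfusionMatrix, confusion_matrix, auc) are hypothetical. The AUC is computed by rank-based pair counting, which is the standard relation between the AUC and the Mann–Whitney U statistic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a two-class problem (assumption: malignant is positive)."""
    tp: int  # malignant samples predicted malignant
    fn: int  # malignant samples predicted benign
    fp: int  # benign samples predicted malignant
    tn: int  # benign samples predicted benign

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

    @property
    def sensitivity(self) -> float:
        # Fraction of malignant samples that were detected.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Fraction of benign samples that were correctly cleared.
        return self.tn / (self.tn + self.fp)


def confusion_matrix(actual, predicted, positive="malignant"):
    """Tally the four cells from parallel lists of class labels."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return ConfusionMatrix(tp, fn, fp, tn)


def auc(scores, actual, positive="malignant"):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win; this equals U / (n_pos * n_neg).
    """
    pos = [s for s, a in zip(scores, actual) if a == positive]
    neg = [s for s, a in zip(scores, actual) if a != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a dataset of a few hundred records.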

Section snippets

Wisconsin breast cancer database description

In this study, we performed our experiments on the WBCD database taken from the UCI Machine Learning Repository (Bache & Lichman, 2013). The Wisconsin Breast Cancer database consists of 699 instances taken from Fine Needle Aspirates (FNA) of human breast tissue. Each record has 9 attributes and a class label (benign or malignant). Each attribute listed in Table 1 takes an integer value between 1 and 10, with 10 indicating the most abnormal state. Out of the
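A record of this shape can be read and validated with a few lines of code. The sketch below is a hedged illustration, not part of the paper: the comma-separated layout (sample id, nine attributes, class code), the class codes 2 and 4, and the "?" marker for missing values are assumptions about the repository file, and the names Record, parse_record and scale are hypothetical.

```python
from typing import NamedTuple, Optional

N_ATTRIBUTES = 9

# Assumed class coding in the raw file; not stated in the text above.
CLASS_CODES = {"2": "benign", "4": "malignant"}


class Record(NamedTuple):
    attributes: tuple  # nine integers, each between 1 and 10
    label: str         # "benign" or "malignant"


def parse_record(line: str) -> Optional[Record]:
    """Parse 'id,a1,...,a9,class'; return None if a value is missing."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != N_ATTRIBUTES + 2:
        raise ValueError(f"expected {N_ATTRIBUTES + 2} fields, got {len(fields)}")
    raw_attributes, code = fields[1:-1], fields[-1]
    if "?" in raw_attributes:  # assumed missing-value marker
        return None
    attributes = tuple(int(v) for v in raw_attributes)
    if any(not 1 <= v <= 10 for v in attributes):
        raise ValueError("attribute outside the range 1..10")
    return Record(attributes, CLASS_CODES[code])


def scale(attributes):
    """Map each 1..10 attribute onto [0, 1] before feeding a network."""
    return tuple((v - 1) / 9 for v in attributes)
```

For example, parse_record("1,5,1,1,1,2,1,3,1,1,2") yields a benign record with attributes (5, 1, 1, 1, 2, 1, 3, 1, 1). The input line here is made up for illustration.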

Genetic Programming

Genetic Programming (GP) (Koza, 1992), an evolutionary machine learning approach, is inspired by Darwin’s theory of evolution. It initially generates random solutions to a problem and then evolves them based on a fitness function. New and improved individuals are produced by applying reproduction, crossover and mutation operators to individuals of the previous generation. Reproduction is an asexual method wherein a selected individual copies itself into the new population. It is effectively
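The generational cycle described above can be sketched as a short loop. This is a generic illustration under stated assumptions, not the GONN algorithm: tournament selection, the operator probabilities and the rule of always copying the best individual are choices made here, and the toy individuals are tuples of numbers rather than program trees so that the example stays short. All function names are hypothetical.

```python
import random


def tournament(population, fitness, rng, k=3):
    """Pick the fittest of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=fitness)


def evolve(population, fitness, crossover, mutate, generations, rng,
           p_crossover=0.8, p_mutation=0.1):
    """Run a generational loop and return the best final individual."""
    for _ in range(generations):
        # Keep the current best unchanged so fitness never regresses.
        next_generation = [max(population, key=fitness)]
        while len(next_generation) < len(population):
            parent = tournament(population, fitness, rng)
            draw = rng.random()
            if draw < p_crossover:
                mate = tournament(population, fitness, rng)
                child = crossover(parent, mate, rng)
            elif draw < p_crossover + p_mutation:
                child = mutate(parent, rng)
            else:
                child = parent  # reproduction: copy as is
            next_generation.append(child)
        population = next_generation
    return max(population, key=fitness)


# Toy problem (assumption for illustration): maximise -sum(x^2).
def toy_fitness(individual):
    return -sum(x * x for x in individual)


def toy_crossover(a, b, rng):
    cut = rng.randrange(1, len(a))  # one-point crossover
    return a[:cut] + b[cut:]


def toy_mutate(individual, rng):
    i = rng.randrange(len(individual))
    return individual[:i] + (individual[i] + rng.gauss(0, 0.5),) + individual[i + 1:]
```

In a tree-based GP the same loop applies, with crossover swapping subtrees and mutation replacing a subtree.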

Proposed Genetically Optimized Neural Network (GONN)

To evolve the ANN architecture using GP for classifying the WBCD dataset, the GONN architecture must be formed so that it can be treated as an ANN structure. To build the GONN architecture, we follow the steps of the GP life cycle with modified crossover and mutation operators. Thus a final GONN architecture represents an ANN, with an appropriate mapping of GP parameters to ANN learning parameters. The detailed description of the initialization method, fitness function, proposed
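One way to see how a GP tree can stand for a feed-forward network is to evaluate a small tree directly. The sketch below is loosely modeled on tree encodings in the style of Koza and Rice (1991) and is an assumption throughout: the node kinds "P" (processing unit), "W" (weighted link) and "D" (input terminal), the nested-tuple encoding and the sigmoid activation are illustrative choices, not the GONN representation itself.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def evaluate(node, inputs):
    """Evaluate a network tree on one input vector.

    ("D", i)          -> the i-th input attribute
    ("W", w, child)   -> weight w times the value of child
    ("P", c1, c2 ...) -> sigmoid of the sum of the children (a neuron)
    """
    kind = node[0]
    if kind == "D":
        return inputs[node[1]]
    if kind == "W":
        _, weight, child = node
        return weight * evaluate(child, inputs)
    if kind == "P":
        return sigmoid(sum(evaluate(child, inputs) for child in node[1:]))
    raise ValueError(f"unknown node kind: {kind!r}")


def classify(tree, inputs, threshold=0.5):
    """Assumed decision rule: output above the threshold means malignant."""
    return "malignant" if evaluate(tree, inputs) > threshold else "benign"


# One hidden neuron fed by two inputs, plus a direct link from input 1
# to the output neuron.
EXAMPLE_TREE = (
    "P",
    ("W", 1.5, ("P", ("W", 0.5, ("D", 0)), ("W", -1.0, ("D", 1)))),
    ("W", -2.0, ("D", 1)),
)
```

Each "P" node corresponds to a neuron and each "W" node to a connection, so the tree shape fixes the network structure while the numbers stored in the "W" nodes are its weights. Evolving the tree therefore changes structure and weights together.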

Results and discussion

The proposed GONN classifier was implemented in Java (Java SE 6 Update 45) and run on a 3.4 GHz Pentium IV computer with 2 GB of RAM. The algorithm was applied to the Wisconsin Breast Cancer Database (WBCD). Experiments were carried out on the dataset with the parameters described in Table 2.

In the machine learning field, it is common to partition the dataset into two separate sets: a training set and a testing set. To evaluate the generalizability of our approach and to compare our work
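The two evaluation schemes mentioned in this article, a fixed training–testing partition and 10-fold cross validation, can be sketched as follows. This is a minimal illustration assuming a plain random shuffle without stratification; the paper's exact partitioning procedure is not given in this excerpt, and the function names are hypothetical.

```python
import random


def holdout_split(records, train_fraction, rng):
    """Shuffle and split, e.g. train_fraction=0.7 for a 70-30 partition."""
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def k_fold(records, k, rng):
    """Yield (train, test) pairs; each record is tested exactly once."""
    shuffled = list(records)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k nearly equal parts
    for i, test in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, test
```

With 699 records, a 70–30 split under this rounding gives 489 training and 210 testing records, and 10-fold cross validation gives test folds of 69 or 70 records each.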

Conclusion

In this work, a novel approach to breast cancer diagnosis is explored: an Artificial Neural Network is genetically evolved to an optimal architecture (structure and weights) for classification. This is done using Genetic Programming with the proposed crossover and mutation operators, in which the destructive nature of these operators is eliminated. The suggested changes also bring more diversity to the GP population and help the algorithm reach a solution faster with more

References (44)

J. Abonyi et al. Supervised fuzzy clustering for the identification of fuzzy classifiers. Pattern Recognition Letters (2003)
P. Barmpalexis et al. Symbolic regression via genetic programming in the optimization of a controlled release pharmaceutical formulation. Chemometrics and Intelligent Laboratory Systems (2011)
C.M. Baydar et al. Automated generation of robust error recovery logic in assembly systems using genetic programming. Journal of Manufacturing Systems (2001)
R. Bellazzi et al. Predictive data mining in clinical medicine: current issues and guidelines. International Journal of Medical Informatics (2008)
M. Mahsal Khan et al. Fast learning neural networks using cartesian genetic programming. Neurocomputing (2013)
A. Marcano-Cedeño et al. WBCD breast cancer database classification applying artificial metaplasticity neural network. Expert Systems with Applications (2011)
D. Nauck et al. Obtaining interpretable fuzzy classification rules from medical data. Artificial Intelligence in Medicine (1999)
H.H. Örkcü et al. Comparing performances of backpropagation and genetic algorithms in the data classification. Expert Systems with Applications (2011)
C.A. Pena-Reyes et al. A fuzzy-genetic approach to breast cancer diagnosis. Artificial Intelligence in Medicine (1999)
Y. Peng et al. A novel feature selection approach for biomedical data classification. Journal of Biomedical Informatics (2010)
J.L. Pérez et al. Optimization of existing equations using a new genetic programming algorithm: Application to the shear strength of reinforced concrete beams. Advances in Engineering Software (2012)
K. Polat et al. Breast cancer diagnosis using least square support vector machine. Digital Signal Processing (2007)
D. Rivero et al. Generation and simplification of artificial neural networks by means of genetic programming. Neurocomputing (2010)
H.-C. Tsai et al. Modular neural network programming with genetic optimization. Expert Systems with Applications (2011)
E.D. Übeyli. Implementing automated diagnostic systems for breast cancer detection. Expert Systems with Applications (2007)
B. Xue et al. Particle swarm optimisation for feature selection in classification: Novel initialisation and updating mechanisms. Applied Soft Computing (2014)
A.A. Albrecht et al. Two applications of the LSA machine.
Bache, K., & Lichman, M., 2013 UCI machine learning repository. URL:...
G. Bebis et al. Feed-forward neural networks. IEEE Potentials (1994)
J.G. Carbonell. Learning by analogy: Formulating and generalizing plans from past experience (1983)
I. Christoyianni et al. Fast detection of masses in computer-aided mammography. IEEE Signal Processing Magazine (2000)
G.W. Corder et al. Nonparametric statistics for non-statisticians: A step-by-step approach (2009)
