Breast cancer diagnosis using Genetically Optimized Neural Network model

https://doi.org/10.1016/j.eswa.2015.01.065

Abstract

One in every eight women is susceptible to breast cancer at some point in her life. Early detection and effective treatment are the only means of reducing breast cancer mortality, so accurate classification of a breast tumor is an important task in medical diagnosis. Machine learning techniques are gaining importance in medical diagnosis because of their classification capability. In this paper, we propose a new Genetically Optimized Neural Network (GONN) algorithm for solving classification problems. We evolve a neural network genetically to optimize its architecture (structure and weight) for classification. We introduce new crossover and mutation operators, which differ from the standard ones, to reduce the destructive nature of these operators. We use the GONN algorithm to classify breast cancer tumors as benign or malignant. To demonstrate our results, we took the Wisconsin Breast Cancer Database (WBCD) from the UCI Machine Learning Repository and compared the classification accuracy, sensitivity, specificity, confusion matrix, ROC curves and area under the ROC curve (AUC) of GONN with those of the classical model and the classical back-propagation model. Our algorithm gives a classification accuracy of 98.24%, 99.63% and 100% for the 50–50, 60–40 and 70–30 training–testing partitions respectively, and 100% for 10-fold cross-validation. The results show that our approach works well with the breast cancer database and can be a good alternative to the well-known machine learning methods.

Introduction

The cell is the basic biological unit of all living organisms, including humans. During its life cycle, a normal cell collects nutrients, grows in size, and then divides to form two new daughter cells. Cells become cancerous when they lose their ability to stop dividing, to stay where they belong, and to die at the proper time. Such extra cells together form a mass called a tumor. A tumor can be identified as either benign or malignant. Breast cancer is a malignant tumor originating from the breast tissue (Muto, Bussey, & Morson, 1975).

Every woman is at risk of breast cancer. If she lives to be 85, there is a one in eight chance (12%) that she will develop breast cancer at some point during her life. As a woman ages, her risk of developing breast cancer rises dramatically, regardless of her family history. The causes of and treatments for breast cancer are still under research, and since there is no widely available preventive measure yet (Christoyianni et al., 2000; Rodrigues et al., 2006), early detection and effective treatment are the only means of reducing breast cancer mortality. Fortunately, if breast cancer is detected accurately at an early stage, localized tumors can be treated successfully before the cancer spreads. Thus, accurate diagnosis of breast cancer is an essential and urgent problem for the medical science community.

One of the major tasks in accurate diagnosis is the extraction of useful knowledge from past diagnosis data. Machine learning techniques enable computers to learn from experience, past patterns and examples (Carbonell, 1983; Witten and Frank, 2005). Thus, the use of machine learning tools in medical diagnosis is gradually increasing. Data mining and soft computing techniques have been applied to extract rules and patterns from various datasets (Maimon and Rokach, 2005; Mitra and Hayashi, 2000; Mitra and Acharya, 2005). Some of these techniques (Bellazzi and Zupan, 2008; Marcano-Cedeño et al., 2011; Malmir et al., 2013) have shown very good results in classification problems, which can help medical experts in recognizing diseases. In breast cancer diagnosis, a tumor needs to be identified as benign or malignant based on the properties of the sample. In machine learning terms, this can be approached as a 2-class classification problem over a set of sample attributes.

A wide range of methods has been proposed in the literature for the diagnosis of breast cancer using WBCD. Quinlan (1996) used 10-fold cross-validation with the C4.5 decision tree method and achieved a classification accuracy of 94.74%. Hamilton, Shan, and Cercone (1996) used the RIAC method to achieve an accuracy of 94.99%. Nauck and Kruse (1999) used neuro-fuzzy techniques to obtain an accuracy of 95.06%. Pena-Reyes and Sipper (1999) used the fuzzy-GA method and reached a classification accuracy of 97.36%. Albrecht, Lappas, Vinterbo, Wong, and Ohno-Machado (2002) combined the perceptron algorithm with simulated annealing and reported an accuracy of 98.8%. Abonyi and Szeifert (2003) used a supervised fuzzy clustering technique to obtain an accuracy of 95.57%. Polat and Güneş (2007) used an artificial immune recognition system (AIRS) with a fuzzy resource allocation mechanism to obtain an accuracy of 98.51%. Übeyli (2007) applied five different classifiers (support vector machine, probabilistic neural network, recurrent neural network, combined neural network and Multilayer Perceptron neural network) and obtained accuracies of 99.54%, 98.61%, 98.15%, 97.40% and 91.92% respectively. Polat and Güneş (2007) used Least Square SVM (LS-SVM) to obtain 98.53% accuracy. Peng, Wu, and Jiang (2010) integrated wrapper and filter approaches to obtain an accuracy of 99.5%. Örkcü and Bal (2011) compared the performance of a Back Propagation Neural Network (BPNN), a Binary Coded Genetic Algorithm and a Real Coded Genetic Algorithm on the breast cancer database and achieved accuracies of 93.1%, 94% and 96.5% respectively. Marcano-Cedeño et al. (2011) presented a new Artificial Metaplasticity Multilayer Perceptron algorithm that outperformed BPNN on the same breast cancer database, giving a classification accuracy of 99.26% compared with 94.51% for BPNN on 60/40 training–testing samples. Lavanya and Rani (2011) used decision tree algorithms for the same task and achieved 92.97% classification accuracy. Malmir et al. (2013) achieved accuracies of 97.75% and 97.63% by training a Multilayer Perceptron (MLP) network for 40 iterations using the Imperialist Competitive Algorithm (ICA) and Particle Swarm Optimization (PSO) respectively. Koyuncu and Ceylan (2013) achieved a higher classification accuracy of 98.05% by using 9 classifiers in a Rotation Forest-Artificial Neural Network (RF-ANN). Xue, Zhang, and Browne (2014) presented a particle swarm optimization technique for feature selection using novel initialization and updating mechanisms, PSO (4–2), to obtain an accuracy of 94.74%.

In this study, a Genetically Optimized Neural Network (GONN) model is proposed which simultaneously evolves the structure and weights of a neural network for classifying the samples in the WBCD breast cancer database as benign or malignant. Our algorithm is inspired by Koza and Rice (1991) and optimizes the structure of a neural network by genetic evolution. GONN implements new crossover and mutation operators in GP to eliminate the destructive nature of the crossover and mutation operations in a standard GP life cycle (Koza, 1992). To measure the performance of the proposed algorithm, we used the Wisconsin breast cancer dataset from the UCI Machine Learning Repository (Bache & Lichman, 2013). The proposed algorithm yielded an accuracy of 98.24%, 99.63% and 100% for the 50–50, 60–40 and 70–30 training–testing partitions respectively, and a classification accuracy of 100% for the 10-fold cross-validation scheme. Measures such as sensitivity, specificity, ROC curves, area under the ROC curve (AUC) and the Mann–Whitney two-tailed test are used to validate the performance. To show the strength of our approach, we compared our method with the classical Koza and Rice (1991) model, the classical Back Propagation Neural Network (BPNN) (Hagan, Martin, & Beale, 1996) and recently proposed algorithms applied to the WBCD database. The results show that our approach works well with the breast cancer database and can be a good alternative to the well-known machine learning methods.
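The validation measures listed above can be derived from a classifier's predicted labels and scores. The following is a minimal Python sketch, not the authors' Java implementation: the function names and the toy labels in the usage note are hypothetical, and malignant is assumed to be the positive class. The AUC is computed by pairwise comparison of positive and negative scores, which equals the Mann–Whitney U statistic divided by the number of positive–negative pairs.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # positive (malignant) samples predicted positive
    fn: int  # positive samples predicted negative
    fp: int  # negative (benign) samples predicted positive
    tn: int  # negative samples predicted negative


def confusion_matrix(y_true, y_pred, positive=1):
    """Count the four outcomes of a 2-class classifier."""
    tp = fn = fp = tn = 0
    for truth, predicted in zip(y_true, y_pred):
        if truth == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        elif predicted == positive:
            fp += 1
        else:
            tn += 1
    return ConfusionMatrix(tp, fn, fp, tn)


def accuracy(cm):
    return (cm.tp + cm.tn) / (cm.tp + cm.fn + cm.fp + cm.tn)


def sensitivity(cm):
    """True positive rate: fraction of malignant samples detected."""
    return cm.tp / (cm.tp + cm.fn)


def specificity(cm):
    """True negative rate: fraction of benign samples recognized."""
    return cm.tn / (cm.tn + cm.fp)


def auc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, with true labels `[1, 1, 1, 0, 0, 0, 0, 1]` and predictions `[1, 1, 0, 0, 0, 1, 0, 1]`, accuracy, sensitivity and specificity all evaluate to 0.75. The sketch assumes both classes are present; an empty class would cause a division by zero.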

Section snippets

Wisconsin breast cancer database description

In this study, we performed our experiments on the WBCD database taken from the UCI Machine Learning Repository (Bache & Lichman, 2013). The Wisconsin Breast Cancer Database consists of 699 instances taken from Fine Needle Aspirates (FNA) of human breast tissue. Each record has 9 attributes and a class label (benign or malignant). The value of each attribute listed in Table 1 is an integer between 1 and 10, with 10 indicating the most abnormal state. Out of the
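As an illustration of the record structure described above, here is a minimal Python sketch of a record parser. It rests on assumptions not stated in this section: that each line is comma-separated with a sample ID, the nine attributes and a class code (2 for benign, 4 for malignant), and that missing attribute values are marked with `?`. Skipping incomplete records is only one possible policy and is not claimed to be the authors' choice; `Sample` and `parse_record` are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class Sample(NamedTuple):
    features: List[int]  # nine attribute values, each between 1 and 10
    malignant: bool      # True for malignant, False for benign


def parse_record(line: str) -> Optional[Sample]:
    """Parse one comma-separated record; return None if a value is missing."""
    fields = line.strip().split(",")
    if len(fields) != 11:  # assumed layout: ID + 9 attributes + class code
        raise ValueError(f"expected 11 fields, got {len(fields)}")
    raw_features, raw_class = fields[1:10], fields[10]
    if "?" in raw_features:  # assumed marker for a missing value
        return None
    features = [int(value) for value in raw_features]
    if any(not 1 <= value <= 10 for value in features):
        raise ValueError("attribute values must lie between 1 and 10")
    if raw_class not in ("2", "4"):  # assumed codes: 2 = benign, 4 = malignant
        raise ValueError(f"unknown class code {raw_class!r}")
    return Sample(features, raw_class == "4")
```

Calling `parse_record("id-1,5,1,1,1,2,1,3,1,1,2")` on this made-up row yields a benign `Sample` with features `[5, 1, 1, 1, 2, 1, 3, 1, 1]`.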

Genetic Programming

Genetic Programming (GP) (Koza, 1992) is an evolutionary machine learning approach inspired by Darwin’s theory of evolution. It initially generates random solutions to a problem and then evolves them based on a fitness function. New and improved individuals are produced by applying reproduction, crossover and mutation operators to individuals of the previous generation. Reproduction is an asexual method wherein a selected individual copies itself into the new population. It is effectively
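The GP life cycle outlined above can be sketched on a toy problem. The Python sketch below uses the standard subtree crossover and subtree mutation operators, not the modified operators proposed for GONN, and evolves arithmetic expression trees rather than neural networks. The function set, depth cap, operator probabilities, tournament selection, elitism and the target function y = x² + x are all assumptions chosen for illustration.

```python
import operator
import random

FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
TERMINALS = ["x", 1.0, 2.0]
MAX_DEPTH = 6  # hypothetical cap that keeps trees and their values small

def random_tree(rng, depth):
    """Grow a random expression tree: a terminal or (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(sorted(FUNCTIONS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def depth_of(tree):
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth_of(tree[1]), depth_of(tree[2]))

def evaluate(tree, x):
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCTIONS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree

def paths(tree, prefix=()):
    """Yield the child-index path to every node of the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from paths(tree[i], prefix + (i,))

def subtree_at(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace_at(tree, path, new_subtree):
    if not path:
        return new_subtree
    node = list(tree)
    node[path[0]] = replace_at(tree[path[0]], path[1:], new_subtree)
    return tuple(node)

def crossover(rng, receiver, donor):
    """Standard subtree crossover: graft a random donor subtree into receiver."""
    target = rng.choice(list(paths(receiver)))
    graft = subtree_at(donor, rng.choice(list(paths(donor))))
    child = replace_at(receiver, target, graft)
    return child if depth_of(child) <= MAX_DEPTH else receiver

def mutate(rng, tree):
    """Standard subtree mutation: replace a random subtree with a fresh one."""
    child = replace_at(tree, rng.choice(list(paths(tree))), random_tree(rng, 2))
    return child if depth_of(child) <= MAX_DEPTH else tree

def fitness(tree, cases):
    """Sum of squared errors over (x, y) cases; lower is better."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in cases)

def tournament(rng, population, scores, size=3):
    contenders = rng.sample(range(len(population)), size)
    return population[min(contenders, key=lambda i: scores[i])]

def evolve(cases, pop_size=60, generations=20, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(pop_size)]
    history = []  # best score found in each generation
    for _ in range(generations):
        scores = [fitness(tree, cases) for tree in population]
        best = min(range(pop_size), key=lambda i: scores[i])
        history.append(scores[best])
        next_population = [population[best]]  # elitism: keep the best tree
        while len(next_population) < pop_size:
            parent = tournament(rng, population, scores)
            roll = rng.random()
            if roll < 0.1:    # reproduction: copy the parent unchanged
                child = parent
            elif roll < 0.9:  # crossover with a second selected parent
                child = crossover(rng, parent, tournament(rng, population, scores))
            else:             # mutation
                child = mutate(rng, parent)
            next_population.append(child)
        population = next_population
    return min(population, key=lambda t: fitness(t, cases)), history

CASES = [(x / 2, (x / 2) ** 2 + x / 2) for x in range(-4, 5)]  # y = x^2 + x
best_tree, best_history = evolve(CASES)
```

Because the best individual is always copied into the next generation, the per-generation best score in `best_history` never gets worse. Standard subtree crossover can still break up useful subtrees in the other offspring, which is the destructive behaviour that modified operators aim to limit.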

Proposed Genetically Optimized Neural Network (GONN)

To evolve the ANN architecture using GP for classifying the WBCD dataset, we have to form the GONN architecture in such a way that it can be treated as an ANN structure. To build the GONN architecture, we follow the steps of the GP life cycle with modified crossover and mutation operators. Thus a final GONN architecture represents an ANN with an appropriate mapping of GP parameters to ANN learning parameters. The detailed description of initialization method, fitness function, proposed

Results and discussion

The proposed GONN classifier was implemented in Java (Java SE 6 Update 45) and run on a 3.4 GHz Pentium IV computer with 2 GB of RAM. The algorithm was applied to the Wisconsin Breast Cancer Database (WBCD). Experiments were carried out on the dataset with the parameters described in Table 2.

In the machine learning field, it is common to partition the dataset into two separate sets: a training set and a testing set. To evaluate the generalizability of our approach and to compare our work
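The training–testing partitions and the 10-fold cross-validation scheme reported in this paper can be illustrated with index-based splits. This is a minimal Python sketch under stated assumptions: samples are shuffled with a fixed seed and split without stratification, and the helper names `holdout_split` and `k_fold_splits` are hypothetical. Whether the authors shuffled or stratified is not stated in this section.

```python
import random


def holdout_split(n_samples, train_fraction, seed=0):
    """Shuffle sample indices and split them into training and testing sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(n_samples * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for held_out in range(k):
        train = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train, folds[held_out]
```

For instance, `holdout_split(699, 0.7)` produces a 70–30 partition of the 699 instance indices, and `k_fold_splits(699, k=10)` yields ten train/test pairs whose test folds together cover every index once. The number of usable instances may be lower than 699 if incomplete records are excluded.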

Conclusion

In this work, a novel approach for breast cancer diagnosis is explored using an Artificial Neural Network that is genetically evolved to an optimal architecture (structure and weight) for classification. This is done by using the concept of Genetic Programming with the proposed crossover and mutation operators, in which the destructive nature of these operators is eliminated. Also, the suggested changes bring more diversity to the GP population and help the algorithm reach a solution faster with more

References (44)

  • J.L. Pérez et al. Optimization of existing equations using a new genetic programming algorithm: Application to the shear strength of reinforced concrete beams. Advances in Engineering Software (2012)
  • K. Polat et al. Breast cancer diagnosis using least square support vector machine. Digital Signal Processing (2007)
  • D. Rivero et al. Generation and simplification of artificial neural networks by means of genetic programming. Neurocomputing (2010)
  • H.-C. Tsai et al. Modular neural network programming with genetic optimization. Expert Systems with Applications (2011)
  • E.D. Übeyli. Implementing automated diagnostic systems for breast cancer detection. Expert Systems with Applications (2007)
  • B. Xue et al. Particle swarm optimisation for feature selection in classification: Novel initialisation and updating mechanisms. Applied Soft Computing (2014)
  • A.A. Albrecht et al. Two applications of the lsa machine
  • Bache, K., & Lichman, M., 2013 UCI machine learning repository. URL:...
  • G. Bebis et al. Feed-forward neural networks. IEEE Potentials (1994)
  • J.G. Carbonell. Learning by analogy: Formulating and generalizing plans from past experience (1983)
  • I. Christoyianni et al. Fast detection of masses in computer-aided mammography. IEEE Signal Processing Magazine (2000)
  • G.W. Corder et al. Nonparametric statistics for non-statisticians: A step-by-step approach (2009)