Breast cancer diagnosis using Genetically Optimized Neural Network model
Introduction
The cell is the basic biological unit of all living organisms, including humans. In its normal life cycle, a cell grows in size while collecting nutrients and then divides to form two new daughter cells. Cells become cancerous when they lose their ability to stop dividing, to stay where they belong, and to die at the proper time. These extra cells together form a mass called a tumor, which can be either benign or malignant. Breast cancer is a malignant tumor originating in the breast tissue (Muto, Bussey, & Morson, 1975).
Every woman is at risk for breast cancer. If she lives to the age of 85, there is a one in eight chance (12%) that she will develop breast cancer at some point in her life. As a woman ages, her risk of developing breast cancer rises dramatically, regardless of her family history. The causes of and treatments for breast cancer are still under research, and since no widely available preventive measure exists yet (Christoyianni et al., 2000, Rodrigues et al., 2006), early detection and effective treatment are the only means of reducing breast cancer mortality. Fortunately, if breast cancer is detected accurately at an early stage, localized tumors can be treated successfully before the cancer spreads. Thus, accurate diagnosis of breast cancer is an essential and urgent problem for the medical science community.
One of the major tasks in accurate diagnosis is extracting useful knowledge from past diagnosis data. Machine learning techniques enable computers to learn from experience, past patterns and examples (Carbonell, 1983, Witten and Frank, 2005), so the use of machine learning tools in medical diagnosis is gradually increasing. Data mining and soft computing techniques have been applied to extract rules and patterns from various datasets (Maimon and Rokach, 2005, Mitra and Hayashi, 2000, Mitra and Acharya, 2005). Some of these techniques (Bellazzi and Zupan, 2008, Marcano-Cedeño et al., 2011, Malmir et al., 2013) have shown very good results in classification problems and can help medical experts recognize diseases. In breast cancer diagnosis, a tumor must be identified as benign or malignant based on the properties of the sample. In machine learning terms, this is a two-class classification problem over a set of sample attributes.
A wide range of methods for diagnosing breast cancer on the Wisconsin Breast Cancer Database (WBCD) have been proposed in the literature. Quinlan (1996) used 10-fold cross-validation with the C4.5 decision tree method and achieved a classification accuracy of 94.74%. Hamilton, Shan, and Cercone (1996) used the RIAC method to achieve an accuracy of 94.99%. Nauck and Kruse (1999) used neuro-fuzzy techniques to obtain an accuracy of 95.06%. Pena-Reyes and Sipper (1999) used the fuzzy-GA method and reached a classification accuracy of 97.36%. Albrecht, Lappas, Vinterbo, Wong, and Ohno-Machado (2002) combined the perceptron algorithm with simulated annealing and reported an accuracy of 98.8%. Abonyi and Szeifert (2003) used a supervised fuzzy clustering technique to obtain an accuracy of 95.57%. Polat and Güneş (2007) used an artificial immune recognition system (AIRS) with a fuzzy resource allocation mechanism to obtain an accuracy of 98.51%. Übeyli (2007) applied five different classifiers (support vector machine, probabilistic neural network, recurrent neural network, combined neural network and Multilayer Perceptron neural network) and obtained accuracies of 99.54%, 98.61%, 98.15%, 97.40% and 91.92%, respectively. Polat and Güneş (2007) used a Least Squares SVM (LS-SVM) to obtain 98.53% accuracy. Peng, Wu, and Jiang (2010) integrated wrapper and filter approaches to obtain an accuracy of 99.5%. Örkcü and Bal (2011) compared the performance of a Back Propagation Neural Network (BPNN), a binary-coded genetic algorithm and a real-coded genetic algorithm on the breast cancer database and achieved accuracies of 93.1%, 94% and 96.5%, respectively. Marcano-Cedeño et al. (2011) presented a new Artificial Metaplasticity Multilayer Perceptron algorithm that outperformed BPNN on the same database, achieving a classification accuracy of 99.26% compared with 94.51% for BPNN using a 60/40 training–testing split.
Lavanya and Rani (2011) applied decision tree algorithms to the same data and achieved 92.97% classification accuracy. Malmir et al. (2013) achieved accuracies of 97.75% and 97.63% by training a Multilayer Perceptron (MLP) network for 40 iterations using the Imperialist Competitive Algorithm (ICA) and Particle Swarm Optimization (PSO), respectively. Koyuncu and Ceylan (2013) achieved a classification accuracy of 98.05% by using 9 classifiers in a Rotation Forest-Artificial Neural Network (RF-ANN). Xue, Zhang, and Browne (2014) presented a particle swarm optimization technique for feature selection with novel initialization and updating mechanisms, PSO(4–2), and obtained an accuracy of 94.74%.
In this study, we propose a Genetically Optimized Neural Network (GONN) model that simultaneously evolves the structure and weights of a neural network to classify samples of the WBCD breast cancer database as benign or malignant. Our algorithm is inspired by the approach of Koza and Rice (1991), which optimizes the structure of a neural network through genetic evolution. GONN introduces new crossover and mutation operators into genetic programming (GP) to eliminate the destructive nature of these operations in the standard GP life cycle (Koza, 1992). To measure the performance of the proposed algorithm, we used the Wisconsin breast cancer dataset from the UCI Machine Learning Repository (Bache & Lichman, 2013). The proposed algorithm yielded accuracies of 98.24%, 99.63% and 100% for the 50–50, 60–40 and 70–30 training–testing partitions, respectively, and a classification accuracy of 100% under the 10-fold cross-validation scheme. Sensitivity, specificity, ROC curves, the area under the ROC curve (AUC) and the two-tailed Mann–Whitney test are used to validate the performance. To demonstrate the advantage of our approach, we compared it with the classical Koza and Rice (1991) model, a classical Back Propagation Neural Network (BPNN) (Hagan, Martin, & Beale, 1996) and recently proposed algorithms applied to the WBCD database. The results show that our approach works well on the breast cancer database and can be a good alternative to well-known machine learning methods.
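The evaluation measures listed above can be illustrated with a minimal Python sketch. It assumes binary labels with malignant coded as 1 and benign as 0 (an assumption; the coding is not given here) and computes sensitivity, specificity and accuracy from the confusion counts. It also computes AUC as the normalised Mann–Whitney U statistic, that is, the fraction of (malignant, benign) pairs in which the malignant sample receives the higher score, with ties counted as one half. The function names are hypothetical, and this is not the authors' evaluation code.

```python
from typing import Sequence, Tuple

MALIGNANT = 1  # assumed coding: malignant = 1 (positive class), benign = 0


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) for binary labels."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == MALIGNANT:
            if p == MALIGNANT:
                tp += 1
            else:
                fn += 1
        else:
            if p == MALIGNANT:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


def auc_mann_whitney(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as U / (n_pos * n_neg): the share of (malignant, benign) pairs
    where the malignant sample scores higher; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == MALIGNANT]
    neg = [s for t, s in zip(y_true, scores) if t != MALIGNANT]
    u = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                u += 1.0
            elif sp == sn:
                u += 0.5
    return u / (len(pos) * len(neg))
```

For labels [1, 1, 0, 0, 1, 0] and predictions [1, 0, 0, 0, 1, 1], the sketch gives a sensitivity, specificity and accuracy of 2/3 each; scoring the same samples with [0.9, 0.4, 0.2, 0.3, 0.8, 0.6] gives an AUC of 8/9. The pairwise loop is quadratic, which is acceptable for a dataset of a few hundred records; a rank-based computation gives the same value more efficiently.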
Wisconsin breast cancer database description
In this study, we performed our experiments on the WBCD database taken from the UCI Machine Learning Repository (Bache & Lichman, 2013). The Wisconsin Breast Cancer database consists of 699 instances taken from Fine Needle Aspirates (FNA) of human breast tissue. Each record has 9 attributes and a class label (benign or malignant). Each attribute listed in Table 1 takes an integer value between 1 and 10, where 10 indicates the most abnormal state. Out of the
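A minimal sketch of loading records of the kind described above follows. It assumes the comma-separated layout of the UCI data file (a sample ID, the nine attribute values, then a class code of 2 for benign or 4 for malignant, with missing values written as "?"). These layout details, the helper names and the choice to skip incomplete records are assumptions for illustration, not the preprocessing used in this study.

```python
from typing import Iterable, List, Optional, Tuple

N_ATTRIBUTES = 9
# Assumed UCI class coding: 2 = benign -> 0, 4 = malignant -> 1.
CLASS_CODES = {"2": 0, "4": 1}

Sample = Tuple[List[int], int]


def parse_record(line: str) -> Optional[Sample]:
    """Parse 'id, 9 attributes, class'; return None if an attribute is missing."""
    fields = [f.strip() for f in line.strip().split(",")]
    if len(fields) != N_ATTRIBUTES + 2:
        raise ValueError(f"expected {N_ATTRIBUTES + 2} fields, got {len(fields)}")
    raw_attrs = fields[1:-1]  # drop the sample id (first) and class (last)
    if "?" in raw_attrs:
        return None  # hypothetical policy: skip incomplete records
    attrs = [int(v) for v in raw_attrs]
    if any(not 1 <= v <= 10 for v in attrs):
        raise ValueError(f"attribute outside 1..10: {attrs}")
    return attrs, CLASS_CODES[fields[-1]]


def load_records(lines: Iterable[str]) -> List[Sample]:
    """Parse every non-blank line, keeping only complete records."""
    samples = []
    for line in lines:
        if not line.strip():
            continue
        record = parse_record(line)
        if record is not None:
            samples.append(record)
    return samples
```

Skipping incomplete records is only one option; imputing the missing attribute (for example with the attribute's median) would keep all 699 instances.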
Genetic Programming
Genetic Programming (GP) (Koza, 1992), an evolutionary machine learning approach, is inspired by Darwin’s theory of evolution. It first generates random candidate solutions to a problem and then evolves them according to a fitness function. New and improved individuals are produced by applying the reproduction, crossover and mutation operators to individuals of the previous generation. Reproduction is an asexual operation in which a selected individual is copied unchanged into the new population. It is effectively
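To make the three standard operators concrete, the sketch below represents GP individuals as expression trees (nested Python lists) and implements reproduction, subtree crossover and subtree mutation in the classical Koza style. The function and terminal sets are purely illustrative. The sketch shows the standard operators whose destructive effect GONN is designed to avoid, not the modified GONN operators, which are not described here.

```python
import copy
import random
from typing import List, Union

# A tree is a terminal (str) or a list [function_name, child, child, ...].
Tree = Union[str, list]

FUNCTIONS = {"+": 2, "-": 2, "*": 2}  # illustrative function set: name -> arity
TERMINALS = ["x", "y", "1"]           # illustrative terminal set


def random_tree(rng: random.Random, depth: int) -> Tree:
    """'Grow'-style random tree of at most the given depth."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(list(FUNCTIONS))
    return [name] + [random_tree(rng, depth - 1) for _ in range(FUNCTIONS[name])]


def is_well_formed(tree: Tree) -> bool:
    """Check that every node uses a known symbol with the correct arity."""
    if isinstance(tree, str):
        return tree in TERMINALS
    name, children = tree[0], tree[1:]
    return FUNCTIONS.get(name) == len(children) and all(is_well_formed(c) for c in children)


def node_paths(tree: Tree, path: tuple = ()) -> List[tuple]:
    """All node positions, as index paths from the root."""
    paths = [path]
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            paths.extend(node_paths(child, path + (i,)))
    return paths


def get_subtree(tree: Tree, path: tuple) -> Tree:
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree: Tree, path: tuple, new: Tree) -> Tree:
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return copy.deepcopy(new)
    result = copy.deepcopy(tree)
    parent = get_subtree(result, path[:-1])
    parent[path[-1]] = copy.deepcopy(new)
    return result


def reproduction(parent: Tree) -> Tree:
    """Asexual reproduction: the individual is copied unchanged."""
    return copy.deepcopy(parent)


def crossover(a: Tree, b: Tree, rng: random.Random) -> Tree:
    """Standard subtree crossover: graft a random subtree of b onto a random point of a."""
    point_a = rng.choice(node_paths(a))
    point_b = rng.choice(node_paths(b))
    return replace_subtree(a, point_a, get_subtree(b, point_b))


def mutation(tree: Tree, rng: random.Random, max_depth: int = 2) -> Tree:
    """Subtree mutation: replace a random node with a freshly grown subtree."""
    point = rng.choice(node_paths(tree))
    return replace_subtree(tree, point, random_tree(rng, max_depth))
```

Because crossover and mutation pick their points uniformly at random, a good building block can be cut apart or overwritten; this is the destructive behaviour that motivates modified operators.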
Proposed Genetically Optimized Neural Network (GONN)
To evolve the ANN architecture with GP for classifying the WBCD dataset, the GONN architecture must be formed so that it can be treated as an ANN structure. To build it, we follow the steps of the GP life cycle with modified crossover and mutation operators. Thus, a final GONN architecture represents an ANN, with GP parameters appropriately mapped to ANN learning parameters. The detailed description of the initialization method, fitness function, proposed
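One way to picture how a GP tree can be read as a neural network is sketched below. The node types are hypothetical and only loosely follow the idea of Koza and Rice (1991): a "P" node acts as a neuron that applies a logistic activation to the sum of its inputs, a "W" node multiplies a constant weight by the output of its subtree, a string terminal "D0" to "D8" reads one of the nine WBCD attributes, and a numeric terminal acts as a bias. Because "P" nodes can be nested, the tree shape encodes the architecture while the constants in "W" nodes encode the weights, so evolving the tree evolves both. The activation, the 0.5 decision threshold and the use of training accuracy as fitness are assumptions, not the GONN definitions.

```python
import math
from typing import List, Sequence, Union

# Hypothetical encoding (assumption, loosely after Koza and Rice, 1991):
#   ("P", child, ...)    -> neuron: logistic(sum of children)
#   ("W", weight, child) -> weighted connection: weight * child output
#   "D3"                 -> input terminal: value of attribute 3
#   -10.0                -> numeric terminal used as a bias
Node = Union[str, float, tuple]


def evaluate(node: Node, x: Sequence[float]) -> float:
    """Forward pass of the tree-encoded network on one sample x."""
    if isinstance(node, (int, float)):
        return float(node)
    if isinstance(node, str):
        return float(x[int(node[1:])])
    kind = node[0]
    if kind == "W":
        _, weight, child = node
        return weight * evaluate(child, x)
    if kind == "P":
        total = sum(evaluate(child, x) for child in node[1:])
        return 1.0 / (1.0 + math.exp(-total))  # assumed logistic activation
    raise ValueError(f"unknown node type: {kind!r}")


def classify(tree: Node, x: Sequence[float], threshold: float = 0.5) -> int:
    """Return 1 (malignant) or 0 (benign) under an assumed coding."""
    return 1 if evaluate(tree, x) >= threshold else 0


def fitness(tree: Node, samples: List[Sequence[float]], labels: List[int]) -> float:
    """Training-set accuracy used as a fitness value (assumption)."""
    correct = sum(classify(tree, x) == y for x, y in zip(samples, labels))
    return correct / len(samples)
```

For example, the tree ("P", ("W", 1.0, "D0"), ("W", 1.0, "D1"), -10.0) outputs the logistic of D0 + D1 - 10, so it labels a sample malignant exactly when the first two attributes sum to at least 10.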
Results and discussion
The proposed GONN classifier was implemented in Java (Java SE 6 Update 45) and run on a Pentium IV computer at 3.4 GHz with 2 GB of RAM. The algorithm was applied to the Wisconsin Breast Cancer Database (WBCD). Experiments were carried out on the dataset with the parameters described in Table 2.
In the machine learning field, it is common to partition the dataset into two separate sets: a training set and a testing set. To evaluate the generalizability of our approach and to compare our work
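The 50–50, 60–40 and 70–30 training–testing partitions and the 10-fold cross-validation scheme mentioned in the introduction can be generated with a short sketch such as the one below. It works on record indices, uses a seeded random shuffle and does not stratify by class. Whether the study shuffled, stratified or used a fixed seed is not stated here, so these choices and the function names are assumptions.

```python
import random
from typing import List, Tuple


def holdout_split(n: int, train_fraction: float, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle indices 0..n-1 and split them into training and testing sets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train_fraction)
    return indices[:n_train], indices[n_train:]


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs; each index is tested exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most one
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits
```

With n = 699, k_fold_indices produces nine folds of 70 records and one of 69, and every record appears in exactly one testing fold. A stratified variant, which keeps the benign/malignant ratio equal across folds, is a common alternative.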
Conclusion
In this work, we explore a novel approach to breast cancer diagnosis using an Artificial Neural Network that is genetically evolved to an optimal architecture (structure and weights) for classification. This is achieved using Genetic Programming with the proposed crossover and mutation operators, which eliminate the destructive nature of these operators. The suggested changes also bring more diversity to the GP population and help the algorithm reach a solution faster with more
References (44)
- et al. Supervised fuzzy clustering for the identification of fuzzy classifiers. Pattern Recognition Letters (2003).
- et al. Symbolic regression via genetic programming in the optimization of a controlled release pharmaceutical formulation. Chemometrics and Intelligent Laboratory Systems (2011).
- et al. Automated generation of robust error recovery logic in assembly systems using genetic programming. Journal of Manufacturing Systems (2001).
- et al. Predictive data mining in clinical medicine: current issues and guidelines. International Journal of Medical Informatics (2008).
- et al. Fast learning neural networks using Cartesian genetic programming. Neurocomputing (2013).
- et al. WBCD breast cancer database classification applying artificial metaplasticity neural network. Expert Systems with Applications (2011).
- et al. Obtaining interpretable fuzzy classification rules from medical data. Artificial Intelligence in Medicine (1999).
- et al. Comparing performances of backpropagation and genetic algorithms in the data classification. Expert Systems with Applications (2011).
- et al. A fuzzy-genetic approach to breast cancer diagnosis. Artificial Intelligence in Medicine (1999).
- et al. A novel feature selection approach for biomedical data classification. Journal of Biomedical Informatics (2010).