Towards non-data-hungry and fully-automated diagnosis of breast cancer from mammographic images
Introduction
Breast cancer is the most common form of cancer among women. According to the World Health Organization, it causes about 15% of cancer deaths. Mammography is the standard first-line technique for routine breast cancer screening, since it is fast and affordable. It aims to reduce mortality by detecting breast cancer at an early stage, even before women feel any symptoms. At this stage, breast cancer is easily treatable and the risk of fatality is low. However, radiologists report shortcomings when mammography is used as the only radiological tool for assessing a patient's breast cancer risk. Indeed, early-stage cancerous tissue is difficult to differentiate from normal breast tissue. The presence of dense (parenchymal) tissue in the breasts of some patients complicates expert diagnosis, resulting in false-negative readings of mammograms for patients with dense breasts. Therefore, Computer-Aided Diagnosis (CAD) systems are very useful for giving radiologists a second opinion so that decisions can be made swiftly. The decision support provided by CAD systems can take the form of explicit classification or Content-Based Mammogram Retrieval (CBMR). Unlike automatic classification tools, CBMR provides a sorted set of similar images with confirmed diagnoses relative to a given mammogram lesion. Retrieved images can support radiologists in the diagnosis as well as serve educational purposes, while providing more explainable results. Both classification and CBMR can follow the same pipeline for feature extraction, selection and fusion. Furthermore, the diagnosis of breast cancer from mammograms can be seen as a texture classification problem in which the CAD system must differentiate between the texture of normal tissue and that of cancerous tissue. Thus, various texture classification techniques have been proposed [1,2].
Some works have applied descriptors known to be efficient in texture classification, notably the Local Binary Pattern (LBP) descriptor, to classify breast tissue as cancerous or normal [3,4]. The LBP descriptor uses the signs of the differences between a pixel and its immediate neighbors to represent texture locally [5]. This encoding, which exploits the signs of the differences while ignoring their magnitudes, has given very good results for classifying textures [6]. However, the problem with mammograms is that the textures of normal and cancerous tissues are hardly distinguishable. As shown in Fig. 1, the LBP transforms of three Regions Of Interest (ROIs) representing benign, malignant and normal tissue, along with the corresponding LBP histograms, do not exhibit any clear difference between the classes. Therefore, a technique proven effective in generic texture classification may not be sufficient for an accurate diagnosis of breast cancer from mammograms [3]. In fact, as much local information as possible must be exploited to distinguish two fairly similar texture classes [7]. In addition, the global feature vector must be constructed so as to retain the information that highlights the difference between malignant and normal tissues [8]. Therefore, the way the local information is aggregated into the global tissue description should be automated. Indeed, the classical concatenation of local representations, commonly used with handcrafted features, can lead to information loss [9]. The goal of this work is to propose a different approach to texture analysis for more accurate breast cancer diagnosis from mammogram ROIs, by acting differently when representing texture locally and globally [10]. Locally, this means finding an efficient way to encode as much local texture information as possible without an explosion in dimensionality.
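To make the baseline concrete, the classical LBP encoding discussed above can be sketched as follows. This is a minimal illustrative implementation of the standard 8-neighbor operator [5], not the paper's descriptor; note how the magnitudes of the differences are discarded, which is precisely the limitation the proposed method addresses.

```python
import numpy as np

def lbp_code(patch):
    """8-neighbor LBP code for the center pixel of a 3x3 patch.

    Each neighbor contributes one bit: 1 if its gray level is >= the
    center's (the sign of the difference), 0 otherwise. The magnitudes
    of the differences are discarded.
    """
    center = patch[1, 1]
    # clockwise neighbor order starting from the top-left pixel
    neighbors = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                 patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    return sum(int(n >= center) << i for i, n in enumerate(neighbors))

def lbp_histogram(roi):
    """256-bin normalized LBP histogram of a grayscale ROI (2-D array)."""
    h, w = roi.shape
    codes = [lbp_code(roi[i - 1:i + 2, j - 1:j + 2])
             for i in range(1, h - 1) for j in range(1, w - 1)]
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()
```

Computed over the three ROIs of Fig. 1, such histograms are what turn out to be nearly indistinguishable across classes.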
Globally, the objective is to automate the construction of the overall texture description so that the characteristics discriminating malignant from normal tissue are preserved. Thus, in order to enhance the accuracy of CAD techniques for breast cancer diagnosis, we investigate the use of an LBP-like texture representation that exploits both the signs and the magnitudes of the differences between the central pixel and its neighbors to describe tissue locally. We propose to automatically learn a texture descriptor that uses this local representation to generate an overall feature vector. For this purpose, genetic programming techniques are investigated to generate a descriptor that produces discriminative features, facilitating the task of a supervised classifier in making reliable decisions. The research gap filled by the suggested method lies in an evolutionary context-aware descriptor able to automatically select the most appropriate and restrictive set of features before fusing them, depending on the specificity of the classification context and without the need for a pre-processing step. The main contributions of this study are threefold:
- To the best of our knowledge, we are the first to propose an evolutionary-based descriptor for generating robust features to diagnose breast cancer from mammogram ROI images. Moreover, a fitness function based on the Fisher Separation Criterion (FSC) is proposed so that the learned descriptors generate the most discriminative features, accounting for both intra-class and inter-class characteristics.
- We represent ROI texture locally using LBP-like gray-level differences; unlike the LBP operator, the proposed descriptor exploits both the signs and the magnitudes of these differences to generate the feature vector.
- The suggested method is fully automated, yet unlike deep learning-based methods it does not require a large training set to perform accurate classification and retrieval.
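The sign-and-magnitude local representation named in the second contribution can be sketched as follows. The exact encoding is defined in Section 3; this illustration, in the spirit of completed-LBP variants, only shows the idea of keeping both pieces of information, and the adaptive magnitude threshold used here is an assumption made for the sketch.

```python
import numpy as np

def sign_magnitude_components(patch, threshold=None):
    """Sign and magnitude components of the local gray-level
    differences in a 3x3 patch.

    Illustrative only: the paper's descriptor (Section 3) defines the
    actual encoding. Signs reproduce the LBP bits; magnitudes, which
    plain LBP discards, are binarized against a local threshold.
    """
    center = patch[1, 1]
    neighbors = np.array([patch[0, 0], patch[0, 1], patch[0, 2],
                          patch[1, 2], patch[2, 2], patch[2, 1],
                          patch[2, 0], patch[1, 0]], dtype=float)
    diffs = neighbors - center
    signs = (diffs >= 0).astype(int)      # LBP-like sign bits
    mags = np.abs(diffs)                  # magnitudes are kept, not discarded
    if threshold is None:
        threshold = mags.mean()           # adaptive local threshold (assumption)
    mag_bits = (mags >= threshold).astype(int)
    return signs, mag_bits
```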
The rest of this paper is organized as follows. In Section 2, we present a brief literature review of relevant texture-based methods for cancer diagnosis from mammographic images. The proposed method is detailed in Section 3. The results are presented in Section 4. Outcomes and findings of the suggested method are discussed in Section 5. In Section 6, we conclude and present some ideas for future studies.
Related work
Texture is the most important visual cue for describing breast tissue in mammography ROIs, since it conveys discriminative information about tissue properties. The textural features extracted from mammography images can be categorized into handcrafted features and deep learning-based features. On the one hand, within the framework of handcrafted features widely used for breast cancer CAD, features can be categorized into classical texture, curvelet-based, nature-inspired and
Proposed method
The main contribution of the suggested method lies in the automation of the feature extraction step while using little training data. The same pipeline used to classify ROIs is also used for CBMR, with the only difference that for the latter, similar ROIs are retrieved without giving an explicit decision. In this section, the overall process for malignant vs. benign mammogram ROI classification/retrieval is detailed. It is worth noting that the same process followed to classify/retrieve
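The FSC-based fitness mentioned in the introduction can be sketched as a ratio of between-class to within-class scatter over the features a candidate descriptor produces. This is an illustrative reading of a Fisher-style separation score, not the paper's exact formula, which is given in Section 3; the epsilon term is an assumption added to avoid division by zero.

```python
import numpy as np

def fisher_separation(features, labels):
    """Fisher-style separation score of a feature set.

    Ratio of between-class scatter to within-class scatter; a higher
    score means the classes are more compact and better separated,
    which is the property the evolutionary search rewards.
    Illustrative only, not the paper's exact FSC formula.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    overall_mean = features.mean(axis=0)
    between, within = 0.0, 0.0
    for c in np.unique(labels):
        cls = features[labels == c]
        mean_c = cls.mean(axis=0)
        between += len(cls) * np.sum((mean_c - overall_mean) ** 2)
        within += np.sum((cls - mean_c) ** 2)
    return between / (within + 1e-12)  # epsilon avoids division by zero
```

Used as a fitness function, this score favors descriptors whose feature vectors cluster tightly within each class while pushing the class means apart, which requires only the (small) labeled training set.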
Experimental results
In this section, the proposed method is first evaluated for mammogram ROI retrieval. Then, the effectiveness of the suggested method is tested, and compared to relevant state-of-the-art methods, within the framework of classifying breast lesions into normal vs. abnormal as well as into malignant vs. benign. It is worth mentioning that the proposed method has been implemented using the Anaconda Python distribution and the DEAP
Discussion
This study deals with the problem of automating feature extraction for breast cancer diagnosis. A genetic programming-based descriptor is learned in order to capture discriminative information locally before generating a global feature. The suggested method handles both the classification and the CBMR tasks. Indeed, all the ROIs used for training, with their corresponding labels and feature vectors, are incorporated into a knowledge base during the learning process. Classification
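Given such a knowledge base of (feature vector, label) pairs, the retrieval and classification steps can be sketched as a nearest-neighbor lookup. The structure of the knowledge base and the Euclidean distance are assumptions made for this illustration; the paper's actual similarity measure and decision rule are those described in Sections 3 and 4.

```python
import numpy as np

def retrieve(query_feature, knowledge_base, k=5):
    """Return the k most similar stored ROIs as (distance, label) pairs.

    `knowledge_base` is assumed to be a list of (feature_vector, label)
    pairs built during training; Euclidean distance is an illustrative
    choice of similarity measure.
    """
    query_feature = np.asarray(query_feature, dtype=float)
    dists = [(float(np.linalg.norm(np.asarray(f, dtype=float) - query_feature)), lbl)
             for f, lbl in knowledge_base]
    dists.sort(key=lambda t: t[0])
    return dists[:k]

def classify(query_feature, knowledge_base, k=5):
    """Explicit decision: majority vote over the k retrieved labels."""
    labels = [lbl for _, lbl in retrieve(query_feature, knowledge_base, k)]
    return max(set(labels), key=labels.count)
```

For CBMR, `retrieve` alone returns the sorted set of similar diagnosed ROIs without an explicit decision; `classify` adds the voting step used for classification.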
Conclusion and future work
In this work, a fully automated method for local feature extraction and global feature generation from mammogram ROIs is proposed. To this end, an LBP-like local representation is proposed, and a genetic programming-based descriptor is designed to transform the local features into a global one. The evolutionary process relies on a fitness function that guarantees the discriminative power of the descriptor while using only a small number of training instances. The suggested method has given encouraging
Declaration of competing interest
The authors declare no conflict of interest.
References (47)
- et al., A textural approach for mass false positive reduction in mammography, Comput. Med. Imag. Graph. (2009)
- et al., Breast cancer diagnosis in digitized mammograms using curvelet moments, Comput. Biol. Med. (2015)
- A genetic programming-based feature selection and fusion for facial expression recognition, Appl. Soft Comput. (2021)
- et al., Genetic programming-based learning of texture classification descriptors from local edge signature, Expert Syst. Appl. (2020)
- et al., Classification of mammogram for early detection of breast cancer using SVM classifier and Hough transform, Measurement (2019)
- et al., Automated diagnosis of breast cancer using parameter optimized kernel extreme learning machine, Biomed. Signal Process. Control (2020)
- et al., A comparison of wavelet and curvelet for breast cancer diagnosis in digital mammogram, Comput. Biol. Med. (2010)
- et al., A statistical based feature extraction method for breast cancer diagnosis in digital mammogram using multiresolution representation, Comput. Biol. Med. (2012)
- et al., WDO optimized detection for mammographic masses and its diagnosis: a unified CAD system, Appl. Soft Comput. (2021)
- et al., Malignant and nonmalignant classification of breast lesions in mammograms using convolutional neural networks, Biomed. Signal Process. Control (2021)
- A framework for breast cancer classification using multi-DCNNs, Comput. Biol. Med.
- Multiresolution mammogram analysis in multilevel decomposition, Pattern Recogn. Lett.
- Improvement of bias and generalizability for computer-aided diagnostic schemes, Comput. Med. Imag. Graph.
- Deep learning in mammography images segmentation and classification: automated CNN approach, Alexandria Eng. J.
- Attractive-and-repulsive center-symmetric local binary patterns for texture classification, Eng. Appl. Artif. Intell.
- Effective detection of mass abnormalities and its classification using multi-SVM classifier with digital mammogram images
- Breast cancer mass detection in mammograms using gray difference weight and MSER detector, SN Comput. Sci.
- LBP features for breast cancer detection
- Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Trans. Pattern Anal. Mach. Intell.
- Genetic programming-based fusion of HOG and LBP features for fully automated texture classification, Vis. Comput.
- Evolutionary-based generation of rotation and scale invariant texture descriptors from SIFT keypoints, Evol. Syst.
- Early stage breast cancer detection system using GLCM feature extraction and k-nearest neighbor (k-NN) on mammography image
- Computer-aided detection system for breast cancer based on GMM and SVM, Arab J. Nucl. Sci. Appl.