Towards non-data-hungry and fully-automated diagnosis of breast cancer from mammographic images

https://doi.org/10.1016/j.compbiomed.2021.105011

Highlights

  • This study deals with the problem of automating feature extraction for breast cancer diagnosis.

  • A fully-automated method, for breast cancer diagnosis, performs training using few instances.

  • A GP-based descriptor extracts features using statistics on an LBP-like local distribution.

  • The method is validated for CBIR as well as for abnormality/malignancy classification.

Abstract

Analysing local texture and generating features are two key issues for automatic cancer detection in mammographic images. Recent research has shown that deep neural networks provide a promising alternative to hand-crafted features, which suffer from the curse of dimensionality and low accuracy rates. However, deep learning-based models require large, balanced training sets, and such data are not always publicly available. In this work, we propose a fully-automated method for breast cancer diagnosis that performs training using small sets of data. Feature extraction from mammographic images is performed using a genetic-programming-based descriptor that exploits statistics on a local binary pattern-like local distribution defined at each pixel. The effectiveness of the suggested method is demonstrated on two challenging datasets, (1) the Digital Database for Screening Mammography and (2) the Mammographic Image Analysis Society digital mammogram database, for content-based image retrieval as well as for abnormality/malignancy classification. The experimental results show that the proposed method outperforms, or achieves results comparable to, deep learning-based methods, even those using transfer learning and/or data augmentation.

Introduction

Breast cancer is the most common form of cancer among women. According to the World Health Organisation, it causes about 15% of cancer deaths. Mammography is the standard first-line technique for routine breast cancer screening, since it is fast and affordable. It aims to reduce mortality by detecting breast cancer at an early stage, even before women feel any symptoms. At this stage, breast cancer is easily treatable and the risk of fatality is low. However, according to radiologists, mammography has shortcomings when used as the only radiological tool to assess a patient's risk for breast cancer. Indeed, early-stage cancerous tissue is difficult to differentiate from normal breast tissue. The presence of dense breast tissue (parenchymal tissue) in some patients further complicates expert diagnosis, resulting in false-negative mammogram diagnoses for patients with dense breasts. Therefore, Computer-Aided Diagnosis (CAD) systems are very useful for giving radiologists a second opinion and helping them reach decisions swiftly. The decision support provided by CAD systems can be explicit classification or Content-Based Mammogram Retrieval (CBMR). Unlike automatic classification tools, CBMR provides a sorted set of similar images with a confirmed diagnosis relative to a given mammogram lesion. Retrieved images can support radiologists in the diagnosis as well as serve educational purposes, while providing more explainable results. Both classification and CBMR can follow the same pipeline for feature extraction, selection and fusion. Furthermore, the diagnosis of breast cancer from mammograms can be seen as a texture classification problem, where the CAD system should differentiate between the texture of normal tissue and that of cancerous tissue. Thus, various texture classification techniques have been proposed [1,2].
Some works tried to use descriptors that are known to be efficient in texture classification, notably the Local Binary Pattern (LBP) descriptor, in order to classify breast tissue as cancerous or normal [3,4]. The LBP descriptor uses the signs of the differences between a pixel and its immediate neighbors in order to represent the texture locally [5]. This encoding, which exploits the signs of the differences while ignoring the magnitudes, has shown very good results for classifying textures [6]. However, the problem with mammograms is that the textures of normal and cancerous tissues are hardly distinguishable. As shown in Fig. 1, the LBP transforms of three Regions Of Interest (ROI) representing benign, malignant and normal tissue, as well as the corresponding LBP histograms, do not reveal any consistent difference between classes. Therefore, using a technique that has been proven in texture classification may not be sufficient for an accurate diagnosis of breast cancer from mammograms [3]. In fact, as much local information as possible must be exploited to distinguish two fairly similar texture classes [7]. In addition, the global feature vector must be constructed in such a way as to retain information that highlights the difference between malignant and normal tissues [8]. Therefore, the way the local information is aggregated to construct the global tissue description should be automated. Indeed, classical concatenation of local representations, commonly used with handcrafted features, can lead to information loss [9]. The goal of this work is to propose a different approach to texture analysis for more accurate breast cancer diagnosis from mammogram ROIs. This can be accomplished by acting differently when representing texture locally and globally [10]. Locally, the aim is to find an efficient way to encode as much local texture information as possible without exploding dimensionality.
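As a concrete illustration of the LBP encoding just described, the following minimal NumPy sketch computes the basic 8-neighbour, radius-1 LBP code map of a ROI and its histogram (bilinear interpolation and the uniform-pattern mapping of the full operator [5] are omitted for brevity):

```python
import numpy as np

def lbp_8_1(img):
    """Basic 8-neighbour LBP with radius 1 (no interpolation).

    Each interior pixel is encoded by thresholding its 8 immediate
    neighbours against the centre: bit = 1 if neighbour >= centre,
    else 0. Border pixels are skipped for simplicity.
    """
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # Neighbour offsets listed clockwise from the top-left neighbour.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    centre = img[1:-1, 1:-1]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= centre).astype(np.uint8) << bit
    return codes

def lbp_histogram(img):
    """256-bin normalised histogram of LBP codes: the usual global feature."""
    codes = lbp_8_1(img)
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```

Note that the sign thresholding discards the magnitudes of the gray-level differences; as argued above, this is the information the proposed descriptor additionally exploits.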
Globally, the objective is to automate the construction of the overall texture description so that the characteristics discriminating malignant from normal tissues are preserved. Thus, in order to enhance the accuracy of CAD techniques for breast cancer diagnosis, we investigate the use of an LBP-like texture representation that uses both the signs and the magnitudes of the differences between the central pixel and its neighbors to describe tissue locally. We propose to automatically learn a texture descriptor that uses this local texture representation to generate an overall feature. For this purpose, genetic programming techniques are investigated to generate a descriptor that produces discriminative features, facilitating the task of a supervised classifier in making reliable decisions. The research gap filled by the suggested method lies in the proposition of an evolutionary context-aware descriptor able to automatically select the most appropriate and restrictive set of features before fusing them, depending on the specificity of the classification context, without the need for a pre-processing step. Thus, the main contributions of this study are threefold:

  • To the best of our knowledge, we are the first to propose an evolutionary-based descriptor for generating robust features to diagnose breast cancer from mammogram ROI images. Moreover, a fitness function based on the Fisher Separation Criterion (FSC) has been proposed in order to learn descriptors that generate the most discriminative features, considering intra-class as well as inter-class characteristics.

  • We represent ROI texture locally using LBP-like gray-level differences. Unlike the LBP operator, the proposed descriptor uses both the magnitudes and the signs of these differences to generate the feature vector.

  • The suggested method is fully-automated but, unlike deep learning-based methods, it does not need a large training set to perform accurate classification and retrieval.
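The sign/magnitude local representation and the FSC-style fitness mentioned in the contributions above can be sketched as follows (a minimal NumPy illustration; the function names and the exact scatter formulation are our assumptions, not the authors' exact definitions):

```python
import numpy as np

def local_differences(img):
    """Sign and magnitude components of the 8-neighbour differences.

    For each interior pixel, d = neighbour - centre is split into
    sign(d) and |d|, so both parts are available to the evolved
    descriptor (plain LBP keeps only the signs).
    """
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    centre = img[1:-1, 1:-1]
    diffs = np.stack([img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] - centre
                      for dy, dx in offsets], axis=-1)
    return np.sign(diffs), np.abs(diffs)

def fisher_separation(features, labels):
    """Fisher-style separation score: between-class scatter divided by
    within-class scatter, summed over feature dimensions (larger is
    better), rewarding compact classes that are far apart."""
    features, labels = np.asarray(features), np.asarray(labels)
    overall = features.mean(axis=0)
    between = within = 0.0
    for c in np.unique(labels):
        cls = features[labels == c]
        mu = cls.mean(axis=0)
        between += len(cls) * np.sum((mu - overall) ** 2)
        within += np.sum((cls - mu) ** 2)
    return between / (within + 1e-12)
```

A fitness of this form depends only on class statistics of the generated features, which is what allows the evolutionary search to operate on small training sets without a separately trained classifier in the loop.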

The rest of this paper is organized as follows. In Section 2, we present a brief literature review on relevant existing methods for texture-based methods of cancer diagnosis from mammographic images. The proposed method is detailed in Section 3. The results are presented in Section 4. Outcomes and findings of the suggested method are discussed in Section 5. In Section 6, we conclude the proposed work and present some ideas for future studies.

Section snippets

Related work

Texture is the most important visual cue for describing breast tissues from mammography ROIs, since it illustrates discriminative information about tissue properties. The textural features extracted from mammography images can be categorized into handcrafted features and deep learning-based features. On the one hand, within the framework of handcrafted features widely used for breast cancer CAD, features can be categorized into classical texture, curvelet-based, nature-inspired and

Proposed method

The main contribution of the suggested method lies in the automation of the feature extraction step while using few training data. The same pipeline used to classify ROIs is also used for the CBMR with the only difference that for the latter, the similar ROIs are retrieved without giving an explicit decision. In this section, the overall process for malignant vs. benign mammogram ROIs classification/retrieval is detailed. It is worth noting that the same process followed to classify/retrieve

Experimental results

In this section, the proposed method is firstly evaluated for mammogram ROI retrieval. Then, the effectiveness of the suggested method is tested, and compared to relevant state-of-the-art methods, within the framework of classifying breast lesions into normal vs. abnormal as well as into malignant vs. benign. It is worth mentioning that the proposed method has been implemented using the Anaconda Python distribution and the DEAP

Discussion

This study deals with the problem of automating feature extraction for breast cancer diagnosis. A genetic programming-based descriptor is learned in order to capture discriminative information locally before generating a global feature. The suggested method is suitable for classification as well as for CBMR. Indeed, all the ROIs used for training, with the corresponding labels and feature vectors, are incorporated into a knowledge base during the learning process. Classification

Conclusion and future work

In this work, a fully automated method for local feature extraction and global feature generation for mammogram ROIs is proposed. To this end, an LBP-like local representation is proposed, and a genetic programming-based descriptor is designed to transform local features into a global one. The evolutionary process is based upon a fitness function that guarantees the discriminative power of the descriptor while using few training instances. The suggested method has given encouraging

Declaration of competing interest

The authors declare no conflict of interest.

References (47)

  • D.A. Ragab et al.

    A framework for breast cancer classification using multi-DCNNs

    Comput. Biol. Med.

    (2021)
  • E.A. Rashed et al.

    Multiresolution mammogram analysis in multilevel decomposition

    Pattern Recogn. Lett.

    (2007)
  • Q. Li

    Improvement of bias and generalizability for computer-aided diagnostic schemes

    Comput. Med. Imag. Graph.

    (2007)
  • W.M. Salama et al.

    Deep learning in mammography images segmentation and classification: automated CNN approach

    Alexandria Eng. J.

    (2021)
  • Y. El merabet et al.

    Attractive-and-repulsive center-symmetric local binary patterns for texture classification

    Eng. Appl. Artif. Intell.

    (2019)
  • G.R. Jothilakshmi et al.

    Effective detection of mass abnormalities and its classification using multi-SVM classifier with digital mammogram images

  • B.V. Divyashree et al.

    Breast cancer mass detection in mammograms using gray difference weight and MSER detector

    SN Comput. Sci.

    (2021)
  • P. Král et al.

    LBP features for breast cancer detection

  • T. Ojala et al.

    Multiresolution gray-scale and rotation invariant texture classification with local binary patterns

    Pattern Anal. Mach. Intell. IEEE Trans.

    (2002)
  • M. Hazgui et al.

    Genetic programming-based fusion of HOG and LBP features for fully automated texture classification

    Vis. Comput.

    (2021)
  • M. Hazgui et al.

    Evolutionary-based generation of rotation and scale invariant texture descriptors from SIFT keypoints

    Evol. Syst.

    (2021)
  • T.T. Htay et al.

    Early stage breast cancer detection system using GLCM feature extraction and k-nearest neighbor (k-NN) on mammography image

  • A.A.A. Arafa et al.

    Computer-aided detection system for breast cancer based on GMM and SVM

    Arab J. Nucl. Sci. Appl.

    (2019)