Can–Evo–Ens: Classifier stacking based evolutionary ensemble system for prediction of human breast cancer using amino acid sequences

https://doi.org/10.1016/j.jbi.2015.01.004Get rights and content
Under an Elsevier user license
open archive

Highlights

  • We developed a novel classifier-stacking based evolutionary system for cancer prediction.

  • Improved performance of the prediction system is reported using the optimal threshold obtained through PSO algorithm.

  • We found PseAAC-S feature space as the most informative.

  • Experimental results demonstrate that our cancer prediction system is better than other proposed system.

Abstract

The diagnostic of human breast cancer is an intricate process and specific indicators may produce negative results. In order to avoid misleading results, accurate and reliable diagnostic system for breast cancer is indispensable. Recently, several interesting machine-learning (ML) approaches are proposed for prediction of breast cancer. To this end, we developed a novel classifier stacking based evolutionary ensemble system “Can–Evo–Ens” for predicting amino acid sequences associated with breast cancer. In this paper, first, we selected four diverse-type of ML algorithms of Naïve Bayes, K-Nearest Neighbor, Support Vector Machines, and Random Forest as base-level classifiers. These classifiers are trained individually in different feature spaces using physicochemical properties of amino acids. In order to exploit the decision spaces, the preliminary predictions of base-level classifiers are stacked. Genetic programming (GP) is then employed to develop a meta-classifier that optimal combine the predictions of the base classifiers. The most suitable threshold value of the best-evolved predictor is computed using Particle Swarm Optimization technique. Our experiments have demonstrated the robustness of Can–Evo–Ens system for independent validation dataset. The proposed system has achieved the highest value of Area Under Curve (AUC) of ROC Curve of 99.95% for cancer prediction. The comparative results revealed that proposed approach is better than individual ML approaches and conventional ensemble approaches of AdaBoostM1, Bagging, GentleBoost, and Random Subspace. It is expected that the proposed novel system would have a major impact on the fields of Biomedical, Genomics, Proteomics, Bioinformatics, and Drug Development.

Keywords

Breast cancer
Amino acids
Physicochemical properties
Genetic programming
Stacking ensemble

Cited by (0)