A Decomposition Based Multi-objective Genetic Programming Algorithm for Classification of Highly Imbalanced Tandem Mass Spectrometry

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12047)

Abstract

Preprocessing tandem mass spectra to separate signal peaks from noise peaks plays a crucial role in improving the accuracy of most peptide identification algorithms. A CID tandem mass spectra dataset is highly imbalanced, with many noise peaks and only a small number of signal peaks (a low signal-to-noise ratio), so peptide identification needs to be preceded by a classification strategy that maintains the performance trade-off between the minority (signal) and the majority (noise) class accuracies. This paper therefore proposes a Multi-Objective Genetic Programming (MOGP) approach based on the idea of MOEA/D, named MOGP/D, to evolve a Pareto front of classifiers along the optimal trade-off surface that offers the best compromises between the objectives. Compared with an NSGA-II-based MOGP method, called NSGP, MOGP/D produces better solutions in the region of interest (the centre of the Pareto front) as the signal-to-noise ratio decreases, according to the hypervolume indicator on the training sets. Moreover, the best compromise solution achieved by the proposed method is compared with the best single-objective GP solution and the best NSGP solution, and the results show that MOGP/D retains a reasonable number of signal peaks and filters out more noise peaks than the other two methods. To further evaluate the effectiveness of MOGP/D, the preprocessed MS/MS data are submitted to PEAKS, a widely used de novo sequencing software package, to identify the peptides. The results show that the proposed multi-objective GP method improves the reliability of peptide identification compared with the single-objective GP.
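The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the label convention (1 = signal, 0 = noise), the use of a weighted Tchebycheff scalarisation as the MOEA/D-style decomposition, the ideal point (1, 1), and the reference point (0, 0) for the hypervolume are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def class_accuracies(y_true: Sequence[int], y_pred: Sequence[int]) -> Point:
    """Return (signal accuracy, noise accuracy); assumed labels: 1 = signal, 0 = noise."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return (tp / n_pos if n_pos else 0.0, tn / n_neg if n_neg else 0.0)


def tchebycheff(objs: Point, weights: Point, ideal: Point = (1.0, 1.0)) -> float:
    """Weighted Tchebycheff value for maximised objectives (assumed form); lower is better."""
    return max(w * abs(z - f) for f, w, z in zip(objs, weights, ideal))


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Keep the points that no other point dominates (both objectives maximised)."""
    front = [
        p for p in points
        if not any(q != p and q[0] >= p[0] and q[1] >= p[1] for q in points)
    ]
    return sorted(set(front))


def hypervolume_2d(points: Sequence[Point], ref: Point = (0.0, 0.0)) -> float:
    """Area dominated by the front relative to ref (both objectives maximised)."""
    front = sorted(pareto_front(points), key=lambda p: p[0], reverse=True)
    area, prev_y = 0.0, ref[1]
    for x, y in front:
        if y > prev_y:
            # Each front point adds a horizontal strip above the previous one.
            area += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return area


# Toy imbalanced example: 2 signal peaks among 8 peaks (hypothetical data).
y_true = [1, 0, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0]
objs = class_accuracies(y_true, y_pred)
score = tchebycheff(objs, weights=(0.5, 0.5))

# Hypothetical (signal accuracy, noise accuracy) pairs for four classifiers.
candidates = [(0.9, 0.5), (0.7, 0.8), (0.5, 0.9), (0.6, 0.6)]
hv = hypervolume_2d(candidates)
```

In a decomposition-based scheme, each weight vector defines one scalar subproblem, so equal weights such as (0.5, 0.5) target the centre of the front, the region the abstract calls the region of interest. A larger hypervolume means the front covers more of the objective space between the reference point and the ideal point.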

Author information

Corresponding author

Correspondence to Samaneh Azari.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Azari, S., Xue, B., Zhang, M., Peng, L. (2020). A Decomposition Based Multi-objective Genetic Programming Algorithm for Classification of Highly Imbalanced Tandem Mass Spectrometry. In: Palaiahnakote, S., Sanniti di Baja, G., Wang, L., Yan, W. (eds) Pattern Recognition. ACPR 2019. Lecture Notes in Computer Science, vol 12047. Springer, Cham. https://doi.org/10.1007/978-3-030-41299-9_35

  • DOI: https://doi.org/10.1007/978-3-030-41299-9_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-41298-2

  • Online ISBN: 978-3-030-41299-9

  • eBook Packages: Computer Science, Computer Science (R0)
