Genetic Programming for Biomarker Detection in Mass Spectrometry Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7691)

Abstract

Classification of mass spectrometry (MS) data is an essential step in biomarker detection, which can help in the diagnosis and prognosis of diseases. However, the high dimensionality and small sample size of MS data make classification very challenging. In machine learning terms, biomarker detection can be framed as feature selection and classification. Genetic programming (GP) has been widely used for classification and feature selection, but it has not been effectively applied to biomarker detection in MS data. In this study we develop a GP-based approach to feature selection, feature extraction and classification of MS data for biomarker detection. In this approach, we first use GP to reduce the “redundant” features by selecting a small number of important features and constructing high-level features; we then use GP to classify the data based on the selected and constructed features. The approach is examined and compared with three well-known machine learning methods, namely decision trees (J48), naive Bayes and support vector machines (SVMs), on two biomarker detection data sets. The results show that the proposed GP method can effectively select a small number of important features from thousands of original features for these problems, that the constructed high-level features can further improve classification performance, and that the GP method outperforms the three existing methods on these problems.
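The abstract gives no implementation details, so the following is only a minimal sketch of the general idea that a GP classifier selects features implicitly: the features appearing as terminals in the best evolved expression tree are the "selected" ones, and the tree's output can itself be read as a constructed high-level feature. It is not the authors' method. All names (`random_tree`, `evolve`, `make_toy_data`, and so on) are hypothetical, the data are synthetic, and mutation with truncation selection is a simplification swapped in for a full GP system with crossover.

```python
import operator
import random


def pdiv(a, b):
    """Protected division: returns 1.0 when the divisor is (almost) zero."""
    return a / b if abs(b) > 1e-9 else 1.0


FUNCS = [operator.add, operator.sub, operator.mul, pdiv]


def random_tree(n_features, depth, rng):
    """Grow a random expression tree; leaves are feature indices (ints)."""
    if depth <= 0 or rng.random() < 0.3:
        return rng.randrange(n_features)
    f = rng.choice(FUNCS)
    return (f, random_tree(n_features, depth - 1, rng),
            random_tree(n_features, depth - 1, rng))


def evaluate(tree, x):
    """Evaluate a tree on one sample x (a list of feature values)."""
    if isinstance(tree, int):
        return x[tree]
    f, left, right = tree
    return f(evaluate(left, x), evaluate(right, x))


def used_features(tree):
    """Feature indices that appear as terminals: the implicitly selected features."""
    if isinstance(tree, int):
        return {tree}
    return used_features(tree[1]) | used_features(tree[2])


def accuracy(tree, X, y):
    """Predict class 1 when the tree output is positive, class 0 otherwise."""
    hits = sum((evaluate(tree, x) > 0) == bool(label) for x, label in zip(X, y))
    return hits / len(y)


def mutate(tree, n_features, rng, depth=3):
    """Replace a randomly chosen subtree with a freshly grown one."""
    if isinstance(tree, int) or rng.random() < 0.3:
        return random_tree(n_features, depth, rng)
    f, left, right = tree
    if rng.random() < 0.5:
        return (f, mutate(left, n_features, rng, depth - 1), right)
    return (f, left, mutate(right, n_features, rng, depth - 1))


def evolve(X, y, pop_size=60, generations=30, seed=0):
    """Mutation-only GP with truncation selection (an assumed simplification)."""
    rng = random.Random(seed)
    n_features = len(X[0])
    pop = [random_tree(n_features, 3, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: accuracy(t, X, y), reverse=True)
        elite = pop[: pop_size // 4]  # keep the best quarter unchanged
        children = [mutate(rng.choice(elite), n_features, rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=lambda t: accuracy(t, X, y))


def make_toy_data(n_samples=40, n_features=50, informative=7, seed=0):
    """Synthetic stand-in for MS data: few samples, many features, one informative."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[informative] += 2.0 if label else -2.0
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    best = evolve(X, y)
    print("selected feature indices:", sorted(used_features(best)))
    print("training accuracy:", accuracy(best, X, y))
```

In this sketch a depth-3 tree can reference at most eight features, so the evolved classifier necessarily uses only a small subset of the 50 inputs. The accuracy printed here is training accuracy on toy data and says nothing about the results reported in the paper.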

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ahmed, S., Zhang, M., Peng, L. (2012). Genetic Programming for Biomarker Detection in Mass Spectrometry Data. In: Thielscher, M., Zhang, D. (eds) AI 2012: Advances in Artificial Intelligence. AI 2012. Lecture Notes in Computer Science, vol 7691. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35101-3_23

  • DOI: https://doi.org/10.1007/978-3-642-35101-3_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35100-6

  • Online ISBN: 978-3-642-35101-3

  • eBook Packages: Computer Science, Computer Science (R0)
