Analyzing Feature Importance for Metabolomics Using Genetic Programming

  • Conference paper
Genetic Programming (EuroGP 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10781)

Abstract

The emerging and fast-developing field of metabolomics examines the abundance of small-molecule metabolites in body fluids to study the cellular processes by which the human body responds to genetic and environmental perturbations. Given the complexity of metabolism, metabolites and the cellular processes they represent can be correlated and can contribute synergistically to a phenotypic status. Genetic programming (GP) provides advanced analytical instruments for investigating the multifactorial causes of metabolic diseases. In this article, we analyzed a population-based metabolomics dataset on osteoarthritis (OA) and developed a Linear GP (LGP) algorithm to search for classification models that best predict the disease outcome, as well as to identify the most important metabolic markers associated with the disease. The LGP algorithm was able to evolve prediction models with high accuracy, especially when the search was focused on a reduced feature set containing only potentially relevant metabolites. We also identified a set of key metabolic markers that may improve our understanding of the biochemistry and pathogenesis of the disease.
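
As a rough illustration of the idea summarized in the abstract, the sketch below shows how a linear GP program can act as a binary classifier and how feature usage can be tallied across a set of programs. It is a minimal sketch under stated assumptions, not the authors' implementation: the instruction encoding, the four-register layout, the sign-based decision rule, and the names `run_program`, `classify`, and `feature_counts` are all hypothetical. It also counts every feature an instruction mentions, without checking whether that instruction actually affects the output.

```python
from collections import Counter

# Hypothetical encoding: an instruction is (dest_register, op, src_a, src_b).
# A source is ("r", i) for calculation register i or ("x", j) for input feature j.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    # Protected division (an assumption): return 1.0 when the divisor is near zero.
    "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,
}


def run_program(program, features, n_registers=4):
    """Execute the instructions in order and return the value left in register 0."""
    regs = [0.0] * n_registers
    for dest, op, a, b in program:
        va = regs[a[1]] if a[0] == "r" else features[a[1]]
        vb = regs[b[1]] if b[0] == "r" else features[b[1]]
        regs[dest] = OPS[op](va, vb)
    return regs[0]


def classify(program, features):
    """Assumed decision rule: predict class 1 if the output register is positive."""
    return 1 if run_program(program, features) > 0 else 0


def feature_counts(programs):
    """Count, for each feature index, how many programs reference it at least once."""
    counts = Counter()
    for program in programs:
        used = {src[1] for _, _, a, b in program for src in (a, b) if src[0] == "x"}
        counts.update(used)
    return counts


# Toy usage with made-up programs and feature values.
prog_a = [(1, "-", ("x", 0), ("x", 2)), (0, "*", ("r", 1), ("x", 1))]
prog_b = [(0, "+", ("x", 0), ("x", 0))]
label = classify(prog_a, [3.0, 2.0, 1.0])
usage = feature_counts([prog_a, prog_b])
```

In this toy setup, a feature that appears in many well-performing programs would be a candidate important marker; a fuller analysis would restrict the count to instructions that influence the output register.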

Acknowledgments

This research was supported by Newfoundland and Labrador Research and Development Corporation (RDC) Ignite Grant 5404.1942.101 and the Natural Sciences and Engineering Research Council (NSERC) of Canada Discovery Grant RGPIN-2016-04699 to TH. GZ acknowledges grants from the Canadian Institutes of Health Research (CIHR), Newfoundland and Labrador Research and Development Corporation (RDC), and Memorial University. We thank all the study participants who made this study possible and all the Operating Room staff at Eastern Health General Hospital and St. Clare’s Hospital who helped collect samples.

Author information

Corresponding author

Correspondence to Ting Hu.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Hu, T., Oksanen, K., Zhang, W., Randell, E., Furey, A., Zhai, G. (2018). Analyzing Feature Importance for Metabolomics Using Genetic Programming. In: Castelli, M., Sekanina, L., Zhang, M., Cagnoni, S., García-Sánchez, P. (eds) Genetic Programming. EuroGP 2018. Lecture Notes in Computer Science, vol 10781. Springer, Cham. https://doi.org/10.1007/978-3-319-77553-1_5

  • DOI: https://doi.org/10.1007/978-3-319-77553-1_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-77552-4

  • Online ISBN: 978-3-319-77553-1

  • eBook Packages: Computer Science (R0)