Can Genetic Programming Perform Explainable Machine Learning for Bioinformatics?

Part of the book series: Genetic and Evolutionary Computation (GEVO)

Abstract

Although proven powerful in making predictions and finding patterns, machine learning algorithms often struggle to provide explanations and translational knowledge when applied to many problems, especially in the biomedical sciences. This often results from the highly complex structures that machine learning algorithms employ to represent and model the relationship between the predictors and the response. Prediction accuracy is increased at the cost of a “black-box” model that is not amenable to interpretation. Genetic programming may provide a potential solution for explainable machine learning in bioinformatics, where learned knowledge and patterns can be translated into clinical actions. In this study, we employed a linear genetic programming (LGP) algorithm for a bioinformatics classification problem. We developed feature selection analysis methods aimed at explaining which features are influential in the prediction, and whether such influence arises through individual effects or through synergistic effects in combination with other features.

Acknowledgements

This research was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant RGPIN-04699-2016 to Ting Hu.

Author information

Corresponding author

Correspondence to Ting Hu.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Hu, T. (2020). Can Genetic Programming Perform Explainable Machine Learning for Bioinformatics? In: Banzhaf, W., Goodman, E., Sheneman, L., Trujillo, L., Worzel, B. (eds) Genetic Programming Theory and Practice XVII. Genetic and Evolutionary Computation. Springer, Cham. https://doi.org/10.1007/978-3-030-39958-0_4

  • DOI: https://doi.org/10.1007/978-3-030-39958-0_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-39957-3

  • Online ISBN: 978-3-030-39958-0

  • eBook Packages: Computer Science, Computer Science (R0)
