Abstract
Although machine learning algorithms have proven powerful at making predictions and finding patterns, they often struggle to provide explanations and translational knowledge when applied to many problems, especially in the biomedical sciences. This often results from the highly complex structures that machine learning algorithms employ to represent and model the relationship between the predictors and the response. Prediction accuracy is gained at the cost of a “black-box” model that is not amenable to interpretation. Genetic programming may provide a solution for explainable machine learning in bioinformatics, where learned knowledge and patterns can be translated into clinical actions. In this study, we employed a linear genetic programming (LGP) algorithm for a bioinformatics classification problem. We developed feature selection analysis methods aimed at explaining which features are influential in the prediction, and whether that influence arises through individual effects or through synergistic effects in combination with other features.
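To make the idea of feature analysis on LGP (linear genetic programming) models concrete, the following is a minimal sketch and not the chapter's actual method, which the abstract does not detail. It assumes a hypothetical encoding in which a program is a list of `(dest, op, src1, src2)` instructions, registers are named `r0`, `r1`, ..., input features are named `x0`, `x1`, ..., and `r0` holds the output. It ignores conditional branches. Under those assumptions, it finds the features that can reach the output of each program, then counts how often each feature and each feature pair occurs across a set of programs, as a rough proxy for individual and joint influence.

```python
from collections import Counter
from itertools import combinations

# Hypothetical encoding (an assumption, not taken from the chapter):
# an instruction is (dest, op, src1, src2); "r*" are calculation
# registers, "x*" are read-only input features, "r0" is the output.
OUTPUT_REGISTER = "r0"


def effective_features(program):
    """Return the features that can influence the output register."""
    needed = {OUTPUT_REGISTER}  # registers whose value still matters
    features = set()
    # Walk backwards: an instruction matters only if it writes a
    # register that a later effective instruction (or the output) reads.
    for dest, _op, src1, src2 in reversed(program):
        if dest not in needed:
            continue  # non-effective instruction, skip it
        needed.discard(dest)
        for src in (src1, src2):
            if src.startswith("x"):
                features.add(src)
            else:
                needed.add(src)
    return features


def feature_statistics(programs):
    """Count single-feature and feature-pair occurrences over programs."""
    single = Counter()
    pairs = Counter()
    for program in programs:
        feats = sorted(effective_features(program))
        single.update(feats)
        pairs.update(combinations(feats, 2))
    return single, pairs


# Two toy programs, invented purely for illustration.
programs = [
    [("r1", "+", "x0", "x2"), ("r2", "*", "x3", "x3"), ("r0", "-", "r1", "x1")],
    [("r0", "*", "x0", "x1"), ("r1", "+", "x2", "x2")],
]
single, pairs = feature_statistics(programs)
```

In this toy example, `x3` never reaches the output, so it is not counted. A feature that occurs often on its own suggests an individual effect, while a pair that co-occurs often is a candidate for a synergistic effect. Any such reading would still need statistical testing against a null model.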
Acknowledgements
This research is supported by the Canadian Natural Sciences and Engineering Research Council (NSERC) Discovery grant RGPIN-04699-2016 to Ting Hu.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Hu, T. (2020). Can Genetic Programming Perform Explainable Machine Learning for Bioinformatics? In: Banzhaf, W., Goodman, E., Sheneman, L., Trujillo, L., Worzel, B. (eds) Genetic Programming Theory and Practice XVII. Genetic and Evolutionary Computation. Springer, Cham. https://doi.org/10.1007/978-3-030-39958-0_4
DOI: https://doi.org/10.1007/978-3-030-39958-0_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-39957-3
Online ISBN: 978-3-030-39958-0
eBook Packages: Computer Science, Computer Science (R0)