Genetic Programming Representations for Multi-dimensional Feature Learning in Biomedical Classification

La Cava, William; Silva, Sara; Vanneschi, Leonardo; Spector, Lee; Moore, Jason

doi:10.1007/978-3-319-55849-3_11


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10199)

Included in the following conference series:

European Conference on the Applications of Evolutionary Computation


Abstract

We present a new classification method that uses genetic programming (GP) to evolve feature transformations for a deterministic, distance-based classifier. This method, called M4GP, differs from common approaches to classifier representation in GP in two ways: it does not enforce arbitrary decision boundaries, and it allows individuals to produce multiple outputs via a stack-based GP system. Compared with typical classification methods, M4GP has the advantage of producing readable models. We conduct a comprehensive study of M4GP, comparing it first to other GP classifiers and then to six common machine learning classifiers. We perform full hyper-parameter optimization for all of the methods on a suite of 16 biomedical data sets that range in size and difficulty. The results indicate that M4GP outperforms other GP methods for classification. On most problems, M4GP is competitive with the other machine learning methods in terms of the accuracy of the produced models. M4GP also detects epistatic interactions better than the other methods.
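To make the representation described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It evaluates a fixed postfix (stack-based) program on each sample, treats the entire final stack as the transformed feature vector, and classifies by distance to class centroids in that transformed space. The instruction set, the function names, and the choice of a nearest-centroid rule with Euclidean distance are all assumptions made for illustration; the abstract does not specify the distance metric or the operators, and no evolutionary search is shown.

```python
import math
from collections import defaultdict

# Hypothetical instruction set (an assumption, not taken from the paper):
# ("x", i) pushes input feature i; "+", "-", "*" pop two values and push one.
BINARY_OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def run_stack_program(program, sample):
    """Evaluate a postfix program on one sample; the whole final stack is the output."""
    stack = []
    for instr in program:
        if isinstance(instr, tuple):       # ("x", i): push the i-th input feature
            stack.append(sample[instr[1]])
        elif len(stack) >= 2:              # operator with enough arguments
            b = stack.pop()
            a = stack.pop()
            stack.append(BINARY_OPS[instr](a, b))
        # operators that lack arguments are skipped
    return stack


def fit_centroids(program, X, y):
    """Compute the mean transformed vector for each class label."""
    grouped = defaultdict(list)
    for sample, label in zip(X, y):
        grouped[label].append(run_stack_program(program, sample))
    return {
        label: [sum(col) / len(col) for col in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(program, centroids, sample):
    """Assign the class whose centroid is nearest in the transformed space."""
    z = run_stack_program(program, sample)
    return min(centroids, key=lambda label: math.dist(z, centroids[label]))


# Toy data with a purely interactive (XOR-like) signal: the label depends on
# the sign of x0 * x1, not on either feature alone.
X = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
y = [1, 1, 0, 0]

# A hand-written program with two outputs: x0 * x1 and x0.
program = [("x", 0), ("x", 1), "*", ("x", 0)]

centroids = fit_centroids(program, X, y)
predictions = [predict(program, centroids, s) for s in X]
print(predictions)
```

In this toy case the product term separates the two classes even though neither raw feature does, which is the kind of interaction a learned feature transformation can expose. In a GP setting the program itself would be evolved rather than written by hand.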


Notes

1. Source code available from http://github.com/lacava/ellyn.


Acknowledgments

This work was supported by the Warren Center for Network and Data Science, as well as NIH grants P30-ES013508, AI116794 and LM009012. S. Silva acknowledges project PERSEIDS (PTDC/EMS-SIS/0642/2014) and BioISI RD unit, UID/MULTI/04046/2013, funded by FCT/MCTES/PIDDAC, Portugal. This material is based upon work supported by the National Science Foundation under Grants Nos. 1617087, 1129139 and 1331283. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Author information

Authors and Affiliations

Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
William La Cava & Jason Moore
Faculdade de Ciências, Departamento de Informática, BioISI - Biosystems and Integrative Sciences Institute, Universidade de Lisboa, 1749-016, Lisboa, Portugal
Sara Silva
CISUC, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal
Sara Silva
NOVA IMS, Universidade Nova de Lisboa, 1070-312, Lisbon, Portugal
Leonardo Vanneschi
School of Cognitive Science, Hampshire College, Amherst, Massachusetts, USA
Lee Spector


Corresponding author

Correspondence to William La Cava.

Editor information

Editors and Affiliations

Politecnico di Torino, Turin, Italy
Giovanni Squillero
Edinburgh Napier University, Edinburgh, United Kingdom
Kevin Sim


About this paper

Cite this paper

La Cava, W., Silva, S., Vanneschi, L., Spector, L., Moore, J. (2017). Genetic Programming Representations for Multi-dimensional Feature Learning in Biomedical Classification. In: Squillero, G., Sim, K. (eds) Applications of Evolutionary Computation. EvoApplications 2017. Lecture Notes in Computer Science, vol 10199. Springer, Cham. https://doi.org/10.1007/978-3-319-55849-3_11


DOI: https://doi.org/10.1007/978-3-319-55849-3_11
Published: 25 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55848-6
Online ISBN: 978-3-319-55849-3
eBook Packages: Computer Science (R0)
