DOI: 10.1145/2739480.2754771
research-article

A Re-Examination of the Use of Genetic Programming on the Oral Bioavailability Problem

Published: 11 July 2015

ABSTRACT

Difficult benchmark problems are in increasing demand in Genetic Programming (GP). One problem seeing increased usage is the oral bioavailability problem, which is often presented as a challenging problem to both GP and other machine learning methods. However, few properties of the bioavailability data set have been demonstrated, so attributes that make it a challenging problem are largely unknown. This work uncovers important properties of the bioavailability data set, and suggests that the perceived difficulty in this problem can be partially attributed to a lack of pre-processing, including features within the data set that contain no information, and contradictory relationships between the dependent and independent features of the data set. The paper then re-examines the performance of GP on this data set, and contextualises this performance relative to other regression methods. Results suggest that a large component of the observed performance differences on the bioavailability data set can be attributed to variance in the selection of training and testing data. Differences in performance between GP and other methods disappear when multiple training/testing splits are used within experimental work, with performance typically no better than a null modelling approach of reporting the mean of the training data.
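The checks described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the 70/30 split fraction, the number of splits, and the use of RMSE as the error measure are all assumptions made for illustration. It also assumes that "features that contain no information" can be read as constant columns and that "contradictory relationships" can be read as identical input vectors paired with different target values.

```python
import math
import random
import statistics
from collections import defaultdict


def constant_features(X):
    """Return indices of columns that take a single value across all rows."""
    n_cols = len(X[0])
    return [j for j in range(n_cols) if len({row[j] for row in X}) == 1]


def contradictory_groups(X, y):
    """Map each repeated input vector to its targets when those targets disagree."""
    targets = defaultdict(list)
    for row, target in zip(X, y):
        targets[tuple(row)].append(target)
    return {key: vals for key, vals in targets.items() if len(set(vals)) > 1}


def rmse(y_true, y_pred):
    """Root mean squared error (an assumed error measure)."""
    total = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(total / len(y_true))


def null_model_rmse(y_train, y_test):
    """Null model: predict the mean of the training targets for every test case."""
    mean = statistics.fmean(y_train)
    return rmse(y_test, [mean] * len(y_test))


def null_model_scores(y, n_splits=30, train_frac=0.7, seed=0):
    """Score the null model over several random training/testing splits."""
    rng = random.Random(seed)
    indices = list(range(len(y)))
    cut = int(train_frac * len(y))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(indices)  # a fresh random partition for each split
        train = [y[i] for i in indices[:cut]]
        test = [y[i] for i in indices[cut:]]
        scores.append(null_model_rmse(train, test))
    return scores
```

Under these assumptions, a learner's test error on each split would be compared against the null model's score on the same split, so that variation caused by the choice of split is not mistaken for a difference between methods.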

Published in

GECCO '15: Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation
July 2015
1496 pages
ISBN: 9781450334723
DOI: 10.1145/2739480

Copyright © 2015 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

GECCO '15 Paper Acceptance Rate: 182 of 505 submissions, 36%
Overall Acceptance Rate: 1,669 of 4,410 submissions, 38%
