
Using Operator Equalisation for Prediction of Drug Toxicity with Genetic Programming

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5816)

Abstract

Predicting the toxicity of new potential drugs is a fundamental step in the drug design process. Recent contributions have shown that, although Genetic Programming is a promising method for this task, predicting the toxicity of molecular compounds is a complex and difficult problem. In particular, when applied to drug toxicity prediction, Genetic Programming suffers from the well-known phenomenon of bloat, i.e. the growth in code size during the evolutionary process without a corresponding improvement in fitness. We hypothesize that bloat might cause overfitting and thus prevent the method from discovering simpler and potentially more general solutions. For this reason, in this paper we investigate two recently defined variants of the operator equalisation bloat control method for Genetic Programming. We show that both variants remain bloat-free on this complex problem as well. Nevertheless, overfitting remains an issue. Thus, contradicting the widespread belief that bloat and overfitting are strongly related, we argue that the two phenomena are independent of each other and that eliminating bloat does not necessarily eliminate overfitting.
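The abstract separates two effects: bloat (code growth without a matching fitness improvement) and overfitting (good training fitness that does not carry over to unseen data). The Python sketch below is illustrative and is not the authors' measurement procedure. It assumes a hypothetical per-generation log recording average program size, training error and test error (lower is better). It uses two simple indicators chosen for this sketch: a ratio of relative size growth to relative training-error drop for bloat, and a test-minus-training gap for overfitting. The names `GenerationStats`, `bloat_ratio` and `overfitting_gap` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class GenerationStats:
    """Hypothetical per-generation record of a GP run (not the paper's format)."""
    avg_size: float     # average program size in the population, e.g. node count
    train_error: float  # training error of the best individual (lower is better)
    test_error: float   # error of that same individual on held-out data


def bloat_ratio(log: Sequence[GenerationStats]) -> float:
    """Relative growth in average size divided by relative drop in training
    error, first vs. last generation. Values well above 1 mean programs grew
    much faster than fitness improved. Illustrative choice, not the paper's."""
    first, last = log[0], log[-1]
    size_growth = (last.avg_size - first.avg_size) / first.avg_size
    error_drop = (first.train_error - last.train_error) / first.train_error
    if error_drop <= 0:
        # No fitness improvement at all: any size growth is pure bloat.
        return float("inf") if size_growth > 0 else 0.0
    return size_growth / error_drop


def overfitting_gap(log: Sequence[GenerationStats]) -> float:
    """Test error minus training error of the final best individual."""
    last = log[-1]
    return last.test_error - last.train_error


if __name__ == "__main__":
    # Made-up numbers for illustration only, not results from the paper.
    bloated = [GenerationStats(20.0, 100.0, 105.0), GenerationStats(80.0, 60.0, 110.0)]
    lean = [GenerationStats(20.0, 100.0, 105.0), GenerationStats(22.0, 60.0, 110.0)]
    for name, run in (("bloated", bloated), ("lean", lean)):
        print(name, round(bloat_ratio(run), 2), overfitting_gap(run))
```

With these made-up numbers, the first run has a bloat ratio of 7.5 and the second 0.25. Both end with a test-minus-training gap of 50. Removing size growth leaves the generalisation gap untouched, which is the kind of independence between the two phenomena that the abstract argues for.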

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vanneschi, L., Silva, S. (2009). Using Operator Equalisation for Prediction of Drug Toxicity with Genetic Programming. In: Lopes, L.S., Lau, N., Mariano, P., Rocha, L.M. (eds) Progress in Artificial Intelligence. EPIA 2009. Lecture Notes in Computer Science, vol 5816. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04686-5_6

  • DOI: https://doi.org/10.1007/978-3-642-04686-5_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04685-8

  • Online ISBN: 978-3-642-04686-5

  • eBook Packages: Computer Science, Computer Science (R0)
