Validation Sets, Genetic Programming and Generalisation

  • Conference paper
Research and Development in Intelligent Systems XXVIII (SGAI 2011)

Abstract

This paper investigates a new application of a validation set within a three-data-set methodology (training, validation and test sets) for Genetic Programming (GP). Our system combines Validation Pressure with Validation Elitism to influence fitness evaluation and population structure, with the aim of evolving individuals that generalise better. This strategy uses a validation set to reduce over-fitting while mitigating the loss of training data incurred by traditional methods that employ a validation set.

The method is tested on five benchmark binary classification data sets, and the results suggest that the strategy can improve generalisation on unseen test data.
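The abstract contrasts the proposed approach with traditional uses of a validation set, which hold examples back from training. As a rough illustration of that traditional baseline only (not the authors' Validation Pressure or Validation Elitism, whose details are not given here), the following Python sketch splits a binary classification data set into training, validation and test partitions and keeps the candidate classifier that scores best on the validation set. The 50/25/25 split, the function names `three_way_split`, `accuracy` and `select_best_of_run`, and the toy threshold rules standing in for evolved GP individuals are all assumptions made for the example.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A labelled example: (feature vector, binary class label 0 or 1).
Example = Tuple[Sequence[float], int]
# Hypothetical stand-in for an evolved GP individual: maps features to a label.
Classifier = Callable[[Sequence[float]], int]


def three_way_split(
    data: Sequence[Example],
    train_frac: float = 0.5,  # assumed proportion, not taken from the paper
    val_frac: float = 0.25,   # assumed proportion, not taken from the paper
    seed: int = 0,
) -> Tuple[List[Example], List[Example], List[Example]]:
    """Shuffle the data and split it into training, validation and test sets."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder is held out as unseen test data
    return train, val, test


def accuracy(classifier: Classifier, data: Sequence[Example]) -> float:
    """Fraction of examples classified correctly (0.0 for an empty set)."""
    if not data:
        return 0.0
    correct = sum(1 for x, y in data if classifier(x) == y)
    return correct / len(data)


def select_best_of_run(candidates: Sequence[Classifier],
                       val: Sequence[Example]) -> Classifier:
    """Traditional validation-set use: keep the candidate with the highest
    validation accuracy (ties go to the earliest candidate)."""
    return max(candidates, key=lambda c: accuracy(c, val))


if __name__ == "__main__":
    # Toy 1-D problem: the label is 1 when the single feature is at least 0.5.
    data = [([i / 100.0], int(i >= 50)) for i in range(100)]
    train, val, test = three_way_split(data)
    # Hypothetical threshold rules; in a real GP run these would be evolved
    # using the training partition only.
    candidates = [lambda x, t=t: int(x[0] >= t) for t in (0.3, 0.5, 0.7)]
    best = select_best_of_run(candidates, val)
    print(len(train), len(val), len(test))  # 50 25 25
    print("test accuracy of selected classifier:", accuracy(best, test))
```

The sketch makes the trade-off visible: the quarter of the examples reserved for validation never contributes to training fitness, which is the loss of training data the abstract says the proposed strategy aims to mitigate.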



Author information

Correspondence to Jeannie Fitzgerald.


Copyright information

© 2011 Springer-Verlag London Limited

About this paper

Cite this paper

Fitzgerald, J., Ryan, C. (2011). Validation Sets, Genetic Programming and Generalisation. In: Bramer, M., Petridis, M., Nolle, L. (eds) Research and Development in Intelligent Systems XXVIII. SGAI 2011. Springer, London. https://doi.org/10.1007/978-1-4471-2318-7_6


  • DOI: https://doi.org/10.1007/978-1-4471-2318-7_6

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-2317-0

  • Online ISBN: 978-1-4471-2318-7

  • eBook Packages: Computer Science, Computer Science (R0)
