
Improving GP generalization: a variance-based layered learning approach

Published in Genetic Programming and Evolvable Machines

Abstract

This paper introduces variance-based layered learning GP, a new method that improves the generalization ability of genetic programming (GP) on symbolic regression problems. In this approach, several datasets, called primitive training sets, are derived from the original training data. They are ordered from less to more complex according to a chosen complexity measure, and even the last primitive dataset is less complex than the original training set. The approach decomposes the evolutionary process into several hierarchical layers. The first layer evolves against the least complex (smoothest) primitive training set; subsequent layers supply the GP engine with progressively more complex primitive sets, and the final layer uses the original training data. We use the variance of a function's output values as the measure of functional complexity. This measure serves both to generate smoother training data and to control the functional complexity of the solutions, thereby reducing overfitting. Experiments on four real-world and three artificial symbolic regression problems demonstrate that the approach enhances the generalization ability of GP and reduces the complexity of the obtained solutions.
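The layered scheme described above can be illustrated with a minimal sketch. The abstract does not specify how the smoothed primitive training sets are constructed, so the moving-average smoother below is an assumption chosen purely for illustration; the paper's actual construction may differ. What the sketch does show is the key invariant: each primitive target vector has lower output variance (the paper's complexity proxy) than the original targets, and the layers are ordered from smoothest to original.

```python
import numpy as np

def primitive_targets(y, window_sizes=(9, 5, 3)):
    """Build a sequence of target vectors ordered from least to most
    complex. Larger smoothing windows reduce the variance of the
    outputs, which is the complexity measure used in the paper.
    The final layer is the original, unsmoothed training target.
    NOTE: moving-average smoothing is an illustrative assumption,
    not the paper's actual primitive-set construction."""
    layers = []
    for w in window_sizes:
        kernel = np.ones(w) / w
        # mode="same" keeps each smoothed target aligned with the inputs
        layers.append(np.convolve(y, kernel, mode="same"))
    layers.append(y)  # last layer: the original training data
    return layers

# Hypothetical layered training loop (the GP engine itself, `evolve`,
# is out of scope here):
#
# for y_layer in primitive_targets(y):
#     population = evolve(population, X, y_layer, generations_per_layer)
```

The population evolved on one layer seeds the next, so early layers bias the search toward smooth, low-complexity models before the full-complexity data is ever seen.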


Figures 1–11 (not included in this preview).



Author information

Corresponding author: Mohammad Mehdi Ebadzadeh.

About this article


Cite this article

Amir Haeri, M., Ebadzadeh, M.M. & Folino, G. Improving GP generalization: a variance-based layered learning approach. Genet Program Evolvable Mach 16, 27–55 (2015). https://doi.org/10.1007/s10710-014-9220-6

