Genetic Programming Symbolic Regression: What Is the Prior on the Prediction?

Part of the book series: Genetic and Evolutionary Computation (GEVO)

Abstract

In the context of Genetic Programming Symbolic Regression, we empirically investigate the prior on the output prediction, that is, the distribution of the output prior to observing data. We distinguish between the prior due to initialisation and due to evolutionary search. We also investigate the effect on the prior of maximum tree depth and the effect of different function sets and different independent variable distributions. We find that priors are highly diffuse and sometimes include support for extreme values. We compare priors to values for dependent variables observed in benchmarks and real-world problems, finding that mismatches occur and can affect algorithm behaviour and performance. As a further application of our results, we investigate the behaviour of mutation operators in semantic space.
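The idea of a prior due to initialisation can be illustrated with a minimal sketch: draw many random expression trees, evaluate each at a randomly drawn input, and inspect the resulting distribution of outputs. Everything specific here is an assumption made for illustration and is not taken from the chapter: the function set (+, -, *, protected division), the grow-style generator `random_tree`, the terminal probability of 0.3, the depth limit, and the uniform input distribution on [-1, 1].

```python
import random
import statistics


def protected_div(a, b):
    # Assumed convention: return 1.0 when the denominator is (near) zero.
    return a / b if abs(b) > 1e-9 else 1.0


# Hypothetical binary function set, chosen only for this sketch.
FUNCTIONS = [
    lambda a, b: a + b,
    lambda a, b: a - b,
    lambda a, b: a * b,
    protected_div,
]


def random_tree(rng, max_depth):
    """Grow a random expression tree, represented as nested tuples."""
    if max_depth == 0 or rng.random() < 0.3:
        # Terminal: either the variable x or a constant drawn from [-1, 1].
        if rng.random() < 0.5:
            return ("x",)
        return ("const", rng.uniform(-1.0, 1.0))
    func = rng.choice(FUNCTIONS)
    left = random_tree(rng, max_depth - 1)
    right = random_tree(rng, max_depth - 1)
    return ("call", func, left, right)


def evaluate(tree, x):
    """Evaluate a tree at the input value x."""
    kind = tree[0]
    if kind == "x":
        return x
    if kind == "const":
        return tree[1]
    _, func, left, right = tree
    return func(evaluate(left, x), evaluate(right, x))


def sample_prior(n_trees=2000, max_depth=6, seed=0):
    """Return one output per random tree, each at a random input in [-1, 1]."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_trees):
        tree = random_tree(rng, max_depth)
        x = rng.uniform(-1.0, 1.0)
        outputs.append(evaluate(tree, x))
    return outputs


if __name__ == "__main__":
    ys = sample_prior()
    print("min:", min(ys))
    print("quartiles:", statistics.quantiles(ys, n=4))
    print("max:", max(ys))
```

Comparing the printed minimum and maximum with the quartiles gives a rough sense of how diffuse the sampled outputs are under these assumed settings. Changing `max_depth`, the function set, or the input distribution changes the sampled distribution, which is the kind of effect the abstract describes investigating.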

Notes

  1. In many GP configurations, there are no true local optima, since the mutation operator can jump to anywhere in the search space in a single step. We can informally define a local pseudo-optimum as a point where improving steps are not impossible but highly unlikely.

  2. We adopt the whisker definition of third quartile + 1.5 × IQR for the upper whisker, and first quartile − 1.5 × IQR for the lower whisker.
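The whisker definition in note 2 can be written out as a short sketch. The helper name `whisker_limits` and the quartile interpolation method (`method="inclusive"` in Python's `statistics.quantiles`) are assumptions; the note does not say which quartile estimator is used.

```python
import statistics


def whisker_limits(values):
    """Return (lower, upper) whisker limits based on 1.5 * IQR."""
    # Quartile estimation method is an assumption; other estimators differ slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


if __name__ == "__main__":
    lower, upper = whisker_limits([1, 2, 3, 4, 5, 6, 7, 8, 9])
    print(lower, upper)
```

Values falling outside the returned limits would be the ones treated as extreme under this definition.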

Acknowledgement

Thanks to Alberto Moraglio and Alexandros Agapitos for discussion.

Author information

Corresponding author

Correspondence to Miguel Nicolau.

Appendix A: Table of Distribution Statistics

See Table 11.3.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Nicolau, M., McDermott, J. (2020). Genetic Programming Symbolic Regression: What Is the Prior on the Prediction?. In: Banzhaf, W., Goodman, E., Sheneman, L., Trujillo, L., Worzel, B. (eds) Genetic Programming Theory and Practice XVII. Genetic and Evolutionary Computation. Springer, Cham. https://doi.org/10.1007/978-3-030-39958-0_11

  • DOI: https://doi.org/10.1007/978-3-030-39958-0_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-39957-3

  • Online ISBN: 978-3-030-39958-0

  • eBook Packages: Computer Science, Computer Science (R0)
