Abstract
In the context of Genetic Programming Symbolic Regression, we empirically investigate the prior on the output prediction, that is, the distribution of the output prior to observing data. We distinguish between the prior due to initialisation and due to evolutionary search. We also investigate the effect on the prior of maximum tree depth and the effect of different function sets and different independent variable distributions. We find that priors are highly diffuse and sometimes include support for extreme values. We compare priors to values for dependent variables observed in benchmarks and real-world problems, finding that mismatches occur and can affect algorithm behaviour and performance. As a further application of our results, we investigate the behaviour of mutation operators in semantic space.
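As an illustration of what "the prior due to initialisation" means in practice, the following is a minimal sketch, not the authors' experimental setup. It samples random expression trees, evaluates each at a random input, and collects the outputs. The function set, terminal probability, constant range, input range and all names (`FUNCTIONS`, `random_tree`, `sample_prior`) are assumptions chosen for illustration only.

```python
import math
import random
import statistics

# Hypothetical function set (an assumption): name -> (arity, implementation).
FUNCTIONS = {
    "add": (2, lambda a, b: a + b),
    "sub": (2, lambda a, b: a - b),
    "mul": (2, lambda a, b: a * b),
    # Analytic quotient: a division-like operator with no singularity.
    "aq": (2, lambda a, b: a / math.sqrt(1.0 + b * b)),
}


def random_tree(rng, max_depth):
    """Grow-style initialisation: return a terminal or a nested tuple."""
    if max_depth == 0 or rng.random() < 0.3:
        # Terminal: the independent variable "x" or a random constant.
        return "x" if rng.random() < 0.5 else rng.uniform(-1.0, 1.0)
    name = rng.choice(sorted(FUNCTIONS))
    arity, _ = FUNCTIONS[name]
    children = tuple(random_tree(rng, max_depth - 1) for _ in range(arity))
    return (name,) + children


def evaluate(tree, x):
    """Evaluate a tree produced by random_tree at the input value x."""
    if isinstance(tree, str):      # the variable terminal "x"
        return x
    if isinstance(tree, float):    # a constant terminal
        return tree
    _, fn = FUNCTIONS[tree[0]]
    return fn(*(evaluate(child, x) for child in tree[1:]))


def sample_prior(n_trees=2000, max_depth=6, seed=0):
    """Sample outputs of random trees at random inputs (no data, no search)."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_trees):
        tree = random_tree(rng, max_depth)
        x = rng.uniform(-5.0, 5.0)  # assumed independent variable distribution
        outputs.append(evaluate(tree, x))
    return outputs


if __name__ == "__main__":
    ys = sample_prior()
    print("min:", min(ys), "median:", statistics.median(ys), "max:", max(ys))
```

Changing `max_depth`, the entries in `FUNCTIONS`, or the range used for `x` gives a rough feel for how each factor named in the abstract could shift the spread of the sampled outputs. The prior due to evolutionary search is not covered by this sketch.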
Notes
1. In many GP configurations, there are no true local optima, since the mutation operator can jump to anywhere in the search space in a single step. We can informally define a local pseudo-optimum as a point where improving steps are not impossible but highly unlikely.
2. We adopt the whisker definition of third quartile + 1.5 × IQR for the upper whisker, and correspondingly first quartile − 1.5 × IQR for the lower whisker.
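The whisker definition in the second note can be sketched as follows. This is a minimal example under two assumptions: "inversely" is read as first quartile − 1.5 × IQR, and quartiles are computed with the standard library's inclusive interpolation method, which may differ from the quartile convention used for the chapter's plots. The function name `whisker_limits` is hypothetical.

```python
import statistics


def whisker_limits(values):
    """Return (lower, upper) whisker limits: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    # Quartile convention is an assumption; other methods give slightly
    # different cut points on small samples.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


if __name__ == "__main__":
    lower, upper = whisker_limits([1, 2, 3, 4, 5, 6, 7, 8, 9])
    print(lower, upper)
```

Values falling outside these limits would be the ones treated as extreme under this definition.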
Acknowledgement
Thanks to Alberto Moraglio and Alexandros Agapitos for discussion.
Appendix A: Table of Distribution Statistics
See Table 11.3.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Nicolau, M., McDermott, J. (2020). Genetic Programming Symbolic Regression: What Is the Prior on the Prediction? In: Banzhaf, W., Goodman, E., Sheneman, L., Trujillo, L., Worzel, B. (eds) Genetic Programming Theory and Practice XVII. Genetic and Evolutionary Computation. Springer, Cham. https://doi.org/10.1007/978-3-030-39958-0_11
DOI: https://doi.org/10.1007/978-3-030-39958-0_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-39957-3
Online ISBN: 978-3-030-39958-0
eBook Packages: Computer Science, Computer Science (R0)