Abstract
Symbolic regression is a regression analysis technique that finds the structure and the coefficients of a regression model simultaneously. Genetic programming is a leading technique for symbolic regression, since it requires no predefined model structure and has a flexible representation. However, genetic-programming-based symbolic regression (GPSR) often generalises poorly, which hampers its application to scientific and industrial modelling. In recent years, many researchers have recognised this issue and devoted much effort to enhancing the generalisation ability of GPSR. This chapter first introduces generalisation in GPSR and then reviews the state-of-the-art contributions. It also analyses challenges in the area and highlights a number of future directions for interested researchers.
The authors are with the Evolutionary Computation Research Group at the School of Engineering and Computer Science, Victoria University of Wellington.
Copyright information
© 2022 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Chen, Q., Xue, B. (2022). Generalisation in Genetic Programming for Symbolic Regression: Challenges and Future Directions. In: Smith, A.E. (ed.) Women in Computational Intelligence. Women in Engineering and Science. Springer, Cham. https://doi.org/10.1007/978-3-030-79092-9_13
DOI: https://doi.org/10.1007/978-3-030-79092-9_13
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-79091-2
Online ISBN: 978-3-030-79092-9
eBook Packages: Intelligent Technologies and Robotics (R0)