
A Comparison of Fitness-Case Sampling Methods for Symbolic Regression with Genetic Programming

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 288))

Abstract

The canonical approach to fitness evaluation in Genetic Programming (GP) is to use a static training set, with fitness given by a cost function averaged over all fitness-cases. However, motivated by different goals, researchers have recently proposed several techniques that focus selective pressure on a subset of fitness-cases at each generation. These approaches can be described as fitness-case sampling techniques, in which the training set is sampled, in some way, to determine fitness. This paper presents a comprehensive evaluation of some of the most recent sampling methods, using benchmark and real-world symbolic regression problems. The algorithms considered here are Interleaved Sampling, Random Interleaved Sampling, Lexicase Selection, and a newly proposed technique called Keep-Worst Interleaved Sampling (KW-IS). The algorithms are extensively evaluated in terms of test performance, overfitting, and bloat. Results suggest that sampling techniques can improve performance compared with standard GP: on synthetic benchmarks the difference is slight or nonexistent, but on real-world problems the differences are substantial. Some of the best results were achieved by Lexicase Selection and Keep-Worst Interleaved Sampling. Results also show that on real-world problems overfitting correlates strongly with bloat. Furthermore, the sampling techniques improve efficiency, since they reduce the number of fitness-case evaluations required over an entire run.
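As a point of reference for the techniques compared in the abstract, Lexicase Selection can be sketched as follows. This is a minimal illustration based on the general description of the method in the literature, not the authors' implementation; the function name `lexicase_select` and the error-matrix layout are assumptions for the sake of the example.

```python
import random

def lexicase_select(population, errors):
    """Select one individual by lexicase selection.

    errors[i][j] is the error of individual i on fitness-case j.
    Fitness-cases are considered in a random order; at each case,
    only candidates with the best (lowest) error on that case survive.
    """
    n_cases = len(errors[0])
    case_order = list(range(n_cases))
    random.shuffle(case_order)

    candidates = list(range(len(population)))
    for c in case_order:
        best = min(errors[i][c] for i in candidates)
        candidates = [i for i in candidates if errors[i][c] == best]
        if len(candidates) == 1:
            break
    # Ties after all cases are broken uniformly at random.
    return population[random.choice(candidates)]
```

Note that, unlike fitness averaged over all cases, this scheme lets individuals that excel on a few cases survive even if their aggregate error is high, which is the source of the focused selective pressure the abstract describes.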




Author information

Correspondence to Yuliana Martínez.


Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Martínez, Y., Trujillo, L., Naredo, E., Legrand, P. (2014). A Comparison of Fitness-Case Sampling Methods for Symbolic Regression with Genetic Programming. In: Tantar, AA., et al. EVOLVE - A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation V. Advances in Intelligent Systems and Computing, vol 288. Springer, Cham. https://doi.org/10.1007/978-3-319-07494-8_14

  • DOI: https://doi.org/10.1007/978-3-319-07494-8_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07493-1

  • Online ISBN: 978-3-319-07494-8

  • eBook Packages: Engineering (R0)
