Time Control or Size Control? Reducing Complexity and Improving Accuracy of Genetic Programming Models

  • Conference paper
  • Genetic Programming (EuroGP 2020)

Abstract

Complexity of evolving models in genetic programming (GP) can impact both the quality of the models and the evolutionary search. While previous studies have proposed several notions of GP model complexity, the size of a GP model is by far the most researched measure. However, previous studies have also shown that controlling size does not automatically improve the accuracy of GP models, especially their accuracy on out-of-sample (test) data. Furthermore, size does not represent the functional composition of a model, which is often related to its accuracy on test data. In this study, we explore the evaluation time of GP models as a measure of their complexity; we define the evaluation time as the time taken to evaluate a model over some data. We demonstrate that the evaluation time reflects both a model’s size and its composition, and we show how to measure it reliably. To validate our proposal, we adapt four well-known size-control methods to control evaluation time instead of tree size, and we compare this time-control with size-control. The results show that time-control, with its more nuanced notion of complexity, produces more accurate models on 17 out of 20 problem scenarios. Even where time-controlled models have slightly greater evaluation times and sizes, they counterbalance with superior accuracy on both training and test data. The paper also argues that time-control can differentiate functional complexity even better in an identically-sized population. To facilitate this, the paper proposes Fixed Length Initialisation (FLI), which creates an identically-sized but functionally-diverse population. The results show that while FLI particularly suits time-control, it also generally improves the performance of size-control. Overall, the paper positions evaluation time as a viable alternative to tree size for measuring complexity in GP.
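The abstract defines evaluation time as the time taken to evaluate a model over some data and stresses that it must be measured reliably. The paper page does not include code, so the following minimal Python sketch only illustrates the idea: it times each model over a fixed dataset several times and takes the median, so that operating-system jitter does not dominate the measurement. The `evaluation_time` helper, the repetition count, and the two toy models are illustrative assumptions, not the authors' implementation.

```python
import math
import statistics
import time
from typing import Callable, Sequence


def evaluation_time(model: Callable[[Sequence[float]], float],
                    data: Sequence[Sequence[float]],
                    repeats: int = 30) -> float:
    """Median wall-clock time (seconds) to evaluate `model` over `data`.

    Repeating the measurement and taking the median damps operating-system
    jitter, so the figure reflects the model's size and functional
    composition rather than transient machine load.
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        for case in data:
            model(case)  # evaluate the model on one data case
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Two toy symbolic-regression models of similar expression size but
# different functional composition: the second uses costlier primitives.
def cheap(x):
    # few, inexpensive arithmetic primitives
    return x[0] + x[1] - x[0] * x[1]


def costly(x):
    # roughly the same size, but transcendental primitives
    return math.sin(x[0]) + math.exp(x[1]) - math.log(abs(x[0] * x[1]) + 1e-9)


dataset = [(i / 100.0, (i % 7) / 7.0) for i in range(1_000)]

print("cheap model evaluation time :", evaluation_time(cheap, dataset))
print("costly model evaluation time:", evaluation_time(costly, dataset))
```

In this toy example the two expressions have comparable sizes, yet the one built from costlier primitives takes measurably longer to evaluate; that gap between size and functional composition is exactly what the evaluation-time measure is intended to capture, whereas a pure size measure would treat the two models as equally complex.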

Author information

Correspondence to Aliyu Sani Sambo.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Sambo, A.S., Azad, R.M.A., Kovalchuk, Y., Indramohan, V.P., Shah, H. (2020). Time Control or Size Control? Reducing Complexity and Improving Accuracy of Genetic Programming Models. In: Hu, T., Lourenço, N., Medvet, E., Divina, F. (eds) Genetic Programming. EuroGP 2020. Lecture Notes in Computer Science, vol 12101. Springer, Cham. https://doi.org/10.1007/978-3-030-44094-7_13

  • DOI: https://doi.org/10.1007/978-3-030-44094-7_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-44093-0

  • Online ISBN: 978-3-030-44094-7

  • eBook Packages: Computer Science (R0)
