Bayesian Model Selection for Reducing Bloat and Overfitting in Genetic Programming for Symbolic Regression
Created by W.Langdon from
gp-bibliography.bib Revision:1.8129
@InProceedings{bomarito:2022:GECCOcomp,
author = "Geoffrey Bomarito and Patrick Leser and
Nolan Strauss and Karl Garbrecht and Jacob Hochhalter",
title = "Bayesian Model Selection for Reducing Bloat and
Overfitting in Genetic Programming for Symbolic
Regression",
booktitle = "Proceedings of the 2022 Genetic and Evolutionary
Computation Conference Companion",
year = "2022",
editor = "Heike Trautmann and Carola Doerr and
Alberto Moraglio and Thomas Bartz-Beielstein and Bogdan Filipic and
Marcus Gallagher and Yew-Soon Ong and
Abhishek Gupta and Anna V Kononova and Hao Wang and
Michael Emmerich and Peter A. N. Bosman and Daniela Zaharie and
Fabio Caraffini and Johann Dreo and Anne Auger and
Konstantin Dietric and Paul Dufosse and Tobias Glasmachers and
Nikolaus Hansen and Olaf Mersmann and Petr Posik and
Tea Tusar and Dimo Brockhoff and Tome Eftimov and
Pascal Kerschke and Boris Naujoks and Mike Preuss and
Vanessa Volz and Bilel Derbel and Ke Li and
Xiaodong Li and Saul Zapotecas and Qingfu Zhang and
Mark Coletti and Catherine (Katie) Schuman and
Eric ``Siggy'' Scott and Robert Patton and Paul Wiegand and
Jeffrey K. Bassett and Chathika Gunaratne and Tinkle Chugh and
Richard Allmendinger and Jussi Hakanen and
Daniel Tauritz and John Woodward and Manuel Lopez-Ibanez and
John McCall and Jaume Bacardit and
Alexander Brownlee and Stefano Cagnoni and Giovanni Iacca and
David Walker and Jamal Toutouh and UnaMay O'Reilly and
Penousal Machado and Joao Correia and Sergio Nesmachnow and
Josu Ceberio and Rafael Villanueva and Ignacio Hidalgo and
Francisco {Fernandez de Vega} and Giuseppe Paolo and
Alex Coninx and Antoine Cully and Adam Gaier and
Stefan Wagner and Michael Affenzeller and Bobby R. Bruce and
Vesna Nowack and Aymeric Blot and Emily Winter and
William B. Langdon and Justyna Petke and
Silvino {Fernandez Alzueta} and Pablo {Valledor Pellicer} and
Thomas Stuetzle and David Paetzel and
Alexander Wagner and Michael Heider and Nadarajen Veerapen and
Katherine Malan and Arnaud Liefooghe and Sebastien Verel and
Gabriela Ochoa and Mohammad Nabi Omidvar and
Yuan Sun and Ernesto Tarantino and De Falco Ivanoe and
Antonio {Della Cioppa} and Scafuri Umberto and John Rieffel and
Jean-Baptiste Mouret and Stephane Doncieux and
Stefanos Nikolaidis and Julian Togelius and
Matthew C. Fontaine and Serban Georgescu and Francisco Chicano and
Darrell Whitley and Oleksandr Kyriienko and Denny Dahl and
Ofer Shir and Lee Spector and Alma Rahat and
Richard Everson and Jonathan Fieldsend and Handing Wang and
Yaochu Jin and Erik Hemberg and Marwa A. Elsayed and
Michael Kommenda and William {La Cava} and
Gabriel Kronberger and Steven Gustafson",
pages = "526--529",
address = "Boston, USA",
series = "GECCO '22",
month = "9-13 " # jul,
organisation = "SIGEVO",
publisher = "Association for Computing Machinery",
publisher_address = "New York, NY, USA",
keywords = "genetic algorithms, genetic programming",
isbn13 = "978-1-4503-9268-6/22/07",
DOI = "doi:10.1145/3520304.3528899",
abstract = "When performing symbolic regression using genetic
programming, overfitting and bloat can negatively
impact generalizability and interpretability of the
resulting equations as well as increase computation
times. A Bayesian fitness metric is introduced and its
impact on bloat and overfitting during population
evolution is studied and compared to common
alternatives in the literature. The proposed approach
was found to be more robust to noise and data sparsity
in numerical experiments, guiding evolution to a level
of complexity appropriate to the dataset. Further
evolution of the population resulted not in overfitting
or bloat, but rather in slight simplifications in model
form. The ability to identify an equation of complexity
appropriate to the scale of noise in the training data
was also demonstrated. In general, the Bayesian model
selection algorithm was shown to be an effective means
of regularization which resulted in less bloat and
overfitting when any amount of noise was present in the
                 training data. The efficacy of a Genetic Programming
(GP) [1] solution is often characterized by its (1)
fitness, i.e. ability to perform a training task, (2)
complexity, and (3) generalizability, i.e. ability to
perform its task in an unseen scenario. Bloat is a
common phenomenon for GP in which continued training
results in significant increases in complexity with
minimal improvements in fitness. There are several
theories for the prevalence of bloat in GP which
postulate possible evolutionary benefits of bloat [2];
however, for most practical purposes bloat is a
hindrance rather than a benefit. For example, bloated
solutions are less interpretable and more
computationally expensive. Overfitting is another
                 common phenomenon in GP and the broader machine learning
field. Overfitting occurs when continued training
results in better fitness but reduced
generalizability.",
notes = "GECCO-2022 A Recombination of the 31st International
Conference on Genetic Algorithms (ICGA) and the 27th
Annual Genetic Programming Conference (GP)",
}
Genetic Programming entries for
Geoffrey F Bomarito
Patrick E Leser
Nolan Craig McGee Strauss
Karl Michael Garbrecht
Jacob Dean Hochhalter