ABSTRACT
Symbolic Regression (SR) is the task of finding closed-form analytical expressions that describe the relationship between variables in a dataset. In this work, we rethink SR and introduce mechanisms from two perspectives: morphology and adaptability. Morphology: Man-made heuristics are typically utilized in SR algorithms to influence the morphology (or structure) of candidate expressions, potentially introducing unintentional bias and data leakage. To address this issue, we create a depth-aware mathematical language model trained on terminal walks of expression trees, as a replacement for these heuristics. Adaptability: We promote alternating fitness functions across generations, which eliminates equations that perform well under only one fitness function and, as a result, discovers expressions that are closer to the true functional form. We demonstrate this by alternating fitness functions that quantify faithfulness to values (via MSE) and empirical derivatives (via a novel, theoretically justified fitness metric coined MSEDI). Proof-of-concept: We combine these ideas into a minimalistic evolutionary SR algorithm that outperforms a suite of benchmark and state-of-the-art SR algorithms on problems with unknown constants added, which we claim are more reflective of SR performance in real-world applications. Our claim is then strengthened by reproducing the superior performance on real-world regression datasets from SRBench. This Hot-off-the-Press paper summarizes the work K.S. Fong, S. Wongso and M. Motani, "Rethinking Symbolic Regression: Morphology and Adaptability in the Context of Evolutionary Algorithms", The Eleventh International Conference on Learning Representations (ICLR'23).
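The adaptability idea can be illustrated with a minimal sketch. It is not the authors' algorithm: the candidate pool is fixed (no mutation or crossover), all function and candidate names are hypothetical, and the derivative-based fitness is a plain finite-difference MSE used as a stand-in, not the MSEDI metric, whose definition is not given in this abstract. The sketch only shows how switching the fitness function between generations can remove a candidate that scores well on values alone.

```python
import numpy as np

def mse_values(f, x, y):
    """Mean squared error between predicted and observed values."""
    return float(np.mean((f(x) - y) ** 2))

def mse_derivatives(f, x, y):
    """MSE between finite-difference slopes of the prediction and of the data.
    Assumption: a simple stand-in for a derivative-based fitness, NOT MSEDI."""
    return float(np.mean((np.gradient(f(x), x) - np.gradient(y, x)) ** 2))

def alternate_selection(candidates, x, y, generations=4, keep=0.5):
    """Rank the pool by one fitness per generation, keep the best fraction,
    and switch to the other fitness function in the next generation."""
    fitnesses = [mse_values, mse_derivatives]
    pool = dict(candidates)
    for g in range(generations):
        fit = fitnesses[g % 2]  # even generations: values, odd: derivatives
        ranked = sorted(pool, key=lambda name: fit(pool[name], x, y))
        n_keep = max(1, int(np.ceil(len(ranked) * keep)))
        pool = {name: pool[name] for name in ranked[:n_keep]}
    return pool

# Toy data from y = x^2 + 1 (a target with an added constant).
x = np.linspace(-2.0, 2.0, 201)
y = x ** 2 + 1.0

# Hypothetical candidate expressions, written as plain callables.
candidates = {
    "x**2 + 1": lambda x: x ** 2 + 1.0,
    "x**2 + 1 + 0.05*sin(50*x)": lambda x: x ** 2 + 1.0 + 0.05 * np.sin(50.0 * x),
    "x**2": lambda x: x ** 2,
    "x + 2": lambda x: x + 2.0,
    "constant": lambda x: np.full_like(x, 2.3),
}

survivors = alternate_selection(candidates, x, y)
print(list(survivors))
```

In this toy setup the oscillating candidate ranks second under the value-based fitness but is discarded as soon as the derivative-based fitness is applied, while the candidate with the wrong constant is discarded in the following value-based generation.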
- John R Koza. Genetic programming: On the programming of computers by means of natural selection. Complex adaptive systems, 1992.
- Kei Sen Fong et al. Rethinking symbolic regression: Morphology and adaptability in the context of evolutionary algorithms. In International Conference on Learning Representations, 2023.
- Tony Worm and Kenneth Chiu. Prioritized grammar enumeration: symbolic regression by dynamic programming. In Annual Conference on Genetic and Evolutionary Computation, pages 1021--1028, 2013.
- Brenden K Petersen et al. Deep symbolic regression: Recovering mathematical expressions from data via risk-seeking policy gradients. In International Conference on Learning Representations, 2021.
- Iwo Bladek and Krzysztof Krawiec. Solving symbolic regression problems with formal constraints. In Genetic and Evolutionary Computation Conference, pages 977--984, 2019.
- Mengjie Zhang et al. Online program simplification in genetic programming. In Asia-Pacific Conference on Simulated Evolution and Learning, pages 592--600. Springer, 2006.
- Christian Loftis et al. Lattice thermal conductivity prediction using symbolic regression and machine learning. The Journal of Physical Chemistry A, 125(1):435--450, 2020.
- Michael F Korns. A baseline symbolic regression algorithm. In Genetic Programming Theory and Practice X, pages 117--137. Springer, 2013.
- Patryk Orzechowski et al. Where are we now? A large benchmark study of recent symbolic regression methods. In Genetic and Evolutionary Computation Conference, pages 1183--1190, 2018.
- Terrell Mundhenk et al. Symbolic regression via deep reinforcement learning enhanced genetic programming seeding. Advances in Neural Information Processing Systems, 34:24912--24923, 2021.
- Ying Jin et al. Bayesian symbolic regression. arXiv:1910.08892, 2019.
- Nguyen Quang Uy et al. Semantically-based crossover in genetic programming: application to real-valued symbolic regression. Genetic Programming and Evolvable Machines, 12(2):91--119, 2011.
- Krzysztof Krawiec and Tomasz Pawlak. Approximating geometric crossover by semantic backpropagation. In Annual Conference on Genetic and Evolutionary Computation, pages 941--948, 2013.
- Michael Schmidt and Hod Lipson. Symbolic regression of implicit equations. In Genetic Programming Theory and Practice VII, pages 73--85. Springer, 2010.
- Michael Schmidt and Hod Lipson. Distilling free-form natural laws from experimental data. Science, 324(5923):81--85, 2009.
- Patrick Bateson. Adaptability and evolution. Interface Focus, 7(5):20160126, 2017.
- William La Cava et al. Contemporary symbolic regression methods and their relative performance. arXiv:2107.14351, 2021.
Index Terms
- Evolutionary Symbolic Regression: Mechanisms from the Perspectives of Morphology and Adaptability