
Symbolic Regression via Control Variable Genetic Programming

  • Conference paper
  • First Online:
Machine Learning and Knowledge Discovery in Databases: Research Track (ECML PKDD 2023)

Abstract

Learning symbolic expressions directly from experiment data is a vital step in AI-driven scientific discovery. Nevertheless, state-of-the-art approaches are limited to learning simple expressions; regressing expressions involving many independent variables remains out of reach. Motivated by the control variable experiments widely used in science, we propose Control Variable Genetic Programming (CVGP) for symbolic regression over many independent variables. CVGP expedites symbolic expression discovery via customized experiment design, rather than learning from a fixed dataset collected a priori. CVGP starts by fitting simple expressions involving a small set of independent variables using genetic programming, under controlled experiments in which the other variables are held constant. It then extends the expressions learned in previous generations by adding new independent variables, using new control variable experiments in which these variables are allowed to vary. Theoretically, we show that CVGP, as an incremental building approach, can yield an exponential reduction in the search space when learning a class of expressions. Experimentally, CVGP outperforms several baselines in learning symbolic expressions involving multiple independent variables.
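The abstract outlines CVGP's incremental, control-variable loop. The sketch below illustrates that loop in Python under strong simplifications: the genetic-programming search is replaced by a tiny least-squares fit over a fixed term library so the example stays self-contained, and all names (`generate_controlled_data`, `fit_over_term_library`, `control_variable_regression`) are hypothetical placeholders rather than the authors' API; their actual implementation is linked in the Notes below.

```python
# Minimal, illustrative sketch of the control-variable loop described in the
# abstract. The GP search step is replaced by a least-squares fit over a small
# term library purely to keep the example runnable; names are hypothetical.
import itertools
import numpy as np


def generate_controlled_data(true_fn, n_vars, free_vars, n_samples=200, seed=0):
    """Controlled experiment: variables in `free_vars` vary, all others are
    held constant (here at 1.0)."""
    rng = np.random.default_rng(seed)
    X = np.ones((n_samples, n_vars))
    for j in free_vars:
        X[:, j] = rng.uniform(0.1, 2.0, size=n_samples)
    return X, true_fn(X)


def fit_over_term_library(X, y, free_vars, max_terms=2):
    """Stand-in for the GP search: choose the best small subset of candidate
    terms (built only from the currently free variables) by least squares."""
    terms = [("1", np.ones(len(X)))]
    for j in free_vars:
        terms += [(f"x{j}", X[:, j]), (f"x{j}^2", X[:, j] ** 2)]
    for j, k in itertools.combinations(free_vars, 2):
        terms.append((f"x{j}*x{k}", X[:, j] * X[:, k]))
    best = None
    for subset in itertools.combinations(range(len(terms)), max_terms):
        A = np.column_stack([terms[i][1] for i in subset])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        err = np.mean((A @ coef - y) ** 2)
        if best is None or err < best[0]:
            best = (err, [terms[i][0] for i in subset], coef)
    return best


def control_variable_regression(true_fn, n_vars):
    """Free one variable at a time, gather fresh controlled data, and refit an
    expression that extends the previous one, as the abstract describes."""
    free_vars, model = [], None
    for j in range(n_vars):
        free_vars.append(j)  # allow one more independent variable to vary
        X, y = generate_controlled_data(true_fn, n_vars, free_vars, seed=j)
        model = fit_over_term_library(X, y, free_vars, max_terms=len(free_vars) + 1)
        print(f"free vars {free_vars}: terms={model[1]}, mse={model[0]:.2e}")
    return model


if __name__ == "__main__":
    # Ground-truth expression with three independent variables: 2*x0*x1 + x2.
    control_variable_regression(lambda X: 2.0 * X[:, 0] * X[:, 1] + X[:, 2], n_vars=3)
```

Running it on the three-variable ground truth above shows the pattern the abstract describes: each round frees one more variable, collects new controlled data, and refits an expression that builds on the one learned in the previous round.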


Notes

  1. The code is at: https://github.com/jiangnanhugo/cvgp/. Please refer to the extended version (https://arxiv.org/abs/2306.08057) for the Appendix.


Acknowledgments

We thank all the reviewers for their constructive comments. This research was supported by NSF grant CCF-1918327.

Author information

Corresponding author

Correspondence to Nan Jiang.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Jiang, N., Xue, Y. (2023). Symbolic Regression via Control Variable Genetic Programming. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol 14172. Springer, Cham. https://doi.org/10.1007/978-3-031-43421-1_11

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-43421-1_11

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43420-4

  • Online ISBN: 978-3-031-43421-1

  • eBook Packages: Computer Science, Computer Science (R0)
