A global MINLP approach to symbolic regression

  • Full Length Paper
  • Series B
Mathematical Programming

Abstract

Symbolic regression methods generate expression trees that simultaneously define the functional form of a regression model and the regression parameter values. As a result, the regression problem can search many nonlinear functional forms using only the specification of simple mathematical operators such as addition, subtraction, multiplication, and division, among others. Currently, state-of-the-art symbolic regression methods leverage genetic algorithms and adaptive programming techniques. Genetic algorithms lack optimality certifications and are typically stochastic in nature. In contrast, we propose an optimization formulation for the rigorous deterministic optimization of the symbolic regression problem. We present a mixed-integer nonlinear programming (MINLP) formulation to solve the symbolic regression problem as well as several alternative models to eliminate redundancies and symmetries. We demonstrate this symbolic regression technique using an array of experiments based upon literature instances. We then use a set of 24 MINLPs from symbolic regression to compare the performance of five local and five global MINLP solvers. Finally, we use larger instances to demonstrate that a portfolio of models provides an effective solution mechanism for problems of the size typically addressed in the symbolic regression literature.
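
To make the idea of an expression tree concrete, the following is a minimal sketch, not the MINLP formulation proposed in the paper. It shows how a single tree fixes both the functional form (through its operator nodes) and the parameter values (through its constant leaves), and how a candidate tree can be scored against data by its sum of squared errors. The `Node` class, the `OPS` table, the helper names, and the sample data are all assumptions introduced for illustration.

```python
import operator
from dataclasses import dataclass
from typing import Optional, Sequence

# Binary operators named in the abstract; this lookup table is illustrative.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


@dataclass
class Node:
    """One node of an expression tree (hypothetical structure)."""
    kind: str                       # "op", "var", or "const"
    op: Optional[str] = None        # operator symbol when kind == "op"
    var: Optional[int] = None       # input index when kind == "var"
    value: Optional[float] = None   # parameter value when kind == "const"
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def evaluate(node: Node, x: Sequence[float]) -> float:
    """Recursively evaluate the tree at a single data point x."""
    if node.kind == "const":
        return node.value
    if node.kind == "var":
        return x[node.var]
    return OPS[node.op](evaluate(node.left, x), evaluate(node.right, x))


def sum_squared_error(tree: Node, X, y) -> float:
    """Score a candidate tree against observations (X, y)."""
    return sum((evaluate(tree, xi) - yi) ** 2 for xi, yi in zip(X, y))


# Tree encoding the model 2.0 * x0 + x1.
tree = Node(
    "op", op="+",
    left=Node("op", op="*",
              left=Node("const", value=2.0),
              right=Node("var", var=0)),
    right=Node("var", var=1),
)

# Made-up data generated exactly by 2.0 * x0 + x1.
X = [(1.0, 0.5), (2.0, 1.0), (3.0, -1.0)]
y = [2.5, 5.0, 5.0]

print(sum_squared_error(tree, X, y))
```

In this sketch, a search method would vary both the node types and the constant values. A genetic algorithm does so stochastically, whereas the approach described above poses the choice as a deterministic optimization problem.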

Acknowledgements

As part of the National Energy Technology Laboratory’s Regional University Alliance (NETL-RUA), a collaborative initiative of the NETL, this technical effort was performed under the RES contract DE-FE0004000, as part of the Carbon Capture Simulation Initiative.

Author information

Corresponding author

Correspondence to Nikolaos V. Sahinidis.

About this article

Cite this article

Cozad, A., Sahinidis, N.V. A global MINLP approach to symbolic regression. Math. Program. 170, 97–119 (2018). https://doi.org/10.1007/s10107-018-1289-x

  • DOI: https://doi.org/10.1007/s10107-018-1289-x
