Abstract
Symbolic regression methods generate expression trees that simultaneously define the functional form of a regression model and the regression parameter values. As a result, the regression problem can search many nonlinear functional forms using only the specification of simple mathematical operators such as addition, subtraction, multiplication, and division, among others. Currently, state-of-the-art symbolic regression methods leverage genetic algorithms and adaptive programming techniques. Genetic algorithms lack optimality certifications and are typically stochastic in nature. In contrast, we propose an optimization formulation for the rigorous deterministic optimization of the symbolic regression problem. We present a mixed-integer nonlinear programming (MINLP) formulation to solve the symbolic regression problem as well as several alternative models to eliminate redundancies and symmetries. We demonstrate this symbolic regression technique using an array of experiments based upon literature instances. We then use a set of 24 MINLPs from symbolic regression to compare the performance of five local and five global MINLP solvers. Finally, we use larger instances to demonstrate that a portfolio of models provides an effective solution mechanism for problems of the size typically addressed in the symbolic regression literature.
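To make the idea of an expression tree concrete, the following is a minimal sketch of how one tree can encode both a functional form (its operator nodes) and parameter values (its constant leaves). It is not the MINLP formulation proposed in the article. The names `Node`, `evaluate`, and `sum_squared_error` and the sample data are hypothetical, and least-squares error is assumed as the fit measure purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence
import operator

# Illustrative subset of the simple operators mentioned in the abstract.
BINARY_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


@dataclass
class Node:
    """One node of a (hypothetical) expression tree."""
    kind: str                          # "op", "var", or "const"
    op: Optional[str] = None           # operator symbol when kind == "op"
    var_index: Optional[int] = None    # input index when kind == "var"
    value: Optional[float] = None      # parameter value when kind == "const"
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def evaluate(node: Node, x: Sequence[float]) -> float:
    """Recursively evaluate the tree at one data point x."""
    if node.kind == "const":
        return node.value
    if node.kind == "var":
        return x[node.var_index]
    return BINARY_OPS[node.op](evaluate(node.left, x), evaluate(node.right, x))


def sum_squared_error(tree: Node, xs, ys) -> float:
    """Assumed fit measure: sum of squared residuals over the data set."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys))


# Tree encoding f(x) = 2.5 * x0 + x1: the structure fixes the functional
# form, and the constant leaf 2.5 is a regression parameter.
tree = Node(
    "op", op="+",
    left=Node("op", op="*",
              left=Node("const", value=2.5),
              right=Node("var", var_index=0)),
    right=Node("var", var_index=1),
)

xs = [(1.0, 0.0), (2.0, 1.0), (0.0, 3.0)]   # made-up sample inputs
ys = [2.5, 6.0, 3.0]                        # made-up sample responses
print(sum_squared_error(tree, xs, ys))
```

A search method, whether genetic or optimization-based, would vary both the node types and the constant values; this sketch only evaluates one fixed candidate tree.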
Acknowledgements
As part of the National Energy Technology Laboratory’s Regional University Alliance (NETL-RUA), a collaborative initiative of the NETL, this technical effort was performed under the RES contract DE-FE0004000, as part of the Carbon Capture Simulation Initiative.
Cite this article
Cozad, A., Sahinidis, N.V. A global MINLP approach to symbolic regression. Math. Program. 170, 97–119 (2018). https://doi.org/10.1007/s10107-018-1289-x