research-article

Symbolic regression driven by training data and prior knowledge

Authors:
Jiří Kubalík
Czech Technical University in Prague, Prague, Czech Republic

Erik Derner
Czech Technical University in Prague, Prague, Czech Republic

Robert Babuška
Delft University of Technology, Delft, The Netherlands and Czech Technical University in Prague, Prague, Czech Republic

GECCO '20: Proceedings of the 2020 Genetic and Evolutionary Computation Conference, June 2020, Pages 958–966
https://doi.org/10.1145/3377930.3390152

Published: 26 June 2020

ABSTRACT

In symbolic regression, the search for analytic models is typically driven purely by the prediction error observed on the training data samples. However, when the data samples do not sufficiently cover the input space, the prediction error alone does not provide enough guidance toward the desired models. Standard symbolic regression techniques then yield models that are partially incorrect, for instance, in terms of their steady-state characteristics or local behavior. If these properties were taken into account during the search process, more accurate and relevant models could be produced. We propose a multi-objective symbolic regression approach that is driven by both the training data and prior knowledge of the properties the desired model should exhibit. The properties, given in the form of formal constraints, are internally represented by a set of discrete data samples on which candidate models are exactly checked. The proposed approach was experimentally evaluated on three test problems, and the results clearly demonstrate its capability to evolve realistic models that fit the training data well while also complying with the prior knowledge of the desired model characteristics. It outperforms standard symbolic regression by several orders of magnitude in terms of the mean squared deviation from a reference model.
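
The two-objective idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the monotonicity constraint, the way a violation is measured, the function names, and the three toy candidate models are all assumptions chosen for illustration, and a plain non-dominated filter stands in for whatever multi-objective search the paper actually uses. The training samples cover only [0, 1], while the constraint samples cover [0, 3], so the constraint objective carries information the training error cannot.

```python
import numpy as np

def training_mse(model, X, y):
    """Objective 1: mean squared prediction error on the training samples."""
    return float(np.mean((model(X) - y) ** 2))

def monotonicity_violation(model, xs):
    """Objective 2 (assumed constraint): the model should be non-decreasing.

    The constraint is represented by a sorted set of discrete samples xs.
    Every decrease between consecutive samples contributes to the violation.
    """
    values = model(xs)
    drops = np.minimum(np.diff(values), 0.0)  # negative only where the model decreases
    return float(np.sum(drops ** 2))

def dominates(a, b):
    """Pareto dominance for minimization: no worse in all objectives, better in one."""
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))

def pareto_front(scores):
    """Names of candidates that no other candidate dominates."""
    return [
        name for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    ]

# Toy setup (assumed): training data covers [0, 1]; constraint samples cover [0, 3].
X_train = np.linspace(0.0, 1.0, 20)
y_train = np.sin(X_train)
x_constraint = np.linspace(0.0, 3.0, 61)

# Hypothetical candidate models that a symbolic regression run might produce.
candidates = {
    "sin": np.sin,                     # fits the data exactly, decreases beyond pi/2
    "tanh": np.tanh,                   # slightly worse fit, non-decreasing everywhere
    "parabola": lambda x: x - x ** 2,  # poor fit and strongly decreasing
}

scores = {
    name: (training_mse(f, X_train, y_train), monotonicity_violation(f, x_constraint))
    for name, f in candidates.items()
}
front = pareto_front(scores)

for name, (mse, violation) in scores.items():
    print(f"{name}: mse={mse:.4g}, violation={violation:.4g}")
print("non-dominated:", front)
```

In this toy setup, the "sin" candidate has zero training error but violates the assumed constraint outside the training range, whereas "tanh" satisfies the constraint at a small cost in training error, so both remain non-dominated. Ranking by training error alone would prefer "sin" and hide the violation.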

Published in
GECCO '20: Proceedings of the 2020 Genetic and Evolutionary Computation Conference
June 2020
1349 pages
ISBN: 9781450371285
DOI: 10.1145/3377930
General Chair: Carlos Artemio Coello Coello, CINVESTAV-IPN
Copyright © 2020 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
genetic programming
model learning
multi-objective optimization
symbolic regression