Cluster Analysis of a Symbolic Regression Search Space

Kronberger, Gabriel; Kammerer, Lukas; Burlacu, Bogdan; Winkler, Stephan M.; Kommenda, Michael; Affenzeller, Michael

doi:10.1007/978-3-030-04735-1_5

Chapter
First Online: 24 January 2019

Part of the book series: Genetic and Evolutionary Computation (GEVO)

Abstract

In this chapter we take a closer look at the distribution of symbolic regression models generated by genetic programming in the search space. The motivation for this work is to improve the search for well-fitting symbolic regression models by using information about the similarity of models that can be precomputed independently from the target function. For our analysis, we use a restricted grammar for uni-variate symbolic regression models and generate all possible models up to a fixed length limit. We identify unique models and cluster them based on phenotypic as well as genotypic similarity. We find that phenotypic similarity leads to well-defined clusters while genotypic similarity does not produce a clear clustering. By mapping solution candidates visited by GP to the enumerated search space we find that GP initially explores the whole search space and later converges to the subspace of highest quality expressions in a run for a simple benchmark problem.
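The enumerate-then-compare idea described in the abstract can be sketched in a few lines. The following is only an illustration under stated assumptions, not the chapter's implementation: the toy grammar (terminals `x` and `1`, operators `sin`, `+`, `*`), the sample grid `XS`, and the distance `1 - r^2` based on the Pearson correlation of model outputs are all hypothetical choices made here for brevity.

```python
import math
from itertools import product

# Sample points at which every expression is evaluated (assumed grid).
XS = [i / 10 for i in range(-20, 21)]

def enumerate_exprs(max_len):
    """Return {length: [expr, ...]} for all expressions of up to max_len symbols.

    Expressions are nested tuples: ('x',), ('1',), ('sin', e), ('+', a, b), ('*', a, b).
    """
    by_len = {1: [('x',), ('1',)]}
    for n in range(2, max_len + 1):
        exprs = [('sin', e) for e in by_len[n - 1]]
        for left_len in range(1, n - 1):
            right_len = n - 1 - left_len
            for op in ('+', '*'):
                exprs += [(op, a, b)
                          for a, b in product(by_len[left_len], by_len[right_len])]
        by_len[n] = exprs
    return by_len

def evaluate(expr, x):
    """Recursively evaluate an expression tuple at the point x."""
    op = expr[0]
    if op == 'x':
        return x
    if op == '1':
        return 1.0
    if op == 'sin':
        return math.sin(evaluate(expr[1], x))
    a, b = evaluate(expr[1], x), evaluate(expr[2], x)
    return a + b if op == '+' else a * b

def phenotype(expr):
    """Output vector of an expression on XS, rounded so it can serve as a dict key."""
    return tuple(round(evaluate(expr, x), 9) for x in XS)

def phenotypic_distance(p, q):
    """1 - r^2 between two output vectors; constant outputs get distance 1 (assumption)."""
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    cov = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    vp = sum((a - mp) ** 2 for a in p)
    vq = sum((b - mq) ** 2 for b in q)
    if vp < 1e-12 or vq < 1e-12:
        return 1.0
    return 1.0 - cov * cov / (vp * vq)

by_len = enumerate_exprs(5)
all_exprs = [e for exprs in by_len.values() for e in exprs]

# Identify unique models: keep the first (shortest) expression per phenotype.
unique = {}
for e in all_exprs:
    unique.setdefault(phenotype(e), e)

print(len(all_exprs), "expressions,", len(unique), "unique phenotypes")
```

The pairwise distances between the entries of `unique` could then be passed to a clustering method of choice. Because the distance is correlation-based, expressions that differ only by scaling or offset, such as `x` and `x + x`, end up at distance zero.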

Notes

1. We actually found that this assumption is wrong. We found that the search space can be split into clusters of phenotypically and genotypically similar expressions. However, we could not show that genotypically similar expressions are also phenotypically similar, or vice versa. This is intuitive because two highly similar expressions become dissimilar on the phenotypic level just by a multiplication with zero. Conversely, many different expressions can be found which produce the same output.
2. https://github.com/elbamos/largeVis
3. We have used a uni-variate variant of the benchmark function described by Pagie and Hogeweg.
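The observation in note 1 can be made concrete with a small example. This is a sketch only: the genotypic measure used here (a Sørensen-Dice index over the multiset of symbols in each expression) and the four example expressions are assumptions chosen for illustration, not necessarily the measure or the data used in the chapter.

```python
from collections import Counter

def dice_similarity(tokens_a, tokens_b):
    """Sørensen-Dice index on the multisets of symbols of two expressions."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    shared = sum((ca & cb).values())  # size of the multiset intersection
    return 2 * shared / (len(tokens_a) + len(tokens_b))

xs = [i / 4 for i in range(-8, 9)]

# Genotypically similar, phenotypically different: multiplication with zero.
f_tokens = ['+', '*', 'x', 'x', 'x']            # x*x + x
g_tokens = ['*', '0', '+', '*', 'x', 'x', 'x']  # 0 * (x*x + x)
f = lambda x: x * x + x
g = lambda x: 0 * (x * x + x)

# Genotypically different, phenotypically identical.
h_tokens = ['+', 'x', 'x']  # x + x
k_tokens = ['*', '2', 'x']  # 2 * x
h = lambda x: x + x
k = lambda x: 2 * x

print("f vs g: Dice =", round(dice_similarity(f_tokens, g_tokens), 3),
      "| same outputs:", all(f(x) == g(x) for x in xs))
print("h vs k: Dice =", round(dice_similarity(h_tokens, k_tokens), 3),
      "| same outputs:", all(h(x) == k(x) for x in xs))
```

The first pair shares most of its symbols yet produces different outputs, while the second pair shares few symbols yet produces identical outputs on the sample points.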

References

Burke, E.K., Gustafson, S., Kendall, G.: Diversity in genetic programming: An analysis of measures and correlation with fitness. IEEE Transactions on Evolutionary Computation 8(1), 47–62 (2004). https://doi.org/10.1109/TEVC.2003.819263
Campello, R.J.G.B., Moulavi, D., Sander, J.: Density-based clustering based on hierarchical density estimates. In: J. Pei, V.S. Tseng, L. Cao, H. Motoda, G. Xu (eds.) Advances in Knowledge Discovery and Data Mining, pp. 160–172. Springer Berlin Heidelberg, Berlin, Heidelberg (2013)
Dasgupta, S., Freund, Y.: Random projection trees and low dimensional manifolds. In: Proceedings of the 40th Annual ACM Symposium on Theory of Computing (STOC’08), pp. 537–546. ACM (2008)
Keijzer, M.: Improving symbolic regression with interval arithmetic and linear scaling. In: European Conference on Genetic Programming, pp. 70–82. Springer (2003)
Kommenda, M., Kronberger, G., Winkler, S., Affenzeller, M., Wagner, S.: Effects of constant optimization by nonlinear least squares minimization in symbolic regression. In: Proceedings of the 15th Annual Conference Companion on Genetic and Evolutionary Computation, pp. 1121–1128. ACM (2013)
Luke, S.: Two fast tree-creation algorithms for genetic programming. IEEE Transactions on Evolutionary Computation 4(3), 274–283 (2000)
Maaten, L.v.d., Hinton, G.: Visualizing data using t-SNE. Journal of Machine Learning Research 9(Nov), 2579–2605 (2008)
McInnes, L., Healy, J.: Accelerated hierarchical density based clustering. In: 2017 IEEE International Conference on Data Mining Workshops (ICDMW), pp. 33–42 (2017). https://doi.org/10.1109/ICDMW.2017.12
McInnes, L., Healy, J., Astels, S.: hdbscan: Hierarchical density based clustering. The Journal of Open Source Software 2(11) (2017). https://doi.org/10.21105/joss.00205
Pagie, L., Hogeweg, P.: Evolutionary consequences of coevolving targets. Evolutionary Computation 5(4), 401–418 (1997)
Tang, J., Liu, J., Zhang, M., Mei, Q.: Visualizing large-scale and high-dimensional data. In: Proceedings of the 25th International Conference on World Wide Web, WWW '16, pp. 287–297. International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland (2016). https://doi.org/10.1145/2872427.2883041
Uy, N.Q., Hoai, N.X., O’Neill, M., McKay, R.I., Galván-López, E.: Semantically-based crossover in genetic programming: application to real-valued symbolic regression. Genetic Programming and Evolvable Machines 12(2), 91–119 (2011)
Valiente, G.: An efficient bottom-up distance between trees. In: Proc. 8th Int. Symposium on String Processing and Information Retrieval, pp. 212–219. IEEE Computer Science Press (2001)
Worm, T., Chiu, K.: Prioritized grammar enumeration: symbolic regression by dynamic programming. In: Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation, pp. 1021–1028. ACM (2013)

Acknowledgements

The authors thank the participants of the Genetic Programming in Theory and Practice (GPTP XVI) workshop for their valuable feedback and ideas which helped to improve the work described in this chapter. The authors gratefully acknowledge support by the Christian Doppler Research Association and the Federal Ministry for Digital and Economic Affairs within the Josef Ressel Center for Symbolic Regression.

Author information

Authors and Affiliations

Heuristic and Evolutionary Algorithms Laboratory (HEAL), University of Applied Sciences Upper Austria, Hagenberg, Austria
Gabriel Kronberger, Lukas Kammerer, Bogdan Burlacu, Stephan M. Winkler, Michael Kommenda & Michael Affenzeller
Josef Ressel Center for Symbolic Regression, University of Applied Sciences Upper Austria, Hagenberg, Austria
Gabriel Kronberger, Lukas Kammerer, Bogdan Burlacu & Michael Kommenda
Institute for Formal Models and Verification, Johannes Kepler University, Linz, Austria
Lukas Kammerer, Bogdan Burlacu, Stephan M. Winkler, Michael Kommenda & Michael Affenzeller

Corresponding author

Correspondence to Gabriel Kronberger.

Editor information

Editors and Affiliations

Computer Science and Engineering, John R. Koza Chair, Michigan State University, East Lansing, MI, USA
Wolfgang Banzhaf
Cognitive Science, Hampshire College, Amherst, MA, USA
Lee Spector
Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA
Leigh Sheneman

About this chapter

Cite this chapter

Kronberger, G., Kammerer, L., Burlacu, B., Winkler, S.M., Kommenda, M., Affenzeller, M. (2019). Cluster Analysis of a Symbolic Regression Search Space. In: Banzhaf, W., Spector, L., Sheneman, L. (eds) Genetic Programming Theory and Practice XVI. Genetic and Evolutionary Computation. Springer, Cham. https://doi.org/10.1007/978-3-030-04735-1_5

DOI: https://doi.org/10.1007/978-3-030-04735-1_5
Published: 24 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04734-4
Online ISBN: 978-3-030-04735-1
eBook Packages: Computer Science, Computer Science (R0)