
Grammar-Based Vectorial Genetic Programming for Symbolic Regression

Chapter in: Genetic Programming Theory and Practice XVIII

Part of the book series: Genetic and Evolutionary Computation (GEVO)

Abstract

Vectorial Genetic Programming (GP) is a young branch of GP in which the training data for symbolic models may contain not only regular, scalar variables but also vector variables. The models' capabilities are extended accordingly with operations on vectors, most of which are simply performed component-wise. In addition, new aggregation functions are introduced that reduce vectors to scalars, allowing the model to extract information from vectors by itself and thus eliminating the need for the prior feature engineering that traditional GP otherwise requires to utilize vector data. Due to the white-box nature of symbolic models, operations on vectors are as easily interpreted as regular operations on scalars. In this paper, we extend the ideas of vectorial GP from previous work and propose a grammar-based approach for vectorial GP that addresses several of the challenges noted there. To evaluate grammar-based vectorial GP, we have designed new benchmark functions that contain both scalar and vector variables, and we show that traditional GP falls short very quickly in certain scenarios, whereas grammar-based vectorial GP is able to solve all presented benchmarks.
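The vector semantics sketched in the abstract can be illustrated with a small NumPy snippet (an illustration of the concept only, not the chapter's implementation; all variable names and values here are made up): component-wise operations keep vectors as vectors, while aggregation functions such as mean and standard deviation reduce them to scalars that combine freely with scalar variables.

```python
import numpy as np

# Hypothetical training-data features: one scalar and two vector variables
x1 = 2.0
v1 = np.array([1.0, 2.0, 3.0, 4.0])
v2 = np.array([0.5, 0.5, 1.0, 1.0])

# Component-wise vector operations: vectors stay vectors
s = v1 + v2   # vector + vector -> vector
p = x1 * v1   # scalar * vector -> scalar broadcast over all components

# Aggregation functions reduce vectors to scalars, so the model itself
# extracts scalar information from vector inputs (no manual feature engineering)
y = np.mean(s) + np.std(p)
```

Because the resulting expression consists of ordinary operators plus named aggregations, it remains as interpretable as a purely scalar symbolic model.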


Notes

  1. Similar behavior could also be achieved by using only vectors and treating scalars as vectors of length one.

  2. See ISO/IEC 14977.

  3. https://www.tensorflow.org

  4. https://pytorch.org

  5. https://numpy.org/doc/stable/user/theory.broadcasting.html

  6. We deliberately excluded vector constants to avoid the problem of determining the correct vector length for such a constant.

  7. https://dev.heuristiclab.com/

  8. https://dev.heuristiclab.com/wiki/AdditionalMaterial#GPTP2021


Acknowledgements

This work was carried out within the dissertation program of the University of Applied Sciences Upper Austria, project #875441 "Vektor-basierte Genetische Programmierung für Symbolische Regression und Klassifikation mit Zeitreihen" (SymRegZeit; vector-based genetic programming for symbolic regression and classification with time series), funded by the Austrian Research Promotion Agency FFG. The authors also gratefully acknowledge support by the Christian Doppler Research Association and the Federal Ministry of Digital and Economic Affairs within the Josef Ressel Centre for Symbolic Regression.

Author information

Correspondence to Philipp Fleck.

Appendix

Below, we list the equations used to generate the target variable for each benchmark. Each scalar variable \(x_i\) is drawn from a uniform distribution with lower and upper bound, denoted \(\mathcal {U}(\mathrm {lower}, \mathrm {upper})\). Each vector variable \(\boldsymbol{v_i}\) is defined by two hidden scalar variables that determine the mean and standard deviation of the vector; these hidden variables are themselves drawn from uniform distributions, like regular scalar variables. Based on the hidden mean and standard deviation, we then define the final uniform distribution from which the values of each individual vector are sampled, with lower and upper bounds \(\mu \pm \sqrt{12}\sigma / 2\) (for a uniform distribution, this choice of width yields a standard deviation of exactly \(\sigma \)). We denote such a vector variable by \(\mathcal {U}(\mu = [\text {lower mean}, \text {upper mean}], \sigma = [\text {lower std dev}, \text {upper std dev}]; \mathrm {length})\). This procedure ensures that the mean and standard deviation of each vector are also randomized.
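The sampling procedure above can be sketched as follows (a minimal sketch assuming NumPy; the function name `sample_vector_variable` is ours, not from the chapter). The bounds \(\mu \pm \sqrt{12}\sigma / 2\) work because a uniform distribution of width \(w\) has standard deviation \(w/\sqrt{12}\), so the sampled components end up with standard deviation \(\sigma \):

```python
import numpy as np

def sample_vector_variable(mu_bounds, sigma_bounds, length, n_samples, rng):
    """Sample instances of one vector variable: per instance, first draw the
    hidden mean and standard deviation uniformly, then draw the vector's
    components from U(mu - sqrt(12)*sigma/2, mu + sqrt(12)*sigma/2)."""
    mu = rng.uniform(*mu_bounds, size=n_samples)        # hidden mean per vector
    sigma = rng.uniform(*sigma_bounds, size=n_samples)  # hidden std dev per vector
    half_width = np.sqrt(12) * sigma / 2                # yields vector std dev sigma
    low = (mu - half_width)[:, None]
    high = (mu + half_width)[:, None]
    return rng.uniform(low, high, size=(n_samples, length))

rng = np.random.default_rng(42)
# e.g. v1 ~ U(mu=[4,8], sigma=[2,4]; 20), here for 1000 training rows
v1 = sample_vector_variable((4, 8), (2, 4), 20, 1000, rng)
```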

$$\begin{aligned} \mathrm {test\_A\_01} \quad y&= 2.5 \cdot \mathrm {mean}(\boldsymbol{v_1}) \\ \boldsymbol{v_1}&\sim \mathcal {U}(\mu =[4,8], \sigma =[2,4]; 20) \end{aligned}$$

$$\begin{aligned} \mathrm {test\_A\_02} \quad y&= 2.5 \cdot \mathrm {mean}(\boldsymbol{v_1}) \\ \boldsymbol{v_1}&\sim \mathcal {U}(\mu =[4,8], \sigma =[2,4]; 20) \end{aligned}$$

$$\begin{aligned} \mathrm {test\_A\_03} \quad y&= 2.5 \cdot x_1 + \mathrm {mean}(\boldsymbol{v_1}) + 2.0 \\ x_1&\sim \mathcal {U}(1, 4) \\ \boldsymbol{v_1}&\sim \mathcal {U}(\mu =[4,8], \sigma =[2,4]; 20) \end{aligned}$$

$$\begin{aligned} \mathrm {test\_A\_04} \quad y&= x_1 \cdot \mathrm {var}(\boldsymbol{v_1}) / 3.0 - 3.0 \cdot \mathrm {mean}(\boldsymbol{v_2}) / x_2 \\ x_1&\sim \mathcal {U}(1, 4) \\ x_2&\sim \mathcal {U}(-8, -4) \\ \boldsymbol{v_1}&\sim \mathcal {U}(\mu =[4,8], \sigma =[2,4]; 20) \\ \boldsymbol{v_2}&\sim \mathcal {U}(\mu =[10,20], \sigma =[4,8]; 20) \end{aligned}$$

$$\begin{aligned} \mathrm {test\_B\_01} \quad y&= x_1 \cdot \mathrm {mean}(\boldsymbol{v_1} + \boldsymbol{v_2}) \\ x_1&\sim \mathcal {U}(1, 4) \\ x_2&\sim \mathcal {U}(-8, -4) \\ \boldsymbol{v_1}&\sim \mathcal {U}(\mu =[4,8], \sigma =[2,4]; 20) \\ \boldsymbol{v_2}&\sim \mathcal {U}(\mu =[10,20], \sigma =[4,6]; 20) \end{aligned}$$

$$\begin{aligned} \mathrm {test\_B\_02} \quad y&= x_1 \cdot \mathrm {mean}(\boldsymbol{v_1} \cdot \boldsymbol{v_2}) \\ x_1&\sim \mathcal {U}(1, 4) \\ x_2&\sim \mathcal {U}(-8, -4) \\ \boldsymbol{v_1}&\sim \mathcal {U}(\mu =[4,8], \sigma =[2,4]; 20) \\ \boldsymbol{v_2}&\sim \mathcal {U}(\mu =[10,20], \sigma =[4,6]; 20) \end{aligned}$$

$$\begin{aligned} \mathrm {test\_B\_03} \quad y&= x_1 \cdot \mathrm {std}(\boldsymbol{v_1} + \boldsymbol{v_2}) \\ x_1&\sim \mathcal {U}(1, 4) \\ x_2&\sim \mathcal {U}(-8, -4) \\ \boldsymbol{v_1}&\sim \mathcal {U}(\mu =[4,8], \sigma =[2,4]; 20) \\ \boldsymbol{v_2}&\sim \mathcal {U}(\mu =[10,20], \sigma =[4,6]; 20) \end{aligned}$$

$$\begin{aligned} \mathrm {test\_B\_04} \quad y&= x_1 \cdot \mathrm {std}(\boldsymbol{v_1} \cdot \boldsymbol{v_2}) \\ x_1&\sim \mathcal {U}(1, 4) \\ x_2&\sim \mathcal {U}(-8, -4) \\ \boldsymbol{v_1}&\sim \mathcal {U}(\mu =[4,8], \sigma =[2,4]; 20) \\ \boldsymbol{v_2}&\sim \mathcal {U}(\mu =[10,20], \sigma =[4,6]; 20) \end{aligned}$$

$$\begin{aligned} \mathrm {test\_B\_05} \quad y&= x_1 \cdot \mathrm {mean}((\boldsymbol{v_1} + 2\boldsymbol{v_3}) / (0.5\boldsymbol{v_2})) \\ x_1&\sim \mathcal {U}(1, 4) \\ x_2&\sim \mathcal {U}(-8, -4) \\ \boldsymbol{v_1}&\sim \mathcal {U}(\mu =[4,8], \sigma =[2,4]; 20) \\ \boldsymbol{v_2}&\sim \mathcal {U}(\mu =[10,20], \sigma =[4,6]; 20) \\ \boldsymbol{v_3}&\sim \mathcal {U}(\mu =[2,4], \sigma =[0.05,0.15]; 20) \end{aligned}$$

$$\begin{aligned} \mathrm {test\_B\_06} \quad y&= x_1 \cdot \mathrm {std}((\boldsymbol{v_1} + 2\boldsymbol{v_3}) / (0.5\boldsymbol{v_2})) \\ x_1&\sim \mathcal {U}(1, 4) \\ x_2&\sim \mathcal {U}(-8, -4) \\ \boldsymbol{v_1}&\sim \mathcal {U}(\mu =[4,8], \sigma =[2,4]; 20) \\ \boldsymbol{v_2}&\sim \mathcal {U}(\mu =[10,20], \sigma =[4,6]; 20) \\ \boldsymbol{v_3}&\sim \mathcal {U}(\mu =[2,4], \sigma =[0.05,0.15]; 20) \end{aligned}$$
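To make the notation concrete, a single row of test_A_04 could be generated and evaluated as follows (a sketch assuming NumPy; for brevity, the hidden mean and standard deviation of each vector are fixed at their lower bounds instead of being sampled):

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.sqrt(12) / 2  # half-width factor: U(mu - w*sigma, mu + w*sigma) has std dev sigma

# One row of test_A_04, with bounds taken from the equation above
x1 = rng.uniform(1, 4)
x2 = rng.uniform(-8, -4)
v1 = rng.uniform(4 - w * 2, 4 + w * 2, size=20)    # vector instance with mu=4, sigma=2
v2 = rng.uniform(10 - w * 4, 10 + w * 4, size=20)  # vector instance with mu=10, sigma=4

# Target value according to test_A_04
y = x1 * np.var(v1) / 3.0 - 3.0 * np.mean(v2) / x2
```

Note that x2 appears in the target only for test_A_04; in the test_B benchmarks it is generated but unused, acting as a distractor variable.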


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Fleck, P., Winkler, S., Kommenda, M., Affenzeller, M. (2022). Grammar-Based Vectorial Genetic Programming for Symbolic Regression. In: Banzhaf, W., Trujillo, L., Winkler, S., Worzel, B. (eds) Genetic Programming Theory and Practice XVIII. Genetic and Evolutionary Computation. Springer, Singapore. https://doi.org/10.1007/978-981-16-8113-4_2


  • DOI: https://doi.org/10.1007/978-981-16-8113-4_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-8112-7

  • Online ISBN: 978-981-16-8113-4

  • eBook Packages: Computer Science (R0)
