
Grammar-Based Vectorial Genetic Programming for Symbolic Regression

Chapter in: Genetic Programming Theory and Practice XVIII

Part of the book series: Genetic and Evolutionary Computation (GEVO)

Abstract

Vectorial Genetic Programming (GP) is a young branch of GP in which the training data for symbolic models may contain not only regular, scalar variables but also vector variables. The models' capabilities are extended accordingly with operations on vectors, most of which are simply performed component-wise. In addition, new aggregation functions are introduced that reduce vectors to scalars, allowing the model to extract information from vectors by itself and thus eliminating the need for the prior feature engineering that traditional GP otherwise requires to utilize vector data. Due to the white-box nature of symbolic models, operations on vectors are as easily interpreted as regular operations on scalars. In this paper, we extend the ideas of vectorial GP from previous work and propose a grammar-based approach for vectorial GP that addresses several of the challenges noted there. To evaluate grammar-based vectorial GP, we have designed new benchmark functions that contain both scalar and vector variables, and we show that traditional GP falls short very quickly in certain scenarios, whereas grammar-based vectorial GP is able to solve all presented benchmarks.
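The vector semantics sketched in the abstract can be illustrated with a small NumPy snippet (an illustration of the concept only, not the chapter's implementation; all variable names and values here are made up): component-wise operations keep vectors as vectors, while aggregation functions such as mean and standard deviation reduce them to scalars that combine freely with scalar variables.

```python
import numpy as np

# Hypothetical training-data features: one scalar and two vector variables
x1 = 2.0
v1 = np.array([1.0, 2.0, 3.0, 4.0])
v2 = np.array([0.5, 0.5, 1.0, 1.0])

# Component-wise vector operations: vectors stay vectors
s = v1 + v2   # vector + vector -> vector
p = x1 * v1   # scalar * vector -> scalar broadcast over all components

# Aggregation functions reduce vectors to scalars, so the model itself
# extracts scalar information from vector inputs (no manual feature engineering)
y = np.mean(s) + np.std(p)
```

Because the resulting expression consists of ordinary operators plus named aggregations, it remains as interpretable as a purely scalar symbolic model.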


Notes

  1. Similar behavior could also be achieved by using only vectors and treating scalars as vectors of length one.

  2. See ISO/IEC 14977.

  3. https://www.tensorflow.org

  4. https://pytorch.org

  5. https://numpy.org/doc/stable/user/theory.broadcasting.html

  6. We deliberately excluded vector constants to avoid the problem of determining the correct vector length for such a constant.

  7. https://dev.heuristiclab.com/

  8. https://dev.heuristiclab.com/wiki/AdditionalMaterial#GPTP2021


Acknowledgements

This work was carried out within the dissertation program of the University of Applied Sciences Upper Austria, project #875441 "Vektor-basierte Genetische Programmierung für Symbolische Regression und Klassifikation mit Zeitreihen" (SymRegZeit; vector-based genetic programming for symbolic regression and classification with time series), funded by the Austrian Research Promotion Agency FFG. The authors also gratefully acknowledge support by the Christian Doppler Research Association and the Federal Ministry of Digital and Economic Affairs within the Josef Ressel Centre for Symbolic Regression.

Author information

Correspondence to Philipp Fleck.

Appendix

Below, we list the equations used to generate the target variable for each benchmark. Each scalar variable \(x_i\) is drawn from a uniform distribution with lower and upper bound, denoted \(\mathcal {U}(\mathrm {lower}, \mathrm {upper})\). Each vector variable \(\boldsymbol{v_i}\) is defined by two hidden scalar variables that determine the mean and standard deviation of the vector; these hidden variables are themselves drawn from uniform distributions, like regular scalar variables. Based on the hidden mean and standard deviation, we then define the final uniform distribution from which the values of each individual vector are sampled, with lower and upper bounds \(\mu \pm \sqrt{12}\sigma / 2\) (for a uniform distribution, this choice of width yields a standard deviation of exactly \(\sigma \)). We denote such a vector variable by \(\mathcal {U}(\mu = [\text {lower mean}, \text {upper mean}], \sigma = [\text {lower std dev}, \text {upper std dev}]; \mathrm {length})\). This procedure ensures that the mean and standard deviation of each vector are also randomized.
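The sampling procedure above can be sketched as follows (a minimal sketch assuming NumPy; the function name `sample_vector_variable` is ours, not from the chapter). The bounds \(\mu \pm \sqrt{12}\sigma / 2\) work because a uniform distribution of width \(w\) has standard deviation \(w/\sqrt{12}\), so the sampled components end up with standard deviation \(\sigma \):

```python
import numpy as np

def sample_vector_variable(mu_bounds, sigma_bounds, length, n_samples, rng):
    """Sample instances of one vector variable: per instance, first draw the
    hidden mean and standard deviation uniformly, then draw the vector's
    components from U(mu - sqrt(12)*sigma/2, mu + sqrt(12)*sigma/2)."""
    mu = rng.uniform(*mu_bounds, size=n_samples)        # hidden mean per vector
    sigma = rng.uniform(*sigma_bounds, size=n_samples)  # hidden std dev per vector
    half_width = np.sqrt(12) * sigma / 2                # yields vector std dev sigma
    low = (mu - half_width)[:, None]
    high = (mu + half_width)[:, None]
    return rng.uniform(low, high, size=(n_samples, length))

rng = np.random.default_rng(42)
# e.g. v1 ~ U(mu=[4,8], sigma=[2,4]; 20), here for 1000 training rows
v1 = sample_vector_variable((4, 8), (2, 4), 20, 1000, rng)
```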

$$\begin{aligned} \mathrm {test\_A\_01} \quad y&= 2.5 \cdot \mathrm {mean}(\boldsymbol{v_1}) \\ \boldsymbol{v_1}&\sim \mathcal {U}(\mu =[4,8], \sigma =[2,4]; 20) \end{aligned}$$

$$\begin{aligned} \mathrm {test\_A\_02} \quad y&= 2.5 \cdot \mathrm {mean}(\boldsymbol{v_1}) \\ \boldsymbol{v_1}&\sim \mathcal {U}(\mu =[4,8], \sigma =[2,4]; 20) \end{aligned}$$

$$\begin{aligned} \mathrm {test\_A\_03} \quad y&= 2.5 \cdot x_1 + \mathrm {mean}(\boldsymbol{v_1}) + 2.0 \\ x_1&\sim \mathcal {U}(1, 4) \\ \boldsymbol{v_1}&\sim \mathcal {U}(\mu =[4,8], \sigma =[2,4]; 20) \end{aligned}$$

$$\begin{aligned} \mathrm {test\_A\_04} \quad y&= x_1 \cdot \mathrm {var}(\boldsymbol{v_1}) / 3.0 - 3.0 \cdot \mathrm {mean}(\boldsymbol{v_2}) / x_2 \\ x_1&\sim \mathcal {U}(1, 4) \\ x_2&\sim \mathcal {U}(-8, -4) \\ \boldsymbol{v_1}&\sim \mathcal {U}(\mu =[4,8], \sigma =[2,4]; 20) \\ \boldsymbol{v_2}&\sim \mathcal {U}(\mu =[10,20], \sigma =[4,8]; 20) \end{aligned}$$

$$\begin{aligned} \mathrm {test\_B\_01} \quad y&= x_1 \cdot \mathrm {mean}(\boldsymbol{v_1} + \boldsymbol{v_2}) \\ x_1&\sim \mathcal {U}(1, 4) \\ x_2&\sim \mathcal {U}(-8, -4) \\ \boldsymbol{v_1}&\sim \mathcal {U}(\mu =[4,8], \sigma =[2,4]; 20) \\ \boldsymbol{v_2}&\sim \mathcal {U}(\mu =[10,20], \sigma =[4,6]; 20) \end{aligned}$$

$$\begin{aligned} \mathrm {test\_B\_02} \quad y&= x_1 \cdot \mathrm {mean}(\boldsymbol{v_1} \cdot \boldsymbol{v_2}) \\ x_1&\sim \mathcal {U}(1, 4) \\ x_2&\sim \mathcal {U}(-8, -4) \\ \boldsymbol{v_1}&\sim \mathcal {U}(\mu =[4,8], \sigma =[2,4]; 20) \\ \boldsymbol{v_2}&\sim \mathcal {U}(\mu =[10,20], \sigma =[4,6]; 20) \end{aligned}$$

$$\begin{aligned} \mathrm {test\_B\_03} \quad y&= x_1 \cdot \mathrm {std}(\boldsymbol{v_1} + \boldsymbol{v_2}) \\ x_1&\sim \mathcal {U}(1, 4) \\ x_2&\sim \mathcal {U}(-8, -4) \\ \boldsymbol{v_1}&\sim \mathcal {U}(\mu =[4,8], \sigma =[2,4]; 20) \\ \boldsymbol{v_2}&\sim \mathcal {U}(\mu =[10,20], \sigma =[4,6]; 20) \end{aligned}$$

$$\begin{aligned} \mathrm {test\_B\_04} \quad y&= x_1 \cdot \mathrm {std}(\boldsymbol{v_1} \cdot \boldsymbol{v_2}) \\ x_1&\sim \mathcal {U}(1, 4) \\ x_2&\sim \mathcal {U}(-8, -4) \\ \boldsymbol{v_1}&\sim \mathcal {U}(\mu =[4,8], \sigma =[2,4]; 20) \\ \boldsymbol{v_2}&\sim \mathcal {U}(\mu =[10,20], \sigma =[4,6]; 20) \end{aligned}$$

$$\begin{aligned} \mathrm {test\_B\_05} \quad y&= x_1 \cdot \mathrm {mean}((\boldsymbol{v_1} + 2\boldsymbol{v_3}) / (0.5\boldsymbol{v_2})) \\ x_1&\sim \mathcal {U}(1, 4) \\ x_2&\sim \mathcal {U}(-8, -4) \\ \boldsymbol{v_1}&\sim \mathcal {U}(\mu =[4,8], \sigma =[2,4]; 20) \\ \boldsymbol{v_2}&\sim \mathcal {U}(\mu =[10,20], \sigma =[4,6]; 20) \\ \boldsymbol{v_3}&\sim \mathcal {U}(\mu =[2,4], \sigma =[0.05,0.15]; 20) \end{aligned}$$

$$\begin{aligned} \mathrm {test\_B\_06} \quad y&= x_1 \cdot \mathrm {std}((\boldsymbol{v_1} + 2\boldsymbol{v_3}) / (0.5\boldsymbol{v_2})) \\ x_1&\sim \mathcal {U}(1, 4) \\ x_2&\sim \mathcal {U}(-8, -4) \\ \boldsymbol{v_1}&\sim \mathcal {U}(\mu =[4,8], \sigma =[2,4]; 20) \\ \boldsymbol{v_2}&\sim \mathcal {U}(\mu =[10,20], \sigma =[4,6]; 20) \\ \boldsymbol{v_3}&\sim \mathcal {U}(\mu =[2,4], \sigma =[0.05,0.15]; 20) \end{aligned}$$
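To make the notation concrete, a single row of test_A_04 could be generated and evaluated as follows (a sketch assuming NumPy; for brevity, the hidden mean and standard deviation of each vector are fixed at their lower bounds instead of being sampled):

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.sqrt(12) / 2  # half-width factor: U(mu - w*sigma, mu + w*sigma) has std dev sigma

# One row of test_A_04, with bounds taken from the equation above
x1 = rng.uniform(1, 4)
x2 = rng.uniform(-8, -4)
v1 = rng.uniform(4 - w * 2, 4 + w * 2, size=20)    # vector instance with mu=4, sigma=2
v2 = rng.uniform(10 - w * 4, 10 + w * 4, size=20)  # vector instance with mu=10, sigma=4

# Target value according to test_A_04
y = x1 * np.var(v1) / 3.0 - 3.0 * np.mean(v2) / x2
```

Note that x2 appears in the target only for test_A_04; in the test_B benchmarks it is generated but unused, acting as a distractor variable.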


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Fleck, P., Winkler, S., Kommenda, M., Affenzeller, M. (2022). Grammar-Based Vectorial Genetic Programming for Symbolic Regression. In: Banzhaf, W., Trujillo, L., Winkler, S., Worzel, B. (eds) Genetic Programming Theory and Practice XVIII. Genetic and Evolutionary Computation. Springer, Singapore. https://doi.org/10.1007/978-981-16-8113-4_2


  • DOI: https://doi.org/10.1007/978-981-16-8113-4_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-8112-7

  • Online ISBN: 978-981-16-8113-4

  • eBook Packages: Computer Science (R0)
