Abstract
Vectorial Genetic Programming (GP) is a young branch of GP in which the training data for symbolic models may contain not only regular, scalar variables but also vector variables. The model's abilities are extended accordingly to allow operations on vectors, most of which are simply performed component-wise. Additionally, new aggregation functions are introduced that reduce vectors to scalars, allowing the model to extract information from vectors by itself and thus eliminating the need for the prior feature engineering that traditional GP otherwise requires to utilize vector data. Due to the white-box nature of symbolic models, operations on vectors can be interpreted as easily as regular operations on scalars. In this paper, we extend the ideas of vectorial GP of previous authors and propose a grammar-based approach for vectorial GP that can deal with several of the challenges they noted. To evaluate grammar-based vectorial GP, we designed new benchmark functions that contain both scalar and vector variables, and we show that traditional GP falls short very quickly in certain scenarios. Grammar-based vectorial GP, however, is able to solve all presented benchmarks.
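As a minimal illustration of the two ideas in the abstract (component-wise vector operations and aggregation functions that reduce vectors to scalars), the following sketch evaluates one hypothetical symbolic expression with NumPy. The variable names and the particular expression are invented for illustration; this is not the authors' implementation.

```python
import numpy as np

# Hypothetical training row: two scalar features and one vector feature.
x1, x2 = 3.0, 0.5
v1 = np.array([1.0, 2.0, 3.0, 4.0])

# Component-wise vector arithmetic: scalar-vector and vector-vector
# operations broadcast over the components, as in vectorial GP.
scaled = x1 * v1 + x2          # still a vector: [3.5, 6.5, 9.5, 12.5]

# Aggregation functions reduce vectors to scalars, letting the model
# extract features (mean, max, ...) from vector inputs by itself.
y = np.mean(scaled) + np.max(v1)   # scalar prediction

print(y)
```

Because aggregations appear as ordinary nodes in the symbolic expression tree, the resulting model stays as interpretable as a purely scalar one.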
Notes
- 1.
Similar behavior could also be achieved by only using vectors and treating scalars as vectors of length one.
- 2.
See ISO/IEC 14977 (Extended Backus–Naur Form).
- 6.
We deliberately excluded vector constants to sidestep the problem of determining the correct vector length for such a constant.
Acknowledgements
This work was carried out within the Dissertationsprogramm der Fachhochschule OÖ #875441 Vektor-basierte Genetische Programmierung für Symbolische Regression und Klassifikation mit Zeitreihen (SymRegZeit), funded by the Austrian Research Promotion Agency FFG. The authors also gratefully acknowledge support by the Christian Doppler Research Association and the Federal Ministry of Digital and Economic Affairs within the Josef Ressel Centre for Symbolic Regression.
Appendix
Below, the equations used to generate the target variable for each benchmark are listed. Each scalar variable \(x_i\) is drawn from a uniform distribution with lower and upper bound, denoted by \(\mathcal {U}(\mathrm {lower}, \mathrm {upper})\). Each vector variable \(\boldsymbol{v_i}\) is defined by two hidden scalar variables for the mean and standard deviation of the vector. These hidden variables are themselves defined and obtained via uniform distributions, like a regular scalar variable. Then, based on the hidden mean \(\mu \) and standard deviation \(\sigma \) of each vector variable, we define the final uniform distribution from which the values of each individual vector are sampled, with lower and upper bound \(\mu \pm \sqrt{12}\sigma / 2\); since a uniform distribution on \([a, b]\) has standard deviation \((b-a)/\sqrt{12}\), these bounds yield exactly mean \(\mu \) and standard deviation \(\sigma \). In this case, we denote the vector variable by \(\mathcal {U}(\mu = [\text {lower mean}, \text {upper mean}], \sigma = [\text {lower std dev}, \text {upper std dev}]; \mathrm {length})\). This step ensures that the mean and standard deviation of each vector are also randomized.
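The two-stage sampling described above can be sketched as follows. The function name, the bound values, and the fixed seed are illustrative assumptions; only the sampling scheme itself follows the description.

```python
import numpy as np

def sample_vector_variable(mean_bounds, std_bounds, length, rng):
    """Sample one vector: first draw a hidden mean and standard deviation
    from uniform distributions, then sample the components from
    U(mu - sqrt(12)*sigma/2, mu + sqrt(12)*sigma/2), a uniform
    distribution with mean mu and standard deviation sigma."""
    mu = rng.uniform(*mean_bounds)
    sigma = rng.uniform(*std_bounds)
    half_width = np.sqrt(12) * sigma / 2
    return rng.uniform(mu - half_width, mu + half_width, size=length)

rng = np.random.default_rng(42)  # illustrative seed
# Illustrative bounds: mean in [0, 10], std dev in [0.5, 2], length 20.
v = sample_vector_variable((0.0, 10.0), (0.5, 2.0), 20, rng)
```

Drawing a fresh hidden mean and standard deviation for every vector is what randomizes these two statistics across samples, so a model cannot succeed by memorizing a fixed per-vector mean.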
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Fleck, P., Winkler, S., Kommenda, M., Affenzeller, M. (2022). Grammar-Based Vectorial Genetic Programming for Symbolic Regression. In: Banzhaf, W., Trujillo, L., Winkler, S., Worzel, B. (eds) Genetic Programming Theory and Practice XVIII. Genetic and Evolutionary Computation. Springer, Singapore. https://doi.org/10.1007/978-981-16-8113-4_2
DOI: https://doi.org/10.1007/978-981-16-8113-4_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-8112-7
Online ISBN: 978-981-16-8113-4
eBook Packages: Computer Science, Computer Science (R0)