Abstract
Multidimensional Multiclass Genetic Programming with Multidimensional Populations (M3GP) was originally proposed as a wrapper approach for supervised classification. M3GP searches for transformations of the form \(k:{\mathbb {R}}^p \rightarrow {\mathbb {R}}^d\), where p is the number of dimensions of the problem data and d is the dimensionality of the transformed data, as determined by the search. This work extends M3GP to symbolic regression, building models that are linear in the parameters using the transformed data. The proposal implements a sequential memetic structure with Lamarckian inheritance, combining two local search methods: a greedy pruning algorithm and least squares parameter estimation. Experimental results on several synthetic and real-world problems show that M3GP outperforms most of the compared methods, which include standard and state-of-the-art regression techniques as well as other GP approaches, in terms of RMSE, and that it generates more parsimonious models. This performance can be explained by the fact that M3GP increases the maximal mutual information in the new feature space.
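To make the two local search steps named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a transformation k is represented as a list of Python callables (one per dimension), fits a model that is linear in the parameters by least squares, and then prunes dimensions greedily. The function names, the intercept term, and the pruning rule (drop a dimension when training RMSE does not get worse beyond a small tolerance) are assumptions made for illustration. Only the standard library is used so the sketch stays self-contained.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear dimensions?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def transform(dims, x):
    """Apply k: R^p -> R^d, where each dimension is a callable on x."""
    return [f(x) for f in dims]


def fit(dims, xs, ys):
    """Least squares estimate of beta in y ~ beta_0 + sum_j beta_j * k_j(x)."""
    rows = [[1.0] + transform(dims, x) for x in xs]
    n = len(rows[0])
    # Normal equations: (A^T A) beta = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n)]
    return solve_linear_system(ata, aty)


def predict(dims, beta, x):
    """Evaluate the linear-in-parameters model on one input vector."""
    return beta[0] + sum(b * z for b, z in zip(beta[1:], transform(dims, x)))


def rmse(dims, xs, ys):
    """Training RMSE of the least squares model built on the given dimensions."""
    beta = fit(dims, xs, ys)
    sq = sum((predict(dims, beta, x) - y) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sq / len(ys))


def greedy_prune(dims, xs, ys, tol=1e-9):
    """Try removing each dimension in turn; keep a removal if RMSE does not worsen."""
    kept = list(dims)
    best = rmse(kept, xs, ys)
    i = 0
    while i < len(kept) and len(kept) > 1:
        candidate = kept[:i] + kept[i + 1:]
        err = rmse(candidate, xs, ys)
        if err <= best + tol:
            kept, best = candidate, err  # removal accepted, retry same index
        else:
            i += 1  # removal rejected, move to the next dimension
    return kept


# Toy usage: p = 2 input features, d = 3 candidate dimensions.
xs = [(a, b) for a in range(4) for b in range(3)]
ys = [2 * a * b + 3 for a, b in xs]
dims = [lambda x: x[0] * x[1], lambda x: x[0] - x[1], lambda x: x[1] ** 2]
kept = greedy_prune(dims, xs, ys)
print(len(kept), fit(kept, xs, ys))
```

Because the pruned model is nested in the unpruned one, removing a dimension can never strictly lower the training RMSE of a least squares fit, which is why this sketch accepts removals within a tolerance rather than requiring an improvement. A fitness that also accounts for model size or validation error would allow a stricter rule.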
Notes
Dimensions are also removed by the pruning operator, but only for the best individual of the population.
While fitness is usually expected to be maximized, in this case we pose a minimization problem. We keep the term fitness to match common usage in the evolutionary computation and GP literature; another option would be to refer to it as a cost function.
In this work the evolved transformations are not simplified, so the reported size of the models is calculated based on how they were produced by the M3GP search.
We do not consider MLR in the size comparisons, since its size is always the same, being fixed by the number of problem features.
This behavior was consistently shown in all runs, but a single run is presented in these plots for a simpler visualization.
Acknowledgements
First author supported by CONACYT (México) scholarship No. 401223. Research was funded by CONACYT Basic Science Research Project No. 178323, CONACYT Fronteras de la Ciencia FC-2015:2944, and FP7-Marie Curie-IRSES 2013 European Commission program with project ACoBSEC with contract No. 612689. Funding also provided by project PERSEIDS (PTDC/EMS-SIS/0642/2014) and BioISI RD unit, UID/MULTI/04046/2013, funded by FCT/MCTES/PIDDAC, Portugal, and TecNM project 6823.18-P, Mexico.
Cite this article
Muñoz, L., Trujillo, L., Silva, S. et al. Evolving multidimensional transformations for symbolic regression with M3GP. Memetic Comp. 11, 111–126 (2019). https://doi.org/10.1007/s12293-018-0274-5
DOI: https://doi.org/10.1007/s12293-018-0274-5