Evolving multidimensional transformations for symbolic regression with M3GP

  • Regular Research Paper
  • Memetic Computing

Abstract

Multidimensional Multiclass Genetic Programming with Multidimensional Populations (M3GP) was originally proposed as a wrapper approach for supervised classification. M3GP searches for transformations of the form \(k:{\mathbb {R}}^p \rightarrow {\mathbb {R}}^d\), where \(p\) is the number of dimensions of the problem data and \(d\) is the dimensionality of the transformed data, as determined by the search. This work extends M3GP to symbolic regression, building models that are linear in the parameters using the transformed data. The proposal implements a sequential memetic structure with Lamarckian inheritance, combining two local search methods: a greedy pruning algorithm and least squares parameter estimation. Experimental results show that M3GP outperforms several standard and state-of-the-art regression techniques, as well as other GP approaches. On several synthetic and real-world problems, M3GP outperforms most of the compared methods in terms of RMSE and generates more parsimonious models. This performance can be explained by the fact that M3GP increases the maximal mutual information in the new feature space.
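To make the two local search steps named in the abstract concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation. The function names (`transform`, `fit_linear`, `rmse`, `fitness`, `greedy_prune`), the intercept term, the pruning acceptance rule (drop a dimension when training RMSE does not get worse, within a small tolerance), and the toy data are all assumptions made for illustration. The evolved transformation is stood in for by a hand-written list of feature functions.

```python
import numpy as np

def transform(X, features):
    """Apply a transformation k: R^p -> R^d, one output column per feature function."""
    return np.column_stack([f(X) for f in features])

def fit_linear(Z, y):
    """Least-squares estimate of a model that is linear in the transformed features."""
    A = np.column_stack([np.ones(len(Z)), Z])  # assumed intercept plus d features
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def rmse(Z, y, beta):
    """Root mean squared error of the linear model on transformed data Z."""
    A = np.column_stack([np.ones(len(Z)), Z])
    return float(np.sqrt(np.mean((A @ beta - y) ** 2)))

def fitness(X, y, features):
    """Training RMSE after least-squares fitting; lower is better."""
    Z = transform(X, features)
    return rmse(Z, y, fit_linear(Z, y))

def greedy_prune(X, y, features, tol=1e-9):
    """Remove dimensions one at a time while training RMSE does not get worse.

    The acceptance rule and tolerance are assumptions of this sketch.
    """
    kept = list(features)
    best = fitness(X, y, kept)
    i = 0
    while i < len(kept) and len(kept) > 1:
        candidate = kept[:i] + kept[i + 1:]
        score = fitness(X, y, candidate)
        if score <= best + tol:
            kept, best = candidate, score  # dimension i was redundant, drop it
        else:
            i += 1                         # dimension i is needed, keep it
    return kept, best

# Toy usage with p = 2 inputs and a hypothetical d = 3 transformation.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = 1.0 + 3.0 * X[:, 0] * X[:, 1] + 2.0 * np.sin(X[:, 0])
features = [
    lambda X: X[:, 0] * X[:, 1],
    lambda X: np.sin(X[:, 0]),
    lambda X: X[:, 1] ** 2,  # irrelevant to y, so pruning should remove it
]
pruned, score = greedy_prune(X, y, features)
print(len(features), "->", len(pruned), "dimensions, training RMSE:", score)
```

In this sketch the genetic search itself is omitted; only the evaluation of one candidate transformation is shown. Because the least-squares parameters are recomputed for each candidate, removing a dimension can only be accepted when that dimension contributes nothing measurable to the training fit.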

Notes

  1. Dimensions are also removed by the pruning operator, but only for the best individual of the population.

  2. While fitness is usually expected to be maximized, in this case we pose a minimization problem. We nevertheless use the term fitness to match common usage in the evolutionary and GP literature; another option would be to refer to it as a cost function.

  3. In this work the evolved transformations are not simplified, so the reported size of the models is calculated based on how they were produced by the M3GP search.

  4. http://www.tree-lab.org

  5. http://gplab.sourceforge.net

  6. http://www.cs.rtu.lv/jekabsons/

  7. http://trent.st/ffx/

  8. http://gsgp.sourceforge.net/

  9. We do not consider MLR in the size comparisons, since its size is always the same, determined by the number of problem features.

  10. This behavior was consistently shown in all runs, but a single run is presented in these plots for a simpler visualization.

Acknowledgements

The first author was supported by CONACYT (México) scholarship No. 401223. Research was funded by CONACYT Basic Science Research Project No. 178323, CONACYT Fronteras de la Ciencia FC-2015:2944, and the FP7-Marie Curie-IRSES 2013 European Commission program through project ACoBSEC, contract No. 612689. Funding was also provided by project PERSEIDS (PTDC/EMS-SIS/0642/2014) and the BioISI RD unit, UID/MULTI/04046/2013, funded by FCT/MCTES/PIDDAC, Portugal, and by TecNM project 6823.18-P, Mexico.

Author information

Corresponding author

Correspondence to Leonardo Trujillo.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Muñoz, L., Trujillo, L., Silva, S. et al. Evolving multidimensional transformations for symbolic regression with M3GP. Memetic Comp. 11, 111–126 (2019). https://doi.org/10.1007/s12293-018-0274-5

  • DOI: https://doi.org/10.1007/s12293-018-0274-5
