Abstract
GPTIPS is a free, open-source, MATLAB-based software platform for symbolic data mining (SDM). It uses multigene genetic programming (MGGP), a variant of the biologically inspired machine learning method of genetic programming, as the engine that drives the automatic model discovery process. Symbolic data mining is the process of extracting hidden, meaningful relationships from data in the form of symbolic equations. Unlike many other data mining methods, SDM produces predictive equations whose structural transparency can give new insights into the physical systems or processes that generated the data. This transparency also makes the models easy to deploy outside of MATLAB.
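As a rough illustration of the multigene idea: in GPTIPS, a multigene regression model is a weighted sum of several GP trees ("genes") plus a bias term, y_hat = b0 + b1*G1(x) + ... + bn*Gn(x), with the weights estimated by least squares from the training data. The Python sketch below uses NumPy and shows only that weighting step. The three gene expressions and the synthetic data are hypothetical stand-ins for genes that GP would evolve.

```python
import numpy as np

# Hypothetical genes standing in for GP-evolved expression trees.
# Each maps an (n_samples, n_inputs) array to one column of outputs.
GENES = [
    lambda x: x[:, 0] * x[:, 1],
    lambda x: np.sin(x[:, 0]),
    lambda x: x[:, 1] ** 2,
]


def gene_matrix(x: np.ndarray, genes) -> np.ndarray:
    """Stack a bias column of ones with one column per gene output."""
    return np.column_stack([np.ones(x.shape[0])] + [g(x) for g in genes])


def fit_gene_weights(x: np.ndarray, y: np.ndarray, genes) -> np.ndarray:
    """Least-squares estimate of (b0, b1, ..., bn) in y ~ b0 + sum_i bi * Gi(x)."""
    weights, *_ = np.linalg.lstsq(gene_matrix(x, genes), y, rcond=None)
    return weights


def predict(x: np.ndarray, genes, weights: np.ndarray) -> np.ndarray:
    """Evaluate the weighted multigene model on new inputs."""
    return gene_matrix(x, genes) @ weights


# Synthetic, noise-free data generated from known weights (illustration only).
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=(200, 2))
y = 1.5 + 2.0 * x[:, 0] * x[:, 1] - 0.5 * np.sin(x[:, 0]) + 0.25 * x[:, 1] ** 2

w = fit_gene_weights(x, y, GENES)
print(np.round(w, 3))  # close to [1.5, 2.0, -0.5, 0.25]
```

Because the gene weights are fitted linearly, GP only has to discover useful nonlinear terms, and the final model remains a single readable equation.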
The rationale behind GPTIPS is to reduce the technical barriers to using, understanding, visualising and deploying GP-based symbolic models of data, whilst remaining highly customisable and delivering robust numerical performance for power users. With these aims in mind, this chapter discusses notable new features of the latest version of the software, GPTIPS 2. It also proposes a simplified variant of the MGGP high-level gene crossover mechanism.
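In MGGP, high-level crossover exchanges whole genes between two parent individuals, in addition to ordinary subtree crossover within genes. The abstract does not describe the simplified variant the chapter proposes, so the Python sketch below is not that method. It shows one plausible gene-exchange scheme under stated assumptions: each gene is picked independently with probability `p_select`, picked genes move to the other parent's child, children are truncated to a hypothetical `max_genes` limit, and the exchange is abandoned if a child would end up with no genes. Genes are plain strings here purely for illustration.

```python
import random
from typing import List, Tuple

Gene = str  # stand-in for a GP expression tree (illustration only)


def high_level_crossover(parent_a: List[Gene], parent_b: List[Gene],
                         p_select: float, max_genes: int,
                         rng: random.Random) -> Tuple[List[Gene], List[Gene]]:
    """Exchange randomly chosen whole genes between two multigene parents.

    Hypothetical scheme: every gene is selected independently with
    probability p_select and moved to the other parent's child.
    """
    pick_a = [rng.random() < p_select for _ in parent_a]
    pick_b = [rng.random() < p_select for _ in parent_b]

    # Genes that stay with their own parent, and genes that move across.
    stay_a = [g for g, moved in zip(parent_a, pick_a) if not moved]
    stay_b = [g for g, moved in zip(parent_b, pick_b) if not moved]
    move_a = [g for g, moved in zip(parent_a, pick_a) if moved]
    move_b = [g for g, moved in zip(parent_b, pick_b) if moved]

    # Each child keeps its unselected genes and receives the other parent's
    # selected genes, capped at max_genes.
    child_a = (stay_a + move_b)[:max_genes]
    child_b = (stay_b + move_a)[:max_genes]

    # Reject the exchange (return copies of the parents) if a child is empty.
    if not child_a or not child_b:
        return list(parent_a), list(parent_b)
    return child_a, child_b


rng = random.Random(42)
a = ["x1*x2", "sin(x1)", "x3^2"]
b = ["exp(-x2)", "x1+x3"]
child_a, child_b = high_level_crossover(a, b, p_select=0.5, max_genes=4, rng=rng)
print(child_a, child_b)
```

Moving whole genes lets useful terms found in one individual be recombined with terms from another without disrupting the internal structure of either tree.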
It is demonstrated that the new functionality of GPTIPS 2 (a) facilitates the discovery of compact symbolic relationships from data using multiple approaches, e.g. novel gene-centric visualisation analysis that mitigates horizontal bloat and reduces complexity in multigene symbolic regression models; (b) provides numerous methods for visualising the properties of symbolic models; (c) emphasises the generation of graphically navigable libraries of models that are Pareto-optimal with respect to the trade-off between model performance and complexity; and (d) expedites real-world applications through simple, rapid and robust deployment of symbolic models outside the software environment in which they were developed.
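Point (c) rests on Pareto dominance between two minimised objectives, model error and model complexity: a model lies on the Pareto front if no other model is at least as good on both objectives and strictly better on one. The Python sketch below filters a list of models down to that non-dominated set. The model labels and numbers are invented for illustration, the choice of RMSE and node count as the two measures is an assumption, and the simple O(n^2) pairwise check is not claimed to be how GPTIPS 2 computes its front.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    label: str
    error: float       # e.g. RMSE on training data (lower is better)
    complexity: int    # e.g. expression node count (lower is better)


def dominates(a: Model, b: Model) -> bool:
    """True if a is no worse than b on both objectives and better on at least one."""
    no_worse = a.error <= b.error and a.complexity <= b.complexity
    better = a.error < b.error or a.complexity < b.complexity
    return no_worse and better


def pareto_front(models: List[Model]) -> List[Model]:
    """Return the non-dominated models, ordered from simplest to most complex."""
    front = [m for m in models if not any(dominates(o, m) for o in models)]
    return sorted(front, key=lambda m: (m.complexity, m.error))


models = [
    Model("A", 0.50, 5),
    Model("B", 0.30, 12),
    Model("C", 0.32, 20),   # worse and more complex than B
    Model("D", 0.10, 40),
    Model("E", 0.60, 8),    # worse and more complex than A
]
front = pareto_front(models)
print([m.label for m in front])  # ['A', 'B', 'D']
```

Sorting the front by complexity gives a natural way to browse it: each step to the right buys lower error at the cost of a longer equation, which is the trade-off a user inspects when picking a model.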
Notes
1. A list of research literature using GPTIPS is maintained at https://sites.google.com/site/gptips4matlab/application-areas.
2. Currently, the Pareto tournament implementation does not support more than two objectives.
3. Although root mean squared error (RMSE) is the default fitness measure, it can easily be changed to, for example, mean squared error (MSE) with a minor edit to the file containing the default fitness function.
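Note 3 mentions replacing the default RMSE fitness with MSE. As a language-neutral illustration of the two measures (not the GPTIPS fitness file itself, which is MATLAB code), they can be sketched in Python as follows; the sample values are arbitrary.

```python
import math
from typing import Sequence


def mse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean squared error between observed y and predicted y_hat."""
    if len(y) != len(y_hat) or not y:
        raise ValueError("y and y_hat must be non-empty and the same length")
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(y, y_hat))


y = [1.0, 2.0, 3.0]
y_hat = [1.0, 2.0, 5.0]
print(mse(y, y_hat))   # 1.333...
print(rmse(y, y_hat))  # 1.1547...
```

Because the square root is monotonic, ranking candidate models by RMSE or by MSE gives the same order, so a selection step that only compares fitness values behaves identically under either measure; what changes is the scale of the reported fitness.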
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Searson, D.P. (2015). GPTIPS 2: An Open-Source Software Platform for Symbolic Data Mining. In: Gandomi, A., Alavi, A., Ryan, C. (eds) Handbook of Genetic Programming Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-20883-1_22
DOI: https://doi.org/10.1007/978-3-319-20883-1_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20882-4
Online ISBN: 978-3-319-20883-1
eBook Packages: Computer Science, Computer Science (R0)