Abstract
As symbolic regression (SR) has advanced into the early stages of commercial exploitation, poor accuracy still plagues even advanced commercial packages and has become an issue for industrial users. Users expect a correct formula to be returned, especially in cases with zero noise and only one basis function with minimal complexity. At a minimum, users expect the response surface of the SR tool to be easily understood, so that they can know a priori on what classes of problems to expect excellent, average, or poor accuracy. Poor or unknown accuracy is a hindrance to greater academic and industrial acceptance of SR tools. In several previous papers, we presented a complex algorithm for modern SR, which is extremely accurate for a large class of SR problems on noiseless data. Further research has shown that these extremely accurate SR algorithms also improve accuracy in noisy circumstances, albeit without reaching extreme accuracy. Armed with these SR successes, we naively thought that achieving extreme accuracy when applying genetic programming (GP) to symbolic multi-class classification would be an easy goal. However, it seems that algorithms with extreme accuracy in SR do not translate directly into symbolic multi-class classification. Furthermore, others have encountered serious issues applying GP to symbolic multi-class classification (Castelli et al. 2013). This is the first paper in a planned series developing the necessary algorithms for extreme accuracy in GP applied to symbolic multi-class classification. We develop an evolutionary algorithm for optimizing a single symbolic multi-class classification candidate. It is designed for big-data situations, where the computational effort grows linearly as the number of features and training points increases.
The algorithm’s behavior is demonstrated on theoretical problems, UCI benchmarks, and industry test cases.
References
Castelli, M., Silva, S., Vanneschi, L., Cabral, A., Vasconcelos, M.J., Catarino, L., Carreiras, J.M.B.: Land cover/land use multiclass classification using GP with geometric semantic operators. In: Esparcia-Alcazar, A.I., Cioppa, A.D., De Falco, I., Tarantino, E., Cotta, C., Schaefer, R., Diwold, K., Glette, K., Tettamanzi, A., Agapitos, A., Burrelli, P., Merelo, J.J., Cagnoni, S., Zhang, M., Urquhart, N., Sim, K., Ekart, A., Fernandez de Vega, F., Silva, S., Haasdijk, E., Eiben, G., Simoes, A., Rohlfshagen, P. (eds.) Applications of Evolutionary Computing, EvoApplications 2013: EvoCOMNET, EvoCOMPLEX, EvoENERGY, EvoFIN, EvoGAMES, EvoIASP, EvoINDUSTRY, EvoNUM, EvoPAR, EvoRISK, EvoROBOT, EvoSTOC. Lecture Notes in Computer Science, vol. 7835, pp. 334–343. Springer, Vienna (2013). https://doi.org/10.1007/978-3-642-37192-9_34
Gandomi, A.H., Alavi, A.H., Ryan, C. (eds.): Handbook of Genetic Programming Applications. Springer, Berlin (2015). https://doi.org/10.1007/978-3-319-20883-1
Ingalalli, V., Silva, S., Castelli, M., Vanneschi, L.: A multi-dimensional genetic programming approach for multi-class classification problems. In: Nicolau, M., Krawiec, K., Heywood, M.I., Castelli, M., Garcia-Sanchez, P., Merelo, J.J., Rivas Santos, V.M., Sim, K. (eds.) 17th European Conference on Genetic Programming. Lecture Notes in Computer Science, vol. 8599, pp. 48–60. Springer, Granada (2014). https://doi.org/10.1007/978-3-662-44303-3_5
Karaboga, D., Akay, B.: A survey: algorithms simulating bee swarm intelligence. Artif. Intell. Rev. 31(1–4), 61–85 (2009)
Korns, M.F.: Abstract expression grammar symbolic regression. In: Riolo, R., McConaghy, T., Vladislavleva, E. (eds.) Genetic Programming Theory and Practice VIII. Genetic and Evolutionary Computation, vol. 8, chap. 7, pp. 109–128. Springer, Ann Arbor (2010). http://www.springer.com/computer/ai/book/978-1-4419-7746-5
Korns, M.F.: Accuracy in symbolic regression. In: Riolo, R., Vladislavleva, E., Moore, J.H. (eds.) Genetic Programming Theory and Practice IX, Genetic and Evolutionary Computation, chap. 8, pp. 129–151. Springer, Ann Arbor (2011). https://doi.org/10.1007/978-1-4614-1770-5_8
Korns, M.F.: A baseline symbolic regression algorithm. In: Riolo, R., Vladislavleva, E., Ritchie, M.D., Moore, J.H. (eds.) Genetic Programming Theory and Practice X, Genetic and Evolutionary Computation, chap. 9, pp. 117–137. Springer, Ann Arbor (2012). https://doi.org/10.1007/978-1-4614-6846-2_9
Korns, M.F.: Extreme accuracy in symbolic regression. In: Riolo, R., Moore, J.H., Kotanchek, M. (eds.) Genetic Programming Theory and Practice XI, Genetic and Evolutionary Computation, chap. 1, pp. 1–30. Springer, Ann Arbor (2013). https://doi.org/10.1007/978-1-4939-0375-7_1
Korns, M.F.: Extremely accurate symbolic regression for large feature problems. In: Riolo, R., Worzel, W.P., Kotanchek, M. (eds.) Genetic Programming Theory and Practice XII, Genetic and Evolutionary Computation, pp. 109–131. Springer, Ann Arbor (2014). https://doi.org/10.1007/978-3-319-16030-6_7
Korns, M.F.: Highly accurate symbolic regression with noisy training data. In: Riolo, R., Worzel, W.P., Kotanchek, M., Kordon, A. (eds.) Genetic Programming Theory and Practice XIII, Genetic and Evolutionary Computation. Springer, Ann Arbor (2015). https://doi.org/10.1007/978-3-319-34223-8. http://www.springer.com/us/book/9783319342214
Kotanchek, M., Smits, G., Vladislavleva, E.: Trustable symbolic regression models: using ensembles, interval arithmetic and pareto fronts to develop robust and trust-aware models. In: Riolo, R.L., Soule, T., Worzel, B. (eds.) Genetic Programming Theory and Practice V, Genetic and Evolutionary Computation, chap. 12, pp. 201–220. Springer, Ann Arbor (2007). https://doi.org/10.1007/978-0-387-76308-8_12. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.457.5272
Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA (1992). http://mitpress.mit.edu/books/genetic-programming
Koza, J.R.: Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press, Cambridge, MA (1994). http://www.genetic-programming.org/gpbook2toc.html
Koza, J.R., Andre, D., Bennett III, F.H., Keane, M.: Genetic Programming III: Darwinian Invention and Problem Solving. Morgan Kaufmann (1999). http://www.genetic-programming.org/gpbook3toc.html
Langdon, W.B., Poli, R.: Foundations of Genetic Programming. Springer, Berlin (2002). https://doi.org/10.1007/978-3-662-04726-2. http://www.cs.ucl.ac.uk/staff/W.Langdon/FOGP/
McConaghy, T.: FFX: fast, scalable, deterministic symbolic regression technology. In: Riolo, R., Vladislavleva, E., Moore, J.H. (eds.) Genetic Programming Theory and Practice IX, Genetic and Evolutionary Computation, chap. 13, pp. 235–260. Springer, Ann Arbor (2011). https://doi.org/10.1007/978-1-4614-1770-5_13. http://trent.st/content/2011-GPTP-FFX-paper.pdf
Nelder, J., Wedderburn, R.: Generalized linear models. J. R. Stat. Soc. Ser. A 135, 370–383 (1972)
Platt, J.: Sequential minimal optimization: a fast algorithm for training support vector machines. Technical Report MSR-TR-98-14, Microsoft Research (1998)
Poli, R., McPhee, N.F., Vanneschi, L.: Analysis of the effects of elitism on bloat in linear and tree-based genetic programming. In: Riolo, R.L., Soule, T., Worzel, B. (eds.) Genetic Programming Theory and Practice VI, Genetic and Evolutionary Computation, chap. 7, pp. 91–111. Springer, Ann Arbor (2008). https://doi.org/10.1007/978-0-387-87623-8_7
Smits, G., Kotanchek, M.: Pareto-front exploitation in symbolic regression. In: O’Reilly, U.M., Yu, T., Riolo, R.L., Worzel, B. (eds.) Genetic Programming Theory and Practice II, chap. 17, pp. 283–299. Springer, Ann Arbor (2004). https://doi.org/10.1007/0-387-23254-0_17
Copyright information
© 2018 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Korns, M.F. (2018). An Evolutionary Algorithm for Big Data Multi-Class Classification Problems. In: Riolo, R., Worzel, B., Goldman, B., Tozier, B. (eds) Genetic Programming Theory and Practice XIV. Genetic and Evolutionary Computation. Springer, Cham. https://doi.org/10.1007/978-3-319-97088-2_11
DOI: https://doi.org/10.1007/978-3-319-97088-2_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-97087-5
Online ISBN: 978-3-319-97088-2
eBook Packages: Computer Science, Computer Science (R0)