An Evolutionary Algorithm for Big Data Multi-Class Classification Problems

Chapter in Genetic Programming Theory and Practice XIV

Part of the book series: Genetic and Evolutionary Computation (GEVO)

Abstract

As symbolic regression (SR) has advanced into the early stages of commercial exploitation, the poor accuracy of SR still plagues even advanced commercial packages and has become an issue for industrial users. Users expect a correct formula to be returned, especially in cases with zero noise and only one basis function with minimal complexity. At a minimum, users expect the response surface of the SR tool to be easily understood, so that they can know a priori for which classes of problems to expect excellent, average, or poor accuracy. Poor or unknown accuracy is a hindrance to greater academic and industrial acceptance of SR tools. In several previous papers, we presented a complex algorithm for modern SR that is extremely accurate for a large class of SR problems on noiseless data. Further research has shown that these extremely accurate SR algorithms also improve accuracy in noisy circumstances, albeit not to the level of extreme accuracy. Armed with these SR successes, we naively assumed that achieving extreme accuracy when applying GP to symbolic multi-class classification would be an easy goal. However, algorithms that achieve extreme accuracy in SR do not appear to translate directly into symbolic multi-class classification. Furthermore, others have encountered serious issues applying GP to symbolic multi-class classification (Castelli et al. Applications of Evolutionary Computing, EvoApplications 2013: EvoCOMNET, EvoCOMPLEX, EvoENERGY, EvoFIN, EvoGAMES, EvoIASP, EvoINDUSTRY, EvoNUM, EvoPAR, EvoRISK, EvoROBOT, EvoSTOC, vol 7835, pp 334–343. Springer, Vienna, 2013). This is the first paper in a planned series developing the algorithms necessary for extreme accuracy in GP applied to symbolic multi-class classification. We develop an evolutionary algorithm for optimizing a single symbolic multi-class classification candidate. It is designed for big-data situations, where the computational effort grows linearly as the number of features and training points increases. The algorithm’s behavior is demonstrated on theoretical problems, UCI benchmarks, and industry test cases.
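The abstract states only that the computational effort grows linearly with the number of features and training points; it does not describe how a candidate is represented. As a rough illustration of where that kind of scaling can come from, the sketch below assumes a candidate made of one linear discriminant per class, with each point assigned to the highest-scoring class (argmax). This representation and the names `predict_argmax` and `accuracy` are hypothetical and are not taken from the chapter; a symbolic candidate would replace each linear score with an evolved formula. Scoring N points against K discriminants over M features costs O(N·M·K), which is linear in both N and M for a fixed number of classes.

```python
from typing import List, Sequence


def predict_argmax(weights: Sequence[Sequence[float]],
                   biases: Sequence[float],
                   X: Sequence[Sequence[float]]) -> List[int]:
    """Assign each row of X to the class with the largest linear score.

    Hypothetical representation (an assumption, not the chapter's method):
    weights[k][j] is the coefficient of feature j in the discriminant for
    class k, and biases[k] is that discriminant's constant term.
    Cost is O(N * M * K) for N rows, M features, and K classes.
    """
    preds = []
    for row in X:                        # one pass over the N training points
        best_k, best_score = 0, float("-inf")
        for k, (w, b) in enumerate(zip(weights, biases)):   # K discriminants
            score = b + sum(wj * xj for wj, xj in zip(w, row))  # M features
            if score > best_score:       # strict '>' keeps the first class on ties
                best_k, best_score = k, score
        preds.append(best_k)
    return preds


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of points whose predicted class matches the true label."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]   # class 0 uses feature 0, class 1 uses feature 1
    b = [0.0, 0.0]
    X = [[2.0, 1.0], [0.5, 3.0]]
    preds = predict_argmax(W, b, X)
    print(preds, accuracy(preds, [0, 1]))
```

In an evolutionary setting, a function like `accuracy` could serve as the fitness of a candidate, so each fitness evaluation inherits the linear cost of `predict_argmax`.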

References

  1. Castelli, M., Silva, S., Vanneschi, L., Cabral, A., Vasconcelos, M.J., Catarino, L., Carreiras, J.M.B.: Land cover/land use multiclass classification using GP with geometric semantic operators. In: Esparcia-Alcazar, A.I., Cioppa, A.D., De Falco, I., Tarantino, E., Cotta, C., Schaefer, R., Diwold, K., Glette, K., Tettamanzi, A., Agapitos, A., Burelli, P., Merelo, J.J., Cagnoni, S., Zhang, M., Urquhart, N., Sim, K., Ekart, A., Fernandez de Vega, F., Silva, S., Haasdijk, E., Eiben, G., Simoes, A., Rohlfshagen, P. (eds.) Applications of Evolutionary Computing, EvoApplications 2013: EvoCOMNET, EvoCOMPLEX, EvoENERGY, EvoFIN, EvoGAMES, EvoIASP, EvoINDUSTRY, EvoNUM, EvoPAR, EvoRISK, EvoROBOT, EvoSTOC. Lecture Notes in Computer Science, vol. 7835, pp. 334–343. Springer, Vienna (2013). https://doi.org/10.1007/978-3-642-37192-9_34

  2. Gandomi, A.H., Alavi, A.H., Ryan, C. (eds.): Handbook of Genetic Programming Applications. Springer, Berlin (2015). https://doi.org/10.1007/978-3-319-20883-1

  3. Ingalalli, V., Silva, S., Castelli, M., Vanneschi, L.: A multi-dimensional genetic programming approach for multi-class classification problems. In: Nicolau, M., Krawiec, K., Heywood, M.I., Castelli, M., Garcia-Sanchez, P., Merelo, J.J., Rivas Santos, V.M., Sim, K. (eds.) 17th European Conference on Genetic Programming. Lecture Notes in Computer Science, vol. 8599, pp. 48–60. Springer, Granada (2014). https://doi.org/10.1007/978-3-662-44303-3_5

  4. Karaboga, D., Akay, B.: A survey: algorithms simulating bee swarm intelligence. Artif. Intell. Rev. 31(1–4), 61–85 (2009)

  5. Korns, M.F.: Abstract expression grammar symbolic regression. In: Riolo, R., McConaghy, T., Vladislavleva, E. (eds.) Genetic Programming Theory and Practice VIII. Genetic and Evolutionary Computation, vol. 8, chap. 7, pp. 109–128. Springer, Ann Arbor (2010). http://www.springer.com/computer/ai/book/978-1-4419-7746-5

  6. Korns, M.F.: Accuracy in symbolic regression. In: Riolo, R., Vladislavleva, E., Moore, J.H. (eds.) Genetic Programming Theory and Practice IX, Genetic and Evolutionary Computation, chap. 8, pp. 129–151. Springer, Ann Arbor (2011). https://doi.org/10.1007/978-1-4614-1770-5_8

  7. Korns, M.F.: A baseline symbolic regression algorithm. In: Riolo, R., Vladislavleva, E., Ritchie, M.D., Moore, J.H. (eds.) Genetic Programming Theory and Practice X, Genetic and Evolutionary Computation, chap. 9, pp. 117–137. Springer, Ann Arbor (2012). https://doi.org/10.1007/978-1-4614-6846-2_9

  8. Korns, M.F.: Extreme accuracy in symbolic regression. In: Riolo, R., Moore, J.H., Kotanchek, M. (eds.) Genetic Programming Theory and Practice XI, Genetic and Evolutionary Computation, chap. 1, pp. 1–30. Springer, Ann Arbor (2013). https://doi.org/10.1007/978-1-4939-0375-7_1

  9. Korns, M.F.: Extremely accurate symbolic regression for large feature problems. In: Riolo, R., Worzel, W.P., Kotanchek, M. (eds.) Genetic Programming Theory and Practice XII, Genetic and Evolutionary Computation, pp. 109–131. Springer, Ann Arbor (2014). https://doi.org/10.1007/978-3-319-16030-6_7

  10. Korns, M.F.: Highly accurate symbolic regression with noisy training data. In: Riolo, R., Worzel, W.P., Kotanchek, M., Kordon, A. (eds.) Genetic Programming Theory and Practice XIII, Genetic and Evolutionary Computation. Springer, Ann Arbor (2015). https://doi.org/10.1007/978-3-319-34223-8. http://www.springer.com/us/book/9783319342214

  11. Kotanchek, M., Smits, G., Vladislavleva, E.: Trustable symbolic regression models: using ensembles, interval arithmetic and pareto fronts to develop robust and trust-aware models. In: Riolo, R.L., Soule, T., Worzel, B. (eds.) Genetic Programming Theory and Practice V, Genetic and Evolutionary Computation, chap. 12, pp. 201–220. Springer, Ann Arbor (2007). https://doi.org/10.1007/978-0-387-76308-8_12. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.457.5272

  12. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA (1992). http://mitpress.mit.edu/books/genetic-programming

  13. Koza, J.R.: Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press, Cambridge, MA (1994). http://www.genetic-programming.org/gpbook2toc.html

  14. Koza, J.R., Andre, D., Bennett III, F.H., Keane, M.: Genetic Programming III: Darwinian Invention and Problem Solving. Morgan Kaufmann (1999). http://www.genetic-programming.org/gpbook3toc.html

  15. Langdon, W.B., Poli, R.: Foundations of Genetic Programming. Springer, Berlin (2002). https://doi.org/10.1007/978-3-662-04726-2. http://www.cs.ucl.ac.uk/staff/W.Langdon/FOGP/

  16. McConaghy, T.: FFX: Fast, scalable, deterministic symbolic regression technology. In: Riolo, R., Vladislavleva, E., Moore, J.H. (eds.) Genetic Programming Theory and Practice IX, Genetic and Evolutionary Computation, chap. 13, pp. 235–260. Springer, Ann Arbor (2011). https://doi.org/10.1007/978-1-4614-1770-5_13. http://trent.st/content/2011-GPTP-FFX-paper.pdf

  17. Nelder, J.A., Wedderburn, R.W.M.: Generalized linear models. J. R. Stat. Soc. Ser. A 135(3), 370–383 (1972)

  18. Platt, J.: Sequential minimal optimization: A fast algorithm for training support vector machines. Technical Report MSR-TR-98-14, Microsoft Research (1998)

  19. Poli, R., McPhee, N.F., Vanneschi, L.: Analysis of the effects of elitism on bloat in linear and tree-based genetic programming. In: Riolo, R.L., Soule, T., Worzel, B. (eds.) Genetic Programming Theory and Practice VI, Genetic and Evolutionary Computation, chap. 7, pp. 91–111. Springer, Ann Arbor (2008). https://doi.org/10.1007/978-0-387-87623-8_7

  20. Smits, G., Kotanchek, M.: Pareto-front exploitation in symbolic regression. In: O’Reilly, U.M., Yu, T., Riolo, R.L., Worzel, B. (eds.) Genetic Programming Theory and Practice II, chap. 17, pp. 283–299. Springer, Ann Arbor (2004). https://doi.org/10.1007/0-387-23254-0_17

Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Korns, M.F. (2018). An Evolutionary Algorithm for Big Data Multi-Class Classification Problems. In: Riolo, R., Worzel, B., Goldman, B., Tozier, B. (eds) Genetic Programming Theory and Practice XIV. Genetic and Evolutionary Computation. Springer, Cham. https://doi.org/10.1007/978-3-319-97088-2_11

  • DOI: https://doi.org/10.1007/978-3-319-97088-2_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-97087-5

  • Online ISBN: 978-3-319-97088-2

  • eBook Packages: Computer Science, Computer Science (R0)
