
Stochastic Semantic-Based Multi-objective Genetic Programming Optimisation for Classification of Imbalanced Data

  • Conference paper
Advances in Soft Computing (MICAI 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10062)


Abstract

Data sets with imbalanced class distributions pose serious challenges to well-established classifiers. In this work, we propose a stochastic semantic-based multi-objective genetic programming (GP) approach. We tested it on imbalanced binary classification data sets, where, in some cases, it achieves higher recall, precision and F-measure values on the minority class than C4.5, Naive Bayes and Support Vector Machine, without significantly decreasing these values on the majority class.
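Because the comparison above rests on recall, precision and F-measure computed separately for each class, a small illustration may help. The sketch below is not the authors' implementation: the function names `per_class_metrics` and `accuracy` and the toy labels are hypothetical, and the F-measure is assumed to be the usual F1 (harmonic mean of precision and recall). It shows how overall accuracy can hide poor minority-class performance.

```python
from typing import Dict, Sequence


def per_class_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int) -> Dict[str, float]:
    """Precision, recall and F-measure with `positive` treated as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against empty denominators (e.g. the class is never predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure (assumed F1): harmonic mean of precision and recall
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure}


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Overall fraction of correct predictions, blind to class imbalance."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical toy data: label 1 = minority class, label 0 = majority class
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]   # some classifier
y_trivial = [0] * len(y_true)              # always predicts the majority class

print(accuracy(y_true, y_pred), accuracy(y_true, y_trivial))  # 0.8 0.8
print(per_class_metrics(y_true, y_pred, positive=1))     # minority class, real classifier
print(per_class_metrics(y_true, y_trivial, positive=1))  # minority recall drops to 0.0
```

Both toy predictors reach 0.8 accuracy, but only the first recovers any minority-class instance. Calling `per_class_metrics` once with the minority label and once with the majority label gives the per-class figures that the abstract compares.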

E. Galván-López: research conducted during his stay at TAO, INRIA and LRI, CNRS & U. Paris-Sud, Université Paris-Saclay, France.


Notes

  1. For the abalone data sets, we replaced the values F, M and I (female, male and infant) of the nominal sex attribute with 1, 2 and 3, respectively.
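The substitution in this note is a simple integer encoding of a nominal attribute. A minimal sketch, assuming records follow the comma-separated column order of the UCI `abalone.data` file with the sex value in the first field; `SEX_CODES` and `encode_row` are hypothetical names, not taken from the paper.

```python
from typing import List

# Substitution described in the note: F -> 1, M -> 2, I -> 3
SEX_CODES = {"F": 1, "M": 2, "I": 3}


def encode_row(line: str) -> List[float]:
    """Turn one comma-separated abalone record into a numeric vector.

    Assumes (hypothetically) that the nominal sex value is the first field,
    as in the UCI abalone.data layout; every remaining field is parsed as a float.
    """
    fields = line.strip().split(",")
    return [float(SEX_CODES[fields[0]])] + [float(v) for v in fields[1:]]


# Hypothetical record in the UCI column layout
print(encode_row("M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15"))
```

The integer codes impose an arbitrary order on F, M and I; the sketch reproduces the note's mapping as stated rather than proposing a different encoding.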


Acknowledgements

EGL’s research is funded by an ELEVATE Fellowship, the Irish Research Council’s Career Development Fellowship co-funded by Marie Curie Actions. EGL would like to thank the TAO group at INRIA Saclay France for hosting him during the outgoing phase of the fellowship. LVM thanks the SSSP for hosting her during her research visit at TCD. The authors would like to thank the reviewers for their comments that helped us to improve our work. EGL would also like to thank E. Mezura-Montes, O. Ait Elhara and M. Schoenauer for their earlier involvement in this work.

Author information


Corresponding author

Correspondence to Edgar Galván-López.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Galván-López, E., Vázquez-Mendoza, L., Trujillo, L. (2017). Stochastic Semantic-Based Multi-objective Genetic Programming Optimisation for Classification of Imbalanced Data. In: Pichardo-Lagunas, O., Miranda-Jiménez, S. (eds) Advances in Soft Computing. MICAI 2016. Lecture Notes in Computer Science, vol 10062. Springer, Cham. https://doi.org/10.1007/978-3-319-62428-0_22


  • DOI: https://doi.org/10.1007/978-3-319-62428-0_22


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-62427-3

  • Online ISBN: 978-3-319-62428-0

  • eBook Packages: Computer Science, Computer Science (R0)
