Abstract
Data sets with imbalanced class distribution pose serious challenges to well-established classifiers. In this work, we propose a stochastic multi-objective genetic programming approach based on semantics. We tested it on imbalanced binary classification data sets, where it achieves, in some cases, higher recall, precision and F-measure values on the minority class than C4.5, Naive Bayes and Support Vector Machine, without significantly decreasing these values on the majority class.
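For reference, the per-class measures named above can be computed directly from true and predicted labels. The following is a minimal sketch, not the evaluation code used in this work: the function name `per_class_scores` and the toy labels are hypothetical, and it assumes the F-measure is the harmonic mean of precision and recall (F1).

```python
from typing import Sequence, Tuple


def per_class_scores(
    y_true: Sequence[int], y_pred: Sequence[int], target: int
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1), treating `target` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    # Guard against empty denominators (no predictions or no examples of the class).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data: class 1 is the minority (2 of 10 examples).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

minority = per_class_scores(y_true, y_pred, target=1)
majority = per_class_scores(y_true, y_pred, target=0)
```

On this toy example 8 of 10 predictions are correct, yet precision, recall and F1 on the minority class are all 0.5, which illustrates why the minority and majority classes are reported separately.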
E. Galván-López—Research conducted during Galván’s stay at TAO, INRIA and LRI, CNRS & U. Paris-Sud, Université Paris-Saclay, France.
Notes
1. For the abalone data sets, we substituted F, M and I by 1, 2 and 3, respectively.
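The substitution in this note can be sketched as a simple lookup. This is an illustration only: it assumes the letters are the values of a categorical attribute (presumably the Sex attribute of the UCI abalone data), and the names `SEX_CODES` and `encode_sex` are hypothetical.

```python
from typing import List

# Mapping stated in the note: F -> 1, M -> 2, I -> 3.
SEX_CODES = {"F": 1, "M": 2, "I": 3}


def encode_sex(values: List[str]) -> List[int]:
    """Replace each categorical letter with its integer substitute.

    Raises KeyError on any value outside {F, M, I}, so unexpected
    categories are not silently mapped.
    """
    return [SEX_CODES[v] for v in values]


encoded = encode_sex(["M", "F", "I", "M"])
```

Note that an integer coding like this imposes an ordering on the categories; whether that matters depends on the function set used by the classifier.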
Acknowledgements
EGL’s research is funded by an ELEVATE Fellowship, the Irish Research Council’s Career Development Fellowship co-funded by Marie Curie Actions. EGL would like to thank the TAO group at INRIA Saclay France for hosting him during the outgoing phase of the fellowship. LVM thanks the SSSP for hosting her during her research visit at TCD. The authors would like to thank the reviewers for their comments that helped us to improve our work. EGL would also like to thank E. Mezura-Montes, O. Ait Elhara and M. Schoenauer for their earlier involvement in this work.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Galván-López, E., Vázquez-Mendoza, L., Trujillo, L. (2017). Stochastic Semantic-Based Multi-objective Genetic Programming Optimisation for Classification of Imbalanced Data. In: Pichardo-Lagunas, O., Miranda-Jiménez, S. (eds) Advances in Soft Computing. MICAI 2016. Lecture Notes in Computer Science, vol 10062. Springer, Cham. https://doi.org/10.1007/978-3-319-62428-0_22
DOI: https://doi.org/10.1007/978-3-319-62428-0_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62427-3
Online ISBN: 978-3-319-62428-0
eBook Packages: Computer Science, Computer Science (R0)