A Logarithmic Distance-Based Multi-Objective Genetic Programming Approach for Classification of Imbalanced Data

  • Conference paper
Advanced Computing (IACC 2021)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1528)


Abstract

Standard classification algorithms give biased results when data sets are imbalanced. Genetic Programming (GP), a machine learning algorithm inspired by the evolution of species in nature, suffers from the same issue. In this work, we introduce a logarithmic distance-based multi-objective genetic programming (MOGP) approach for classifying imbalanced data. The approach uses the logarithm of the distance between predicted and expected values, and treats this logarithmic value for the minority class and for the majority class as two separate objectives during learning. In the final generation, it produces a well-balanced Pareto front of classifiers spanning the trade-off between majority-class and minority-class accuracies for binary classification. The primary advantage of the MOGP technique is that it can produce a set of well-performing classifiers in a single experimental run. In contrast, the canonical GP method requires multiple experimental runs and a fitness function whose objective is fixed a priori. Another benefit of MOGP is that it builds the learning bias explicitly into the algorithm. To evaluate the proposed approach, we ran extensive experiments on five imbalanced problems. The results indicate that the proposed approach outperforms the traditional method, in which the minority-class and majority-class accuracies are themselves taken as the two separate objectives.
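The abstract does not give the exact fitness formula, so the following Python sketch is only an illustration of the general idea under stated assumptions. It assumes each class has a fixed target output (the hypothetical `TARGETS` encoding), measures distance as the absolute difference between a classifier's raw output and that target, and uses `log(1 + distance)` averaged per class as the objective to minimise. The function names and the simple non-dominated filter are likewise illustrative and are not taken from the paper, which may use a different distance, logarithm form, or selection scheme.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumed target outputs per class (hypothetical encoding, not from the paper).
# Label 1 = minority class, label 0 = majority class.
TARGETS: Dict[int, float] = {1: 1.0, 0: -1.0}


def log_distance_objectives(
    outputs: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float]:
    """Return (minority, majority) objectives; lower is better for both.

    Each objective is the mean of log(1 + |output - target|) over the
    examples of one class, so the two classes are scored independently.
    """
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for out, lab in zip(outputs, labels):
        sums[lab] += math.log1p(abs(out - TARGETS[lab]))
        counts[lab] += 1
    minority = sums[1] / counts[1] if counts[1] else 0.0
    majority = sums[0] / counts[0] if counts[0] else 0.0
    return minority, majority


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: Sequence[Sequence[float]]) -> List[int]:
    """Indices of the non-dominated points (minimisation on every objective)."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Usage: score one classifier's outputs, then filter a set of candidate scores.
labels = [1, 0, 0, 0]
perfect = log_distance_objectives([1.0, -1.0, -1.0, -1.0], labels)
candidates = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1), (0.6, 0.6)]
front = pareto_front(candidates)  # (0.6, 0.6) is dominated by (0.5, 0.5)
```

Because each class is averaged separately, a classifier cannot lower its minority objective merely by doing well on the many majority examples. That independence is the property the two-objective formulation relies on.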



Author information

Corresponding author

Correspondence to Arvind Kumar.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Kumar, A., Goel, S., Sinha, N., Bhardwaj, A. (2022). A Logarithmic Distance-Based Multi-Objective Genetic Programming Approach for Classification of Imbalanced Data. In: Garg, D., Jagannathan, S., Gupta, A., Garg, L., Gupta, S. (eds) Advanced Computing. IACC 2021. Communications in Computer and Information Science, vol 1528. Springer, Cham. https://doi.org/10.1007/978-3-030-95502-1_23

  • DOI: https://doi.org/10.1007/978-3-030-95502-1_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-95501-4

  • Online ISBN: 978-3-030-95502-1

  • eBook Packages: Computer Science, Computer Science (R0)
