A Comparison of Classification Strategies in Genetic Programming with Unbalanced Data

Bhowan, Urvesh; Zhang, Mengjie; Johnston, Mark

doi:10.1007/978-3-642-17432-2_25

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6464)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

Abstract

Machine learning algorithms such as Genetic Programming (GP) can evolve biased classifiers when data sets are unbalanced. In this paper we compare the effectiveness of two GP classification strategies. The first uses the standard (zero) class-threshold, while the second uses the “best” class-threshold determined dynamically on a solution-by-solution basis during evolution. These two strategies are evaluated using five different GP fitness functions across a range of binary class imbalance problems, and the GP approaches are compared to other popular learning algorithms, namely Naive Bayes and Support Vector Machines. Our results suggest that there is no overall difference between the two strategies, and that both strategies can evolve good solutions in binary classification when used in combination with an effective fitness function.
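The difference between the two strategies can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the "best" threshold is searched for or which fitness functions are used. The sketch assumes that a GP program produces one real-valued output per example, that outputs above the threshold are assigned to the minority class (label 1), that candidate thresholds are the midpoints between sorted outputs, and that fitness is the average of the two per-class accuracies. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def average_class_accuracy(outputs: Sequence[float],
                           labels: Sequence[int],
                           threshold: float) -> float:
    """Mean of minority- and majority-class accuracy at a given threshold.

    Assumed convention: output > threshold predicts class 1 (minority),
    otherwise class 0 (majority). Both classes must be present in `labels`.
    """
    hits = {0: 0, 1: 0}
    totals = {0: 0, 1: 0}
    for out, lab in zip(outputs, labels):
        pred = 1 if out > threshold else 0
        totals[lab] += 1
        if pred == lab:
            hits[lab] += 1
    return 0.5 * (hits[0] / totals[0] + hits[1] / totals[1])


def zero_threshold_fitness(outputs: Sequence[float],
                           labels: Sequence[int]) -> float:
    """Strategy 1: fixed class threshold at zero."""
    return average_class_accuracy(outputs, labels, 0.0)


def best_threshold_fitness(outputs: Sequence[float],
                           labels: Sequence[int]) -> Tuple[float, float]:
    """Strategy 2 (assumed search): pick, for this one solution, the
    threshold that maximises fitness. Returns (threshold, fitness)."""
    values = sorted(set(outputs))
    # Candidates: one below all outputs, midpoints between neighbours,
    # and one above all outputs.
    candidates: List[float] = [values[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(values, values[1:])]
    candidates.append(values[-1] + 1.0)
    best = max(candidates,
               key=lambda t: average_class_accuracy(outputs, labels, t))
    return best, average_class_accuracy(outputs, labels, best)


# Illustrative data (made up): five majority and two minority examples.
outputs = [5.0, 4.0, 3.5, 3.0, 2.5, 6.0, 7.0]
labels = [0, 0, 0, 0, 0, 1, 1]
print(zero_threshold_fitness(outputs, labels))
print(best_threshold_fitness(outputs, labels))
```

In this made-up case the program separates the classes perfectly, but only around an output of 5.5. The zero threshold labels every example as minority, so the majority class is entirely misclassified, whereas the per-solution search recovers the separating threshold. The zero-threshold strategy therefore relies on evolution to shift program outputs so that the class boundary falls at zero.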

Author information

Authors and Affiliations

School of Engineering and Computer Science, Victoria University of Wellington, New Zealand
Urvesh Bhowan & Mengjie Zhang
School of Mathematics, Statistics and Operations Research, Victoria University of Wellington, New Zealand
Mark Johnston

Editor information

Editors and Affiliations

School of Computer and Information Science, University of South Australia, 5095, Mawson Lakes, SA, Australia
Jiuyong Li

About this paper

Cite this paper

Bhowan, U., Zhang, M., Johnston, M. (2010). A Comparison of Classification Strategies in Genetic Programming with Unbalanced Data. In: Li, J. (eds) AI 2010: Advances in Artificial Intelligence. AI 2010. Lecture Notes in Computer Science, vol 6464. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17432-2_25

DOI: https://doi.org/10.1007/978-3-642-17432-2_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17431-5
Online ISBN: 978-3-642-17432-2
eBook Packages: Computer Science, Computer Science (R0)
