Abstract
Machine learning algorithms can suffer a performance bias when data sets are unbalanced. This paper develops a multi-objective genetic programming approach to evolving accurate and diverse ensembles of non-dominated solutions, in which members vote on class membership. Using a range of unbalanced data sets, we explore why these ensembles can also be vulnerable to the learning bias. Based on the notion that smaller ensembles can be better than larger ensembles, we develop a new evolutionary pruning method to find groups of highly cooperative individuals that can improve accuracy on the important minority class.
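The two ideas named in the abstract, an ensemble drawn from non-dominated solutions and a vote on class membership, can be illustrated with a minimal sketch. It is not the paper's implementation: the choice of objectives (minority-class and majority-class accuracy), the strict-majority voting rule, the function names, and the toy scores and predictions are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed objectives per classifier: (minority-class accuracy, majority-class accuracy).
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(scores: Sequence[Objectives]) -> List[int]:
    """Indices of the solutions that no other solution dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine 0/1 predictions; label 1 (assumed minority) needs a strict majority."""
    n_members = len(predictions)
    return [1 if 2 * sum(votes) > n_members else 0 for votes in zip(*predictions)]


# Hypothetical per-classifier scores and per-instance predictions (toy values).
scores = [(0.9, 0.6), (0.7, 0.8), (0.6, 0.7), (0.5, 0.95)]
preds = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 0, 0]]

front = non_dominated(scores)                      # classifier 2 is dominated by 1
ensemble_output = majority_vote([preds[i] for i in front])
print(front, ensemble_output)
```

In this toy setting the ensemble consists of the three non-dominated classifiers, each trading off the two class accuracies differently, and their votes are combined per instance.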
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bhowan, U., Johnston, M., Zhang, M. (2011). Ensemble Learning and Pruning in Multi-Objective Genetic Programming for Classification with Unbalanced Data. In: Wang, D., Reynolds, M. (eds) AI 2011: Advances in Artificial Intelligence. AI 2011. Lecture Notes in Computer Science, vol 7106. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25832-9_20
DOI: https://doi.org/10.1007/978-3-642-25832-9_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25831-2
Online ISBN: 978-3-642-25832-9
eBook Packages: Computer Science, Computer Science (R0)