Abstract
Most software defect prediction models aim to predict the number of defects in a given software system. However, it is very difficult to predict the precise number of defects in a module because of noise in the data. Another frequently used approach is to rank the software modules by their relative number of defects, so that defect prediction can guide testers to allocate limited resources preferentially to the modules with more defects. Because software defect data-sets contain redundant metrics, researchers usually reduce the dimensionality of the metrics before constructing defect prediction models. However, reducing the number of dimensions may discard useful information too early and consequently degrade the performance of the prediction model. In this paper, we propose an approach that uses multi-gene genetic programming (MGGP) to build a defect rank model. We compared the MGGP-based model with other optimized methods on 11 publicly available defect data-sets drawn from several software systems. The fault-percentile-average (FPA) is used to evaluate the performance of MGGP and the other methods. The results show that, when constructing the defect rank, the models built with the MGGP approach for the different test objects perform better than those based on other nonlinear prediction approaches. In addition, the correlation between the software metrics does not affect the prediction performance. This means that, with the MGGP method, we can use the original features to construct a prediction model without considering the influence of the correlation between the software module features.
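The abstract names FPA as the evaluation measure but does not define it. The sketch below assumes the definition commonly used in the defect-ranking literature: rank the K modules by predicted defect count, and average, over m = 1..K, the proportion of all actual defects contained in the top m modules. The function name `fault_percentile_average` and the sample data are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def fault_percentile_average(predicted: Sequence[float],
                             actual: Sequence[int]) -> float:
    """Average share of all defects found in the top-m ranked modules, m = 1..K.

    Assumed definition (not stated in the abstract): modules are ranked by
    predicted defect count, highest first, and the cumulative proportion of
    actual defects is averaged over every cut-off m.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    k = len(actual)
    total = sum(actual)
    if k == 0 or total == 0:
        raise ValueError("need at least one module and at least one defect")

    # Rank module indices by predicted score, highest first.
    # sorted() is stable, so tied modules keep their input order.
    order = sorted(range(k), key=lambda i: predicted[i], reverse=True)

    found = 0        # actual defects covered by the modules inspected so far
    fpa_sum = 0.0    # running sum of cumulative defect proportions
    for i in order:
        found += actual[i]
        fpa_sum += found / total
    return fpa_sum / k


# Hypothetical data: four modules with 0, 1, 3 and 6 actual defects.
actual_defects = [0, 1, 3, 6]
perfect_ranking = fault_percentile_average([0.1, 0.5, 2.0, 4.0], actual_defects)
worst_ranking = fault_percentile_average([4.0, 2.0, 0.5, 0.1], actual_defects)
print(perfect_ranking, worst_ranking)
```

Under this definition only the order of the predictions matters, not their values, which is why the measure suits a ranking model: a ranking that places the most defective modules first scores higher than one that places them last.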
Acknowledgment
The work described in this paper is supported by the National Natural Science Foundation of China under Grant Nos. 61702029, 61872026, and 61672085.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Guo, J., Duan, Y., Shang, Y. (2019). Multi-gene Genetic Programming Based Defect-Ranking Software Modules. In: Li, Z., Jiang, H., Li, G., Zhou, M., Li, M. (eds) Software Engineering and Methodology for Emerging Domains. NASAC 2017, NASAC 2018. Communications in Computer and Information Science, vol 861. Springer, Singapore. https://doi.org/10.1007/978-981-15-0310-8_4
DOI: https://doi.org/10.1007/978-981-15-0310-8_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0309-2
Online ISBN: 978-981-15-0310-8
eBook Packages: Computer Science, Computer Science (R0)