Multi-gene Genetic Programming Based Defect-Ranking Software Modules

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 861)

Abstract

Most software defect prediction models aim to predict the number of defects in a given software system. However, it is very difficult to predict the precise number of defects in a module because of noisy data. Another frequently used approach is to rank the software modules by their relative number of defects, so that defect prediction can guide testers to allocate limited resources preferentially to the modules with more defects. Owing to redundant metrics in software defect data-sets, researchers always need to reduce the dimensions of the metrics before constructing defect prediction models. However, reducing the number of dimensions may delete some useful information too early, and consequently the performance of the prediction model decreases. In this paper, we propose an approach that uses multi-gene genetic programming (MGGP) to build a defect rank model. We compared the MGGP-based model with other optimized methods over 11 publicly available defect data-sets drawn from several software systems. The fault-percentile-average (FPA) is used to evaluate the performance of the MGGP and the other methods. The results show that, when constructing the defect rank, the models built with the MGGP approach for different test objects perform better than those based on other nonlinear prediction approaches. In addition, the correlation between the software metrics does not affect the prediction performance. This means that, with the MGGP method, we can use the original features to construct a prediction model without considering the influence of the correlation between the software module features.
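
The abstract names FPA as the evaluation measure but does not define it. As a minimal sketch, assuming the definition commonly used in the defect-ranking literature (the mean, over every cut-off m, of the share of actual defects contained in the m modules ranked highest by the model), it could be computed as follows. The function name and the sample defect counts are hypothetical and are not taken from the paper.

```python
def fault_percentile_average(predicted, actual):
    """Assumed FPA: average, over all cut-offs m = 1..K, of the fraction
    of actual defects found in the m modules with the highest predictions."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    k = len(actual)
    total = sum(actual)
    if k == 0 or total == 0:
        raise ValueError("need at least one module and one defect")

    # Rank module indices from highest to lowest predicted defect count.
    order = sorted(range(k), key=lambda i: predicted[i], reverse=True)

    found = 0      # actual defects covered by the top-m modules so far
    area = 0.0     # running sum of the covered fractions
    for i in order:
        found += actual[i]
        area += found / total
    return area / k


# Hypothetical data: actual defect counts for four modules.
actual = [5, 3, 0, 2]
perfect = fault_percentile_average(actual, actual)          # ideal ranking
reversed_rank = fault_percentile_average([0, 1, 3, 2], actual)  # worst ranking
print(perfect, reversed_rank)
```

A higher value means that defect-heavy modules appear earlier in the ranking, which is why the measure depends only on the order of the predictions and not on their exact values.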

Notes

  1. http://bug.inf.usi.ch/

  2. http://www.st.cs.uni-saarland.de/softevo/bug-data/eclipse/

  3. https://en.wikipedia.org/wiki/MannWhitney_U_test

Acknowledgment

The work described in this paper is supported by the National Natural Science Foundation of China under Grant Nos. 61702029, 61872026, and 61672085.

Author information

Corresponding author

Correspondence to Ying Shang.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Guo, J., Duan, Y., Shang, Y. (2019). Multi-gene Genetic Programming Based Defect-Ranking Software Modules. In: Li, Z., Jiang, H., Li, G., Zhou, M., Li, M. (eds) Software Engineering and Methodology for Emerging Domains. NASAC 2017, NASAC 2018. Communications in Computer and Information Science, vol 861. Springer, Singapore. https://doi.org/10.1007/978-981-15-0310-8_4

  • DOI: https://doi.org/10.1007/978-981-15-0310-8_4

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-0309-2

  • Online ISBN: 978-981-15-0310-8

  • eBook Packages: Computer Science, Computer Science (R0)
