DOI: 10.1145/3372787.3390439
research-article

On the detection of community smells using genetic programming-based ensemble classifier chain

Published: 25 September 2020

ABSTRACT

Community smells are symptoms of organizational and social issues within a software development community that often increase project costs and degrade software quality. Recent studies have identified a variety of community smells and defined them as suboptimal patterns connected to organizational-social structures in the software development community, such as a lack of communication, coordination, and collaboration. Recognizing the advantages of detecting potential community smells early in a software project, we introduce a novel approach that learns from various community organizational and social practices to provide automated support for detecting community smells. In particular, our approach learns from a set of interleaving organizational-social symptoms that characterize the existence of community smell instances in a software project. We build a multi-label learning model to detect eight common types of community smells. We use the ensemble classifier chain (ECC) model, which transforms a multi-label problem into several single-label problems; each is solved using genetic programming (GP) to find the optimal detection rules for the corresponding smell type. To evaluate the performance of our approach, we conducted an empirical study on a benchmark of 103 open source projects and 407 community smell instances. Statistical tests on our results show that our approach can detect the eight considered smell types with an average F-measure of 89%, outperforming state-of-the-art techniques. Furthermore, we found that the factors that best characterize community smells include social network density and closeness centrality, as well as the standard deviation of the number of developers per time zone and per community.
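The ECC idea summarized in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation: a one-feature threshold rule (a decision stump) stands in for the GP-evolved detection rules, and the two features (network density, standard deviation of developers per time zone), the two smell labels, and all data values are hypothetical. Each chain orders the labels randomly, trains one single-label rule per label on the features plus the labels earlier in the chain, and the ensemble combines chains by majority vote.

```python
import random

def fit_stump(rows, targets):
    """Pick the (feature, threshold, polarity) with the fewest training errors.
    Stand-in for a GP-evolved rule; this is an assumption of the sketch."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            for pol in (1, 0):
                # pol == 1 predicts the label when value >= t; pol == 0 inverts it.
                errs = sum(((r[f] >= t) == bool(pol)) != bool(y)
                           for r, y in zip(rows, targets))
                if best is None or errs < best[0]:
                    best = (errs, f, t, pol)
    return best[1:]

def stump_predict(stump, row):
    f, t, pol = stump
    return int((row[f] >= t) == bool(pol))

def fit_chain(X, Y, order):
    """Train one rule per label; each sees the features plus earlier labels."""
    stumps = []
    for pos, label in enumerate(order):
        rows = [x + [y[l] for l in order[:pos]] for x, y in zip(X, Y)]
        stumps.append(fit_stump(rows, [y[label] for y in Y]))
    return order, stumps

def chain_predict(chain, x):
    order, stumps = chain
    row, out = list(x), {}
    for label, stump in zip(order, stumps):
        out[label] = stump_predict(stump, row)
        row.append(out[label])  # feed this prediction to the next link
    return [out[l] for l in sorted(out)]

def fit_ecc(X, Y, n_chains=5, seed=0):
    """Build several chains, each with its own random label order."""
    rng = random.Random(seed)
    labels = list(range(len(Y[0])))
    chains = []
    for _ in range(n_chains):
        order = labels[:]
        rng.shuffle(order)
        chains.append(fit_chain(X, Y, order))
    return chains

def ecc_predict(chains, x):
    """Majority vote per label across all chains."""
    votes = [chain_predict(c, x) for c in chains]
    return [int(2 * sum(col) >= len(votes)) for col in zip(*votes)]

# Hypothetical toy data: [network density, std. dev. of developers per time zone]
X = [[0.1, 0.5], [0.2, 3.0], [0.6, 2.5], [0.8, 0.4], [0.15, 2.2], [0.7, 1.0]]
# Hypothetical labels: [smell A present, smell B present]
Y = [[1, 0], [1, 1], [0, 1], [0, 0], [1, 1], [0, 0]]

chains = fit_ecc(X, Y)
print(ecc_predict(chains, [0.25, 2.8]))  # [1, 1]
print(ecc_predict(chains, [0.9, 0.3]))   # [0, 0]
```

In this toy data each label is separable on a single feature, so every chain learns the same rules regardless of label order; on real data the chained labels let later rules exploit correlations between smell types, which is the motivation for using chains rather than independent per-label classifiers.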


Published in

ICGSE '20: Proceedings of the 15th International Conference on Global Software Engineering
June 2020
147 pages
ISBN: 9781450370936
DOI: 10.1145/3372787

Copyright © 2020 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States
