ABSTRACT
Community smells are symptoms of organizational and social issues within the software development community that often increase project costs and affect software quality. Recent studies have identified a variety of community smells and defined them as suboptimal patterns in the organizational and social structures of a software development community, such as a lack of communication, coordination, and collaboration. Recognizing the advantages of detecting potential community smells early in a software project, we introduce a novel approach that learns from various organizational and social practices to provide automated support for detecting community smells. In particular, our approach learns from a set of interleaving organizational-social symptoms that characterize the existence of community smell instances in a software project. We build a multi-label learning model to detect eight common types of community smells. We use the ensemble classifier chain (ECC) model, which transforms a multi-label problem into several single-label problems; each of these is solved using genetic programming (GP) to find the optimal detection rules for the corresponding smell type. To evaluate the performance of our approach, we conducted an empirical study on a benchmark of 103 open source projects and 407 community smell instances. Statistical tests of our results show that our approach can detect the eight considered smell types with an average F-measure of 89%, outperforming different state-of-the-art techniques. Furthermore, we found that the most influential factors that best characterize community smells include the social network density and closeness centrality, as well as the standard deviation of the number of developers per time zone and per community.
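To make the ECC idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a one-feature threshold rule (`Stump`) as a stand-in for the GP-evolved detection rules, and it omits the per-chain training-set sampling that ECC normally uses. The two features (network density, standard deviation of developers per time zone), the two smell labels, and all data values are made up for illustration.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


class Stump:
    """One-feature threshold rule; a stand-in for a GP-evolved detection rule."""

    def fit(self, X: List[Vector], y: List[int]) -> "Stump":
        best = (-1, 0, 0.0, 1)  # (correct count, feature, threshold, polarity)
        for f in range(len(X[0])):
            # Candidate thresholds: observed values plus +inf (constant rules).
            for t in sorted({row[f] for row in X}) + [float("inf")]:
                for pol in (1, -1):
                    correct = sum(
                        int(pol * (row[f] - t) >= 0) == label
                        for row, label in zip(X, y)
                    )
                    if correct > best[0]:
                        best = (correct, f, t, pol)
        _, self.feature, self.threshold, self.polarity = best
        return self

    def predict(self, x: Vector) -> int:
        return int(self.polarity * (x[self.feature] - self.threshold) >= 0)


class ClassifierChain:
    """One binary model per label; each also sees the labels earlier in the chain."""

    def __init__(self, order: List[int]):
        self.order = order
        self.models: List[Stump] = []

    def fit(self, X: List[Vector], Y: List[List[int]]) -> "ClassifierChain":
        self.models = []
        for pos, label in enumerate(self.order):
            # Training uses the true values of the preceding labels.
            Xa = [list(x) + [y[l] for l in self.order[:pos]] for x, y in zip(X, Y)]
            self.models.append(Stump().fit(Xa, [y[label] for y in Y]))
        return self

    def predict(self, x: Vector) -> List[int]:
        out = [0] * len(self.order)
        extra: List[float] = []
        for model, label in zip(self.models, self.order):
            # Prediction feeds earlier predictions forward along the chain.
            pred = model.predict(list(x) + extra)
            out[label] = pred
            extra.append(pred)
        return out


class EnsembleClassifierChain:
    """Several chains with random label orders, combined by majority vote."""

    def __init__(self, n_labels: int, n_chains: int = 5, seed: int = 0):
        rng = random.Random(seed)
        self.chains = []
        for _ in range(n_chains):
            order = list(range(n_labels))
            rng.shuffle(order)
            self.chains.append(ClassifierChain(order))

    def fit(self, X: List[Vector], Y: List[List[int]]) -> "EnsembleClassifierChain":
        for chain in self.chains:
            chain.fit(X, Y)
        return self

    def predict(self, x: Vector) -> List[int]:
        votes = [chain.predict(x) for chain in self.chains]
        return [int(sum(col) / len(votes) >= 0.5) for col in zip(*votes)]


# Hypothetical toy data: [network density, std. dev. of developers per time zone].
X = [[0.1, 3.0], [0.2, 2.5], [0.8, 0.5], [0.9, 0.2], [0.15, 0.3], [0.85, 2.8]]
# Two hypothetical smell labels per project (1 = smell present).
Y = [[1, 1], [1, 1], [0, 0], [0, 0], [1, 0], [0, 1]]

ecc = EnsembleClassifierChain(n_labels=2).fit(X, Y)
print(ecc.predict([0.12, 2.9]))
```

Varying the label order across chains keeps the ensemble from depending on a single, arbitrary ordering of the smell types, which is the usual motivation for using an ensemble of chains rather than one chain.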
- 2020. Replication Package. https://github.com/GP-ECC/community-smells
Index Terms
- On the detection of community smells using genetic programming-based ensemble classifier chain