research-article

On the detection of community smells using genetic programming-based ensemble classifier chain

Authors:
Nuri Almarimi, University of Quebec, Montreal, QC, Canada
Ali Ouni, University of Quebec, Montreal, QC, Canada
Moataz Chouchen, University of Quebec, Montreal, QC, Canada
Islem Saidani, University of Quebec, Montreal, QC, Canada
Mohamed Wiem Mkaouer, Rochester Institute of Technology

ICGSE '20: Proceedings of the 15th International Conference on Global Software Engineering, June 2020, Pages 43–54, https://doi.org/10.1145/3372787.3390439

Published: 25 September 2020

ABSTRACT

Community smells are symptoms of organizational and social issues within the software development community that often increase project costs and affect software quality. Recent studies have identified a variety of community smells and defined them as suboptimal patterns connected to organizational-social structures in the software development community, such as a lack of communication, coordination, and collaboration. Recognizing the advantages of the early detection of potential community smells in a software project, we introduce a novel approach that learns from various community organizational and social practices to provide automated support for detecting community smells. In particular, our approach learns from a set of interleaving organizational-social symptoms that characterize the existence of community smell instances in a software project. We build a multi-label learning model to detect eight common types of community smells. We use the ensemble classifier chain (ECC) model, which transforms a multi-label problem into several single-label problems; each of these is solved using genetic programming (GP) to find the optimal detection rules for each smell type. To evaluate the performance of our approach, we conducted an empirical study on a benchmark of 103 open-source projects and 407 community smell instances. Statistical tests of our results show that our approach can detect the eight considered smell types with an average F-measure of 89%, achieving better performance than different state-of-the-art techniques. Furthermore, we found that the most influential factors that best characterize community smells include the social network density and closeness centrality, as well as the standard deviation of the number of developers per time zone and per community.
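
The transformation the abstract describes, from one multi-label problem into a chain of single-label problems combined in an ensemble, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It makes several assumptions: a one-feature threshold rule (a decision stump) stands in for the GP-evolved detection rules, the two input metrics (a network density and a time-zone standard deviation) and the two smell labels are hypothetical toy data, and the per-chain training subsets used in standard ECC are omitted so that the chains differ only in their random label order.

```python
import random

def fit_stump(rows, targets):
    """Stand-in for a GP-evolved rule: the single-feature threshold test
    with the fewest training errors (an assumption, not the paper's GP)."""
    best = (len(rows) + 1, 0, 0.0, 1)  # (errors, feature, threshold, polarity)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            for pol in (1, -1):
                errs = sum(
                    (1 if pol * (r[f] - t) >= 0 else 0) != y
                    for r, y in zip(rows, targets)
                )
                if errs < best[0]:
                    best = (errs, f, t, pol)
    _, bf, bt, bpol = best
    return lambda r: 1 if bpol * (r[bf] - bt) >= 0 else 0

def fit_chain(X, Y, order):
    """One single-label problem per smell type; each link also sees the
    true values of the labels that come earlier in the chain."""
    rules = []
    for pos, label in enumerate(order):
        rows = [x + [y[l] for l in order[:pos]] for x, y in zip(X, Y)]
        rules.append(fit_stump(rows, [y[label] for y in Y]))
    return rules

def predict_chain(rules, order, x, n_labels):
    out = [0] * n_labels
    row = list(x)
    for rule, label in zip(rules, order):
        out[label] = rule(row)
        row.append(out[label])  # pass this prediction on to the next link
    return out

def fit_ecc(X, Y, n_chains=5, seed=0):
    """Ensemble of chains, each trained with a different random label order."""
    rng = random.Random(seed)
    chains = []
    for _ in range(n_chains):
        order = list(range(len(Y[0])))
        rng.shuffle(order)
        chains.append((order, fit_chain(X, Y, order)))
    return chains

def predict_ecc(chains, x, n_labels):
    """Majority vote per label across all chains."""
    votes = [0] * n_labels
    for order, rules in chains:
        for l, v in enumerate(predict_chain(rules, order, x, n_labels)):
            votes[l] += v
    return [1 if 2 * v > len(chains) else 0 for v in votes]

# Hypothetical toy data: [network density, std. dev. of developers per time zone]
X = [[0.1, 3.0], [0.2, 2.5], [0.8, 0.5], [0.9, 0.4], [0.15, 0.3], [0.85, 2.8]]
# Two hypothetical smell labels per project (1 = smell present)
Y = [[1, 1], [1, 1], [0, 0], [0, 0], [1, 0], [0, 1]]

chains = fit_ecc(X, Y)
print(predict_ecc(chains, [0.12, 2.9], 2))
```

Varying the label order across chains is what lets the ensemble average out the effect of any single ordering, since each link can only exploit the labels predicted before it.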

Index Terms

1. Software and its engineering
  1. Software organization and properties

Published in
ICGSE '20: Proceedings of the 15th International Conference on Global Software Engineering
June 2020
147 pages
ISBN: 9781450370936
DOI: 10.1145/3372787
General Chair: Paolo Tell, IT University of Copenhagen, Denmark
Program Chairs: Igor Steinmacher, Northern Arizona University; Ricardo Britto, Ericsson, Blekinge Institute of Technology, Sweden
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
community smells
genetic programming
multi-label learning
search-based software engineering
social debt
socio-technical factors