Mining Task-Specific Lines of Code Counters
Created by W.Langdon from
gp-bibliography.bib Revision:1.8178
@Article{Ochodek:2023:ACC,
author = "Miroslaw Ochodek and Krzysztof Durczak and
Jerzy Nawrocki and Miroslaw Staron",
journal = "IEEE Access",
title = "Mining Task-Specific Lines of Code Counters",
year = "2023",
volume = "11",
pages = "100218--100233",
abstract = "Context: Lines of code (LOC) is a fundamental software
code measure that is widely used as a proxy for
software development effort or as a normalization
factor in many other software-related measures (e.g.,
defect density). Unfortunately, the problem is that it
is not clear which lines of code should be counted: all
of them or some specific ones depending on the project
context and task in mind? Objective: To design a
generator of task-specific LOC measures and their
counters mined directly from data that optimise the
correlation between the LOC measures and variables they
proxy for (e.g., code-review duration). Method: We use
Design Science Research as our research methodology to
build and validate a generator of task-specific LOC
measures and their counters. The generated LOC counters
have a form of binary decision trees inferred from
historical data using Genetic Programming. The proposed
tool was validated based on three tasks, i.e., mining
LOC measures to proxy for code readability, number of
assertions in unit tests, and code-review duration.
Results: Task-specific LOC measures showed a 'strong'
to 'very strong' negative correlation with
code-readability score (Kendall's $\tau$ ranging from
-0.83 to -0.76) compared to 'weak' to 'strong' negative
correlation for the best among the standard LOC
measures ($\tau$ ranging from -0.36 to -0.13). For
the problem of proxying for the number of assertions in
unit tests, correlation coefficients were also higher
for task-specific LOC measures by ca. 11\% to
21\% ($\tau$ ranged from 0.31 to 0.34). Finally,
task-specific LOC measures showed a stronger
correlation with code-review duration than the best
among the standard LOC measures ($\tau$ = 0.31, 0.36,
and 0.37 compared to 0.11, 0.08, and 0.16, respectively).
Conclusions: Our study shows that it is possible to
mine task-specific LOC counters from historical
datasets using Genetic Programming. Task-specific LOC
measures obtained that way show stronger correlations
with the variables they proxy for than the standard LOC
measures.",
keywords = "genetic algorithms, genetic programming, Codes, LOC,
Task analysis, Software measurement, Software
engineering, Standards, Size measurement, Particle
measurements, software size, lines of code",
DOI = "10.1109/ACCESS.2023.3314572",
ISSN = "2169-3536",
notes = "Also known as \cite{10247541}",
}
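
The abstract above describes LOC counters shaped as binary decision trees and evaluates them by Kendall's tau against the variable they proxy for. The sketch below illustrates that idea only in miniature and is not the authors' tool: the rule set in `is_counted` is a hand-written, hypothetical tree (the paper infers its trees with Genetic Programming), the function names are invented for this example, and `kendall_tau_a` is the simple tau-a variant, which may differ from the variant used in the paper when ties occur.

```python
from itertools import combinations
from typing import Callable, Sequence


def is_counted(line: str) -> bool:
    """Hypothetical hand-written decision tree: should this line count as LOC?"""
    stripped = line.strip()
    if not stripped:                  # blank line
        return False
    if stripped.startswith("#"):      # comment-only line
        return False
    if stripped in {"{", "}"}:        # lone brace
        return False
    return True


def count_loc(source: str, counts: Callable[[str], bool] = is_counted) -> int:
    """Count the lines of `source` accepted by the predicate `counts`."""
    return sum(1 for line in source.splitlines() if counts(line))


def kendall_tau_a(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1           # tied pairs count for neither side
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


if __name__ == "__main__":
    # Toy snippets and made-up target values, purely for illustration;
    # none of these numbers come from the paper.
    snippets = ["a = 1\n", "a = 1\n\n# note\nb = 2\n", "x = 0\ny = 1\nz = 2\n"]
    target = [5.0, 3.0, 1.0]
    loc = [count_loc(s) for s in snippets]
    print(loc, kendall_tau_a(loc, target))
```

A mined counter would swap `is_counted` for a tree chosen to maximise the absolute correlation on historical data; the evaluation step stays the same.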