Using mutual information to test from Finite State Machines: Test suite generation☆,☆☆
Introduction
Testing (Ammann and Offutt, 2017, Myers et al., 2011) is the most widely used methodology for increasing the reliability of complex software systems. In a few words, testing is the process of applying inputs to the system that we are evaluating (usually called the System Under Test (SUT)), observing the outputs produced by the system, and assessing whether the obtained outputs coincide with the expected ones. One of the main obstacles to applying testing effectively to complex systems is that the number of different inputs is usually huge (in most cases, this set is infinite). Thus, it is of the utmost importance to devise approaches that generate a set of inputs, that is, a test suite, that is small enough to be practically feasible and is likely to be effective in finding faults.
Our goal with the work that we present in this paper is to improve the testing process by generating good test suites, that is, test suites with high fault finding capability. We address this problem in the specific case where we need a finite test suite that has a bound on the number of applied inputs. This scenario appears in a number of contexts. For example, we might want to generate a test suite that has a limited execution cost and, therefore, we have to limit the number of inputs applied to the SUT.
We assume that the SUT is a black-box: we only know its input and output alphabets. In particular, we have no additional information about its internal structure, nor can we access the source code of the system. However, we assume that we do have a specification of the system that we want to build, so that we can check whether the application of a certain input to the SUT has produced the expected output according to the specification. In order to simplify the presentation, and following our previous work (Ibias et al., 2021), we assume that the specification is given by a Finite State Machine (FSM), but it is possible to use other state-based formalisms, in particular, those containing data. Indeed, FSMs are a powerful formalism that can be used to represent very different systems, both software and hardware. Finally, we also assume that the specification has the same number of states as the implementation.
It is important to note that the users of our approach do not need to provide a specification coded as an FSM. The users can provide their specification in any other formalism as long as it is state-based and can be mapped to an FSM that represents its semantics (possibly after some abstraction). One example of such a state-based formalism is state-charts. This allows us to apply our FSM-based approaches to a wide range of state-based specifications. For example, classical FSM-based test generation techniques can be applied when testing from reactive I/O-state-transition systems (RIOSTS), a formalism that can be used with a range of embedded systems (Huang and Peleska, 2017). In particular, RIOSTS has been applied to evaluate part of the European Train Control System and an airbag controller (Hübner et al., 2019). Finally, the use of FSMs is well supported by several tools that can be used to specify and analyse them (e.g. fsmlib-cpp, automatalib (Isberner et al., 2015) and OpenFST (Allauzen et al., 2007)).
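As a minimal sketch of this setting, assuming a deterministic and completely specified Mealy-style FSM, the following Python snippet applies an input sequence to a black-box SUT and compares the observed outputs with those predicted by the specification. The class `FSM`, the function `run_test` and the toy two-state machine are hypothetical names introduced only for illustration; they are not taken from the paper or its repository.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# (state, input) -> (next_state, output); a hypothetical encoding of an FSM.
Transitions = Dict[Tuple[str, str], Tuple[str, str]]


class FSM:
    """Deterministic Mealy-style machine, used here as the specification."""

    def __init__(self, initial: str, transitions: Transitions) -> None:
        self.initial = initial
        self.transitions = transitions

    def run(self, inputs: Sequence[str]) -> List[str]:
        """Return the output sequence produced for the given inputs."""
        state, outputs = self.initial, []
        for symbol in inputs:
            state, out = self.transitions[(state, symbol)]
            outputs.append(out)
        return outputs


def run_test(spec: FSM, sut: Callable[[Sequence[str]], List[str]],
             inputs: Sequence[str]) -> bool:
    """A test passes when the black-box SUT agrees with the specification."""
    return sut(inputs) == spec.run(inputs)


# Toy specification with two states (illustrative only).
spec = FSM("s0", {
    ("s0", "a"): ("s1", "0"),
    ("s0", "b"): ("s0", "1"),
    ("s1", "a"): ("s0", "1"),
    ("s1", "b"): ("s1", "0"),
})

# A faulty implementation with the same states, where one transition emits
# "1" instead of "0". The tester only sees its input/output behaviour.
faulty = FSM("s0", {**spec.transitions, ("s1", "b"): ("s1", "1")})

print(run_test(spec, faulty.run, ["a", "a"]))  # the faulty transition is not exercised
print(run_test(spec, faulty.run, ["a", "b"]))  # the faulty transition is exercised
```

The two calls illustrate why the choice of inputs matters under a bounded budget: both tests have the same length, but only the second one traverses the faulty transition.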
In this paper we consider a measure, called Biased Mutual Information (BMI), that is inspired by the classical concept of Mutual Information (Shannon, 1948). The intuition underlying the formal definition of BMI is that if we have two tests that have common parts, then they will tend to traverse the same branches of the SUT. Therefore, test suites with smaller BMI values will include tests that differ more from each other, which increases diversity. Let us note that there is a well-studied correspondence between test diversity and test quality (Feldt et al., 2008, Cartaxo et al., 2011, Hemmati et al., 2013, Hemmati et al., 2015).
In our previous work (Ibias et al., 2021) we showed that BMI could be successfully used to choose, between two test suites, the one that is more likely to detect faults. In this paper we use BMI to confront a more difficult problem: guiding the generation of test suites with high fault finding capability. The difference lies in the fact that, when selecting between two test suites, it is only necessary to compare their performance. However, in order to generate a test suite, it is necessary to consider not only the performance of the current test suite, but also its potential as a good base for the generation of new test suites.
In this paper we consider the intelligent generation of test suites by using BMI as a fundamental component to choose between different tests. Specifically, within the family of Genetic Algorithms, we consider a special kind of algorithm, called Genetic Programming Algorithms, that is able to deal with complex structures. In short, we propose a Grammar-Guided Genetic Programming Algorithm to generate test suites with high fault finding capability, using BMI as the guiding measure; more specifically, BMI is used to define the fitness function.
Our experiments focus on comparing our BMI-based approach with current state-of-the-art measures, using the Genetic Algorithm as a common framework to test the effectiveness of each measure. With this setting, we found that BMI consistently generates test suites with higher fault finding capability than some measures (at the cost of a small increase in computation time), and that BMI is preferable to the remaining measures in a realistic scenario (due to limitations in computation cost or execution cost).
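For reference, the classical mutual information of two discrete random variables $X$ and $Y$ (Shannon, 1948), which inspires BMI, can be written as below. This is the standard textbook definition, not the definition of BMI itself, which is given in Section 3 and is not reproduced here.

```latex
I(X;Y) \;=\; \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}
       \;=\; H(X) + H(Y) - H(X,Y)
```

Here $p(x,y)$ is the joint distribution, $p(x)$ and $p(y)$ are the marginals, and $H$ denotes Shannon entropy. $I(X;Y)$ is zero exactly when $X$ and $Y$ are independent and grows as they share more information, which matches the intuition that tests with common parts are less diverse.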
The rest of the paper is structured as follows. In Section 2 we review previous work related to our research. In Section 3 we present the background theory used in the paper. We also define BMI in that section. In Section 4 we propose our methodology to generate good test suites. In Section 5 we report our experiments to evaluate the performance of our methodology. In Section 6 we discuss the threats to the validity of our results. Finally, in Section 7 we provide conclusions and propose lines for future work.
Related work
Test suite generation is a fundamental problem in Software Testing and it has been addressed from multiple angles. Although the tester can manually build the tests included in the test suite, this is a time-consuming and error-prone process. Therefore, it is essential to automate the generation of tests by following certain quality criteria (Anand et al., 2013). In the case of testing from FSMs, test generation is also one of the most challenging problems and it was already considered in
Preliminaries
To develop our work, we needed some preliminary concepts. In this section we present the main topics we used during our work. Specifically, we introduce the concepts of Finite State Machine, to define our experimental subjects; of Test and Test Suite, to define the goal of our problem; of Genetic Programming Algorithm, to define the algorithm with which we will solve our problem; of Biased Mutual Information, to define our proposed measure; and of Test Set Diameter, to define the
A genetic algorithm to generate test suites
In order to generate test suites using our measure, we developed a Genetic Programming Algorithm. The idea is that this genetic algorithm will use BMI as a guide to generate test suites. Additionally, we will use this algorithm to compare different measures. For that purpose, we will use it to generate two test suites, each with a different measure as the fitness function.
This algorithm is based on the one presented in Ibias et al. (2019b), with several modifications to improve its performance.
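The idea of one search framework with an interchangeable measure can be sketched as follows. This is a plain elitist genetic algorithm over fixed-size test suites, written under several assumptions: it is not the Grammar-Guided Genetic Programming Algorithm of the paper, and `shared_prefix_penalty` is a placeholder diversity proxy, not BMI. All names and parameter values are hypothetical. The input budget is enforced by fixing the number of tests and their length.

```python
import random
from typing import Callable, List, Sequence

Test = List[str]
Suite = List[Test]


def shared_prefix_penalty(suite: Suite) -> float:
    """Placeholder measure (NOT BMI): summed length of the common prefixes
    of every pair of tests. Lower values mean more diverse suites."""
    total = 0
    for i in range(len(suite)):
        for j in range(i + 1, len(suite)):
            for x, y in zip(suite[i], suite[j]):
                if x != y:
                    break
                total += 1
    return float(total)


def random_suite(alphabet: Sequence[str], n_tests: int, test_len: int,
                 rng: random.Random) -> Suite:
    """Build a suite of n_tests random input sequences of length test_len."""
    return [[rng.choice(alphabet) for _ in range(test_len)]
            for _ in range(n_tests)]


def evolve(measure: Callable[[Suite], float], alphabet: Sequence[str],
           n_tests: int = 4, test_len: int = 5, pop_size: int = 20,
           generations: int = 30, seed: int = 0) -> Suite:
    """Minimise `measure` over suites using n_tests * test_len inputs."""
    rng = random.Random(seed)
    population = [random_suite(alphabet, n_tests, test_len, rng)
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=measure)
        parents = population[: pop_size // 2]  # elitism: keep the best half
        children: List[Suite] = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_tests)  # one-point crossover on tests
            child = [list(t) for t in a[:cut] + b[cut:]]
            # Mutation: overwrite one input symbol of one test.
            t = rng.randrange(n_tests)
            child[t][rng.randrange(test_len)] = rng.choice(alphabet)
            children.append(child)
        population = parents + children
    return min(population, key=measure)


best = evolve(shared_prefix_penalty, ["a", "b"])
print(best, shared_prefix_penalty(best))
```

Passing a different function as `measure` while keeping every other parameter fixed yields a second suite generated under the same conditions, which is the kind of comparison between measures described above.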
Empirical evaluation
To evaluate the proposed measure (BMI) we carry out several experiments. In this section we introduce the research questions and experimental design to evaluate the ability of BMI to generate test suites with high fault finding capability. We also present the results of the experiments and how they answer the research questions.
We made available, for the interested reader, all the code, benchmarks and results at https://github.com/Colosu/BMI-Test-Generation
Threats to validity
To ensure the validity of the results of our work, we need to address the potential threats that can invalidate them. We start with the threats to internal validity, which refers to uncontrolled factors that can affect the output of the experiments, either in favour or against our hypothesis. The main threat in this category is the possibility of having faults in the code of the experiments. To diminish this threat we carefully tested the code, even using small examples for which we know what
Conclusions
In this work we have confronted a fundamental task for software testing when resources and time are scarce: the problem of automatically generating test suites with high fault finding capability. We addressed this problem by developing a Grammar-Guided Genetic Programming Algorithm that guides the test suite generation using BMI.
Having developed BMI, and analysed a number of its properties, we reported on experiments that evaluated it. First, we compared test suites generated using BMI to randomly
CRediT authorship contribution statement
Alfredo Ibias: Conceptualization, Methodology, Software, Validation, Investigation, Data curation, Writing – original draft, Writing – reviewing and editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
I would like to thank Robert M. Hierons and Manuel Núñez for very useful discussions on the topic of this paper.
Alfredo Ibias received B.A. degrees in Computer Science and in Mathematics from Complutense University of Madrid, Spain, and an M.A. degree in Formal Methods in Computer Science from the same university. He is currently working on a Ph.D. degree in Computer Science at the same university.
References (63)
- et al. An orchestrated survey of methodologies for automated software test case generation. J. Syst. Softw. (2013)
- et al. An empirical evaluation of evolutionary algorithms for unit test suite generation. Inf. Softw. Technol. (2018)
- The test suite generation problem: Optimal instances and their implications. Discrete Appl. Math. (2007)
- et al. Squeeziness: An information theoretic measure for avoiding fault masking. Inform. Process. Lett. (2012)
- et al. Using squeeziness to test component-based systems defined as finite state machines. Inf. Softw. Technol. (2019)
- et al. SqSelect: Automatic assessment of failed error propagation in state-based systems. Expert Syst. Appl. (2021)
- et al. Using mutual information to test from Finite State Machines: Test suite selection. Inf. Softw. Technol. (2021)
- Bounded sequence testing from deterministic finite state machines. Theoret. Comput. Sci. (2010)
- et al. Using entropy measures for comparison of software traces. Inform. Sci. (2012)
- et al. Mutation testing advances: An analysis and survey
- An optimization technique for protocol conformance test generation based on UIO sequences and Rural Chinese Postman Tours. IEEE Trans. Commun.
- OpenFst: A general and efficient weighted finite-state transducer library
- Introduction to Software Testing
- Supporting the extraction of timed properties for passive testing by using probabilistic user models
- An analysis of the relationship between conditional entropy and failed error propagation in software testing
- Using genetic algorithms to generate test suites for FSMs
- An evolutionary algorithm for selection of test cases
- The measurement of software design quality. Ann. Softw. Eng.
- On the use of a similarity function for test case selection in the context of model-based testing. Softw. Test. Verif. Reliab.
- Testing software design modeled by finite state machines. IEEE Trans. Softw. Eng.
- Clustering by compression. IEEE Trans. Inform. Theory
- Information transformation: An underpinning theory for software engineering
- Crossover and mutation operators for grammar-guided genetic programming. Soft Comput.
- Elements of Information Theory
- Aiding test case generation in temporally constrained state based systems using genetic algorithms
- A case study on the use of genetic algorithms to generate test cases for temporal systems
- An improved conformance testing method
- Test set diameter: Quantifying the diversity of sets of test cases
- Searching for cognitively diverse tests: Towards universal test diversity metrics
- Generating tree inputs for testing using evolutionary computation techniques
- Achieving scalable model-based testing through test case diversity. ACM Trans. Softw. Eng. Methodol.
☆ This work has been supported by the Spanish MINECO/FEDER (grant FAME, RTI2018-093608-B-C31); the Region of Madrid, Spain (grant FORTE-CM, S2018/TCS-4314) co-funded by EIE Funds of the European Union; and Santander – Complutense University of Madrid, Spain (grant number CT63/19-CT64/19).
☆☆ Editor: Matthias Galster.