Using mutual information to test from Finite State Machines: Test suite generation

https://doi.org/10.1016/j.jss.2022.111391

Highlights

  • Software Testing requires good Test Suites.

  • Biased Mutual Information leads the generation of good Test Suites.

  • BMI generates better test suites than state-of-the-art measures.

  • BMI generates better test suites than traditional methods.

  • BMI is preferable to completeness-oriented methods in realistic scenarios.

Abstract

Mutual Information is an information theoretic measure designed to quantify the amount of similarity between two random variables ranging over two sets. In recent work we have used it as the basis for a measure, called Biased Mutual Information, to guide the selection of a test suite among different possibilities. In this paper, we adapt this concept and show how it can be used to address the problem of generating a test suite with high fault finding capability, in a black-box scenario and following a diversity-maximising approach. Additionally, we present a new Grammar-Guided Genetic Programming Algorithm that uses Biased Mutual Information to guide the generation of such test suites. Our experimental results clearly show the potential value of our measure when used to generate test suites. Moreover, they show that our measure is better at guiding test generation than current state-of-the-art measures, such as Test Set Diameter (TSDm) measures. We also compared our proposal with classical completeness-oriented methods, such as the H-Method and the Transition Tour method, and found that our proposal produces smaller test suites with high enough fault finding capability. Therefore, our methodology is preferable in a scenario where a compromise is necessary between fault detection and execution time.

Introduction

Testing (Ammann and Offutt, 2017, Myers et al., 2011) is the most widely used methodology to increase the reliability of complex software systems. In a few words, testing is the process of applying inputs to the system that we are evaluating (usually called the System Under Test (SUT)), observing the outputs produced by the system and assessing whether the obtained outputs coincide with the expected ones. One of the main obstacles to effectively applying testing to complex systems is that the number of different inputs is usually huge (in most cases, this set is infinite). Thus, it is of the utmost importance to devise approaches that generate a set of inputs, that is, a test suite, that is small enough to be practically feasible and is likely to be effective in finding faults.

Our goal with the work that we present in this paper is to improve the testing process by generating good test suites, that is, test suites with high fault finding capability. We address this problem in the specific case where we need a finite test suite that has a bound on the number of applied inputs. This scenario appears in a number of contexts. For example, we might want to generate a test suite that has a limited execution cost and, therefore, we have to limit the number of inputs applied to the SUT.

We assume that the SUT is a black-box: we only know its input and output alphabets. In particular, we have no additional information about its internal structure, nor can we access the source code of the system. However, we assume that we do have a specification of the system that we want to build, so that we can check whether the application of a certain input to the SUT has produced the expected output according to the specification. In order to simplify the presentation, and following our previous work (Ibias et al., 2021), we assume that the specification is given by a Finite State Machine (FSM), but it is possible to use other state-based formalisms, in particular, those containing data. Indeed, FSMs are a powerful formalism that can be used to represent very different software and hardware systems. Finally, we also assume that the specification has the same number of states as the implementation.

It is important to note that the users of our approach do not need to provide a specification coded as an FSM. The users can provide their specification in any other formalism as long as it is state-based and can be mapped to an FSM that represents its semantics (possibly after some abstraction). One example of such a state-based formalism is statecharts. This allows us to apply our FSM-based approaches to a wide range of state-based specifications. For example, classical FSM-based test generation techniques can be applied when testing from reactive I/O-state-transition systems (RIOSTS), a formalism that can be used with a range of embedded systems (Huang and Peleska, 2017). In particular, RIOSTS has been applied to evaluate part of the European Train Control System and an airbag controller (Hübner et al., 2019). Finally, the use of FSMs is well supported by several tools that can be used to specify and analyse them (e.g. fsmlib-cpp, automatalib (Isberner et al., 2015) and OpenFST (Allauzen et al., 2007)).

In this paper we consider a measure, called Biased Mutual Information (BMI), that is inspired by the classical concept of Mutual Information (Shannon, 1948). The intuition underlying the formal definition of BMI is that if we have two tests that have common parts, then they will tend to traverse the same branches of the SUT. Therefore, test suites with smaller BMI values will include tests that are more different, with the corresponding impact on the increase of diversity. Let us note that there is a well-studied correspondence between test diversity and test quality (Feldt et al., 2008, Cartaxo et al., 2011, Hemmati et al., 2013, Hemmati et al., 2015).
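
For reference, the classical notion that inspires BMI can be computed as follows. This is a minimal sketch of Shannon's Mutual Information estimated from paired samples of two discrete variables; it is not the BMI measure itself, whose definition is not part of this excerpt, and the function name is chosen here for illustration.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Empirical mutual information I(X;Y), in bits, from paired observations."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each observed pair (x, y)
    count_x = Counter(xs)         # marginal counts of X
    count_y = Counter(ys)         # marginal counts of Y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x[x] * count_y[y])
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Identical variables share all their information; independent ones share none.
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```

Lower values indicate that observing one variable says less about the other, which matches the intuition above that tests sharing fewer common parts are more diverse.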

In our previous work (Ibias et al., 2021) we showed that BMI could be successfully used to choose, between two test suites, the one with a higher expectation of detecting faults. In this paper we use BMI to confront a more difficult problem: guiding the generation of test suites with high fault finding capability. The difference lies in the fact that, when selecting between two test suites, it is only necessary to compare their performance. However, in order to generate a test suite, it is necessary to consider not only the performance of the current test suite, but also its potential as a good base for the generation of new test suites.

In this paper we consider the intelligent generation of test suites by using BMI as a fundamental component to choose between different tests. Specifically, within the family of Genetic Algorithms, we consider a special kind of algorithm, called Genetic Programming Algorithms, that is able to deal with complex structures. In short, we propose a Grammar-Guided Genetic Programming Algorithm to generate test suites with high fault finding capability, using BMI as the guiding measure; more specifically, BMI is used to define the fitness function.

Our experiments focus on comparing our BMI-based approach with current state-of-the-art measures, using the Genetic Algorithm as a common framework to test the effectiveness of each measure. With this setting, we found that BMI is consistently better than some measures at generating test suites with higher fault finding capability (at the cost of a small increase in computation time), and that BMI is preferable to the remaining measures in a realistic scenario (due to limitations in computation cost or execution cost).

The rest of the paper is structured as follows. In Section 2 we review previous work related to our research. In Section 3 we present the background theory used in the paper. We also define BMI in that section. In Section 4 we propose our methodology to generate good test suites. In Section 5 we report our experiments to evaluate the performance of our methodology. In Section 6 we discuss the threats to the validity of our results. Finally, in Section 7 we provide conclusions and propose lines for future work.

Section snippets

Related work

Test suite generation is a fundamental problem in Software Testing and it has been addressed from multiple angles. Although the tester can manually build the tests included in the test suite, this is a time-consuming and error-prone process. Therefore, it is essential to automate the generation of tests by following certain quality criteria (Anand et al., 2013). In the case of testing from FSMs, test generation is also one of the most challenging problems and it was already considered in

Preliminaries

To develop our work, we needed some preliminary concepts. In this section we present the main topics we used during our work. Specifically, we introduce the concepts of Finite State Machine, to define our experimental subjects; of Test and Test Suite, to define the goal of our problem; of Genetic Programming Algorithm, to define the algorithm with which we will solve our problem; of Biased Mutual Information, to define our proposed measure; and of Test Set Diameter, to define the

A genetic algorithm to generate test suites

In order to generate test suites using our measure, we developed a Genetic Programming Algorithm. The idea is that this genetic algorithm will use BMI as a guide to generate test suites. Additionally, we will use this algorithm to compare different measures. To that end, we will generate two test suites with it, each using a different measure as the fitness function.

This algorithm is based on the one presented in Ibias et al. (2019b), with several modifications to improve its performance.
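
The idea of plugging different measures into one search procedure can be sketched as follows. This is a plain genetic algorithm written for illustration, not the Grammar-Guided Genetic Programming Algorithm of the paper: the operators, the parameter values and the placeholder fitness `distinct_prefixes` are all assumptions, and the placeholder is not BMI. The sketch maximises the fitness, so a measure that should be minimised would be negated before being passed in. It assumes at least two tests per suite and a population of at least three.

```python
import random
from typing import Callable, List, Sequence

Test = List[str]    # a test is a sequence of inputs
Suite = List[Test]  # a test suite is a list of tests


def distinct_prefixes(suite: Suite) -> float:
    """Placeholder fitness (NOT BMI): number of distinct non-empty test prefixes."""
    return float(len({tuple(t[:i]) for t in suite for i in range(1, len(t) + 1)}))


def evolve(fitness: Callable[[Suite], float], alphabet: Sequence[str],
           n_tests: int = 4, test_len: int = 5, pop_size: int = 20,
           generations: int = 30, seed: int = 0) -> Suite:
    """Search for a suite of n_tests * test_len inputs that maximises `fitness`."""
    rng = random.Random(seed)

    def random_suite() -> Suite:
        return [[rng.choice(alphabet) for _ in range(test_len)]
                for _ in range(n_tests)]

    def tournament(pop: List[Suite]) -> Suite:
        # Pick the fittest of three randomly sampled suites.
        return max(rng.sample(pop, 3), key=fitness)

    def crossover(a: Suite, b: Suite) -> Suite:
        # One-point crossover at test granularity; tests are copied.
        cut = rng.randrange(1, n_tests)
        return [list(t) for t in a[:cut] + b[cut:]]

    def mutate(suite: Suite) -> Suite:
        # Replace one randomly chosen input with a random symbol.
        t, i = rng.randrange(n_tests), rng.randrange(test_len)
        suite[t][i] = rng.choice(alphabet)
        return suite

    pop = [random_suite() for _ in range(pop_size)]
    for _ in range(generations):
        best = max(pop, key=fitness)  # elitism: the best suite survives unchanged
        children = [mutate(crossover(tournament(pop), tournament(pop)))
                    for _ in range(pop_size - 1)]
        pop = [best] + children
    return max(pop, key=fitness)


suite = evolve(distinct_prefixes, ["a", "b"])
print(suite, distinct_prefixes(suite))
```

Because every suite has exactly `n_tests * test_len` inputs, the bound on the number of applied inputs holds by construction, and two measures can be compared by calling `evolve` twice with the same seed and a different `fitness` argument.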

Empirical evaluation

To evaluate the proposed measure (BMI) we carry out several experiments. In this section we introduce the research questions and experimental design to evaluate the ability of BMI to generate test suites with high fault finding capability. We also present the results of the experiments and how they answer the research questions.

We made available, for the interested reader, all the code, benchmarks and results at https://github.com/Colosu/BMI-Test-Generation

Threats to validity

To ensure the validity of the results of our work, we need to address the potential threats that can invalidate them. We start with the threats to internal validity, which refers to uncontrolled factors that can affect the output of the experiments, either in favour or against our hypothesis. The main threat in this category is the possibility of having faults in the code of the experiments. To diminish this threat we carefully tested the code, even using small examples for which we know what

Conclusions

In this work we have confronted a fundamental task for software testing when resources and time are scarce: the automatic generation of test suites with high fault finding capability. We addressed this problem by developing a Grammar-Guided Genetic Programming Algorithm that guides the test suite generation using BMI.

Having developed BMI, and analysed a number of its properties, we reported on experiments that evaluated it. First, we compared test suites generated using BMI to randomly

CRediT authorship contribution statement

Alfredo Ibias: Conceptualization, Methodology, Software, Validation, Investigation, Data curation, Writing – original draft, Writing – reviewing and editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

I would like to thank Robert M. Hierons and Manuel Núñez for very useful discussions on the topic of this paper.

Alfredo Ibias received B.A. degrees in Computer Science and in Mathematics from Complutense University of Madrid, Spain, and an M.A. degree in Formal Methods in Computer Science from the same university. He is currently working on a Ph.D. degree in Computer Science at the same university.

References

  • Aho A.V. et al. An optimization technique for protocol conformance test generation based on UIO sequences and Rural Chinese Postman Tours. IEEE Trans. Commun. (1991)

  • Allauzen C. et al. OpenFst: A general and efficient weighted finite-state transducer library

  • Ammann P. et al. Introduction to Software Testing (2017)

  • Andrés C. et al. Supporting the extraction of timed properties for passive testing by using probabilistic user models

  • Androutsopoulos K. et al. An analysis of the relationship between conditional entropy and failed error propagation in software testing

  • Benito-Parejo M. et al. Using genetic algorithms to generate test suites for FSMs

  • Benito-Parejo M. et al. An evolutionary algorithm for selection of test cases

  • Blundell J.K. et al. The measurement of software design quality. Ann. Softw. Eng. (1997)

  • Cartaxo E.G. et al. On the use of a similarity function for test case selection in the context of model-based testing. Softw. Test. Verif. Reliab. (2011)

  • Chow T.S. Testing software design modeled by finite state machines. IEEE Trans. Softw. Eng. (1978)

  • Cilibrasi R. et al. Clustering by compression. IEEE Trans. Inform. Theory (2005)

  • Clark D. et al. Information transformation: An underpinning theory for software engineering

  • Couchet J. et al. Crossover and mutation operators for grammar-guided genetic programming. Soft Comput. (2007)

  • Cover T.M. et al. Elements of Information Theory (1991)

  • Derderian K. et al. Aiding test case generation in temporally constrained state based systems using genetic algorithms

  • Derderian K. et al. A case study on the use of genetic algorithms to generate test cases for temporal systems

  • Dorofeeva R. et al. An improved conformance testing method

  • Feldt R. et al. Test set diameter: Quantifying the diversity of sets of test cases

  • Feldt R. et al. Searching for cognitively diverse tests: Towards universal test diversity metrics

  • Griñán D. et al. Generating tree inputs for testing using evolutionary computation techniques

  • Hemmati H. et al. Achieving scalable model-based testing through test case diversity. ACM Trans. Softw. Eng. Methodol. (2013)

    This work has been supported by the Spanish MINECO/FEDER (grant FAME, RTI2018-093608-B-C31); the Region of Madrid, Spain (grant FORTE-CM, S2018/TCS-4314) co-funded by EIE Funds of the European Union; and Santander – Complutense University of Madrid, Spain (grant number CT63/19-CT64/19).

    Editor: Matthias Galster.
