Correctness attraction: a study of stability of software behavior under runtime perturbation

Empirical Software Engineering

Abstract

Can the execution of software be perturbed without breaking the correctness of its output? In this paper, we devise a protocol to answer this question from a novel perspective. In an experimental study on ten subject programs, we observe that many perturbations do not break the correctness of the output. We call this phenomenon "correctness attraction". What makes this protocol unique is that it combines a systematic exploration of the perturbation space with perfect oracles that determine the correctness of the output. Consequently, our findings on the stability of software under execution perturbations have a level of validity never reported before in the scarce related work. A qualitative manual analysis enables us to set up the first taxonomy of the reasons behind correctness attraction.
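The exploration step can be illustrated on a toy example. The sketch below is a minimal, hypothetical Java illustration and not the authors' tool. It assumes an integer perturbation that adds one to the value at a perturbation point. The helper p, the toy max function and the choice of "add one" are all illustrative assumptions. Each run perturbs exactly one dynamic evaluation of one point, and a perfect oracle compares the output with an independently computed maximum.

```java
import java.util.Arrays;

/** Hypothetical sketch of a systematic perturbation-space exploration (not the authors' tool). */
class ExplorationSketch {
    static int targetPoint = -1;      // point to perturb; -1 means an unperturbed run
    static int targetEval = -1;       // which dynamic evaluation of that point to perturb (1-based)
    static int[] evalCounts = new int[2];

    // Perturbation wrapper: counts evaluations of point `id` and adds one at the selected one.
    static int p(int id, int value) {
        evalCounts[id]++;
        if (id == targetPoint && evalCounts[id] == targetEval) {
            return value + 1;         // assumed "add one" integer perturbation
        }
        return value;
    }

    // Toy program under study: maximum of a non-empty array, with two perturbation points.
    static int max(int[] a) {
        int best = a[0];
        for (int i = 1; i < a.length; i++) {
            if (p(0, a[i]) > best) {  // point 0: operand of the comparison
                best = p(1, a[i]);    // point 1: assigned value
            }
        }
        return best;
    }

    // Perfect oracle: the output must equal an independently computed maximum.
    static boolean oracle(int[] input, int output) {
        return output == Arrays.stream(input).max().getAsInt();
    }

    /** Perturbs each dynamic evaluation of `point` once; returns {correctRuns, totalRuns}. */
    static int[] explore(int[] input, int point) {
        targetPoint = -1;             // reference run: count how often the point is evaluated
        evalCounts = new int[2];
        max(input);
        int executions = evalCounts[point];

        int correct = 0;
        for (int k = 1; k <= executions; k++) {
            targetPoint = point;
            targetEval = k;
            evalCounts = new int[2];
            if (oracle(input, max(input))) {
                correct++;
            }
        }
        targetPoint = -1;
        return new int[] {correct, executions};
    }

    public static void main(String[] args) {
        int[] input = {3, 5, 9, 4};
        for (int point = 0; point < 2; point++) {
            int[] r = explore(input, point);
            System.out.println("point " + point + ": " + r[0] + "/" + r[1] + " correct");
        }
    }
}
```

On the input {3, 5, 9, 4}, perturbing the comparison operand never changes the result. Perturbing the assigned value is harmless only when a later, larger element overwrites it. The program therefore prints "point 0: 3/3 correct" and "point 1: 1/2 correct".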

Notes

  1. | is the bitwise OR operator and >> is the right-shift operator. The compound assignment |= computes the bitwise OR of its left and right operands and assigns the result to the left operand.

  2. In our experiments, we implement this transformation on Java programs using the Spoon transformation library (Pawlak et al. 2015).

  3. We note that the oracles for the programs laguerre and linreg can be considered approximate-computing oracles; however, the error margin we accept is very low (10⁻⁶).

  4. http://rosettacode.org/.

  5. https://www.bouncycastle.org/.

  6. https://github.com/bcgit/bc-java.

  7. http://www.tomgibara.com/computer-vision/canny-edge-detector.

  8. https://frama.link/3ZxP5eBj.

  9. http://www.mirbase.org/ftp.shtml.

  10. Version 3.6.1: https://frama.link/tQCYrZ2W.

  11. Version 3.8.0: https://frama.link/fCjiqzk2.
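Footnote 2 states that the transformation is implemented with Spoon. We do not reproduce that code here. The following hypothetical sketch only illustrates what instrumented source could look like once integer and Boolean expressions are wrapped in calls that carry a perturbation-point identifier. The names PerturbationPoints, pInt and pBool are assumptions made for illustration. So is the choice of "add one" for integers and negation for Booleans.

```java
/** Hypothetical runtime support for instrumented code; names are illustrative only. */
class PerturbationPoints {
    static int activePoint = -1;   // point to perturb; -1 means an unperturbed run
    static int activeEval = -1;    // which dynamic evaluation of that point to perturb (1-based)
    static int[] counts = new int[16];

    static void reset(int point, int eval) {
        activePoint = point;
        activeEval = eval;
        counts = new int[16];
    }

    // Records one evaluation of point `id` and reports whether it is the one to perturb.
    private static boolean fire(int id) {
        counts[id]++;
        return id == activePoint && counts[id] == activeEval;
    }

    // Integer perturbation point: adds one at the selected evaluation (assumed model).
    static int pInt(int id, int value) {
        return fire(id) ? value + 1 : value;
    }

    // Boolean perturbation point: negates the value at the selected evaluation (assumed model).
    static boolean pBool(int id, boolean value) {
        return fire(id) ? !value : value;
    }
}

class InstrumentedExample {
    // Original source:
    //     for (int i = 0; i < a.length; i++) { if (a[i] < 0) count++; }
    // Instrumented: the read a[i] becomes integer point 0, the condition becomes Boolean point 1.
    static int countNegatives(int[] a) {
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            if (PerturbationPoints.pBool(1, PerturbationPoints.pInt(0, a[i]) < 0)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] input = {-2, 5, -1, 0};
        PerturbationPoints.reset(-1, -1);
        System.out.println(countNegatives(input)); // 2: unperturbed run
        PerturbationPoints.reset(1, 2);
        System.out.println(countNegatives(input)); // 3: second condition negated
        PerturbationPoints.reset(0, 3);
        System.out.println(countNegatives(input)); // 1: third read of a[i] becomes -1 + 1 = 0
    }
}
```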

References

  • Barr E, Harman M, McMinn P, Shahbaz M, Yoo S (2015) The oracle problem in software testing: a survey. IEEE Trans Softw Eng 41(5):507–525

  • Baudry B, Monperrus M (2015) The multiple facets of software diversity: recent developments in year 2000 and beyond. ACM Comput Surv 1–26

  • Dijkstra EW (1988) On the cruelty of really teaching computing science

  • Eggert PR, Parker DS (2005) Perturbing and evaluating numerical programs without recompilation—the Wonglediff way. Softw Pract Exper 35(4):313–322

  • Khoo WM (2013) Decompilation as search. PhD thesis, University of Cambridge

  • Li X, Yeung D (2007) Application-level correctness and its impact on fault tolerance. In: 2007 IEEE 13th International symposium on high performance computer architecture, pp 181–192

  • Mittal S (2016) A survey of techniques for approximate computing. ACM Comput Surv 48(4):62:1–62:33

  • Morell L, Murrill B, Rand R (1997) Perturbation analysis of computer programs. In: Proceedings of the 12th annual conference on computer assurance, 1997. COMPASS ’97 Are we making progress towards computer assurance?, pp 77–87

  • Pawlak R, Monperrus M, Petitprez N, Noguera C, Seinturier L (2015) Spoon: a library for implementing analyses and transformations of java source code. Softw Pract Exper 46:1155–1179

  • Rinard M, Cadar C, Nguyen HH (2005) Exploring the acceptability envelope. In: Companion to the 20th Annual ACM SIGPLAN conference on object-oriented programming, systems, languages, and applications, OOPSLA ’05. ACM, New York

  • Roy P, Ray R, Wang C, Wong WF (2014) Asac: automatic sensitivity analysis for approximate computing. SIGPLAN Not 49(5):95–104

  • Sedgewick R (1978) Implementing quicksort programs. Commun ACM 21(10):847–857

  • Tallam S, Tian C, Gupta R, Zhang X (2008) Avoiding program failures through safe execution perturbations. In: Proceedings of the 2008 32nd Annual IEEE international computer software and applications conference, COMPSAC ’08. IEEE Computer Society, Washington, DC, pp 152–159

  • Tang E, Barr E, Li X, Su Z (2010) Perturbing numerical calculations for statistical analysis of floating-point program (in)stability. In: Proceedings of the 19th International symposium on software testing and analysis, ISSTA ’10. ACM, New York, pp 131–142

  • Wang N, Fertig M, Patel S (2003) Y-branches: when you come to a fork in the road, take it. In: 12th International conference on parallel architectures and compilation techniques, pp 56–66

  • Welch TA (1984) A technique for high-performance data compression. Computer 17(6):8–19

Acknowledgments

This work was partially supported by the EU Project STAMP ICT-16-10 No.731529, CPER Nord-Pas de Calais/FEDER DATA Advanced data science and technologies 2015-2020, and the French Ministry of Higher Education and Research. We also wish to acknowledge the continual support of Inria, and PP acknowledges the stimulating environment provided by the SequeL Inria project-team.

Author information

Corresponding author

Correspondence to Benjamin Danglot.

Additional information

Communicated by: Atif Memon

Appendix: Experiment Subject

A.1 Overview

Table 9 gives an overview of the benchmark. The first column gives the name used to refer to each subject, and the second column gives its size in lines of code (LOC). The third, fourth and fifth columns give, respectively, the number of integer perturbation points, the number of Boolean perturbation points and the total number of perturbation points for each subject. The last column briefly describes the computation performed.

Table 9 Dataset of the 10 subject programs used in our experiments

A.2 Quicksort

Quicksort is a sorting algorithm. We consider a Java implementation of the Quicksort algorithm. The original code is available at https://frama.link/XGMArl34. A live demo is available at https://danglotb.github.io/resources/correctness-attraction/live-demo.html

Correctness Oracle:

The oracle checks that the output array is correctly sorted, that every element of the input is also in the output, and that the output contains no element that is absent from the input.
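A minimal Java sketch of this oracle follows. It compares the input and the output as multisets, by sorting copies of both. We take this to be the intent of the last two checks; it also rejects an output that duplicates one input element in place of another. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch of the Quicksort oracle. */
class SortOracle {
    static boolean isCorrect(int[] input, int[] output) {
        // Check 1: the output is in non-decreasing order.
        for (int i = 1; i < output.length; i++) {
            if (output[i - 1] > output[i]) {
                return false;
            }
        }
        // Checks 2 and 3: input and output contain the same elements with the same
        // multiplicities, so nothing was lost and nothing foreign was introduced.
        int[] in = input.clone();
        int[] out = output.clone();
        Arrays.sort(in);
        Arrays.sort(out);
        return Arrays.equals(in, out);
    }

    public static void main(String[] args) {
        int[] input = {3, 1, 2, 1};
        System.out.println(isCorrect(input, new int[] {1, 1, 2, 3})); // true
        System.out.println(isCorrect(input, new int[] {1, 2, 1, 3})); // false: not sorted
        System.out.println(isCorrect(input, new int[] {1, 2, 3, 3})); // false: a 1 was replaced by a 3
    }
}
```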

Table 10 URLs of the oracle and input generator for each of the 10 subjects

A.3 Zip

Lempel-Ziv-Welch (LZW) (Welch 1984) is a lossless data compression algorithm. We use it to compress and uncompress strings. The implementation comes from Rosetta Code (Footnote 4) and consists of 1 class with 2 methods: one to compress and the other to uncompress. It has 6 Boolean perturbation points and 19 numerical perturbation points spread over 56 lines of code.

Correctness Oracle:

The scenario is to uncompress the compressed input string. The perfect oracle asserts that the output string is identical to the input string.
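The round-trip oracle can be sketched as follows. This is our own LZW code, written in the style of the Rosetta Code task. It is not the exact program studied. It assumes that every input character is below 256, so that the initial dictionary covers it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** LZW round-trip sketch (characters assumed < 256); not the exact program studied. */
class LzwRoundTrip {
    static List<Integer> compress(String input) {
        Map<String, Integer> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) dict.put(String.valueOf((char) i), i);
        int nextCode = 256;
        String w = "";
        List<Integer> out = new ArrayList<>();
        for (char c : input.toCharArray()) {
            String wc = w + c;
            if (dict.containsKey(wc)) {
                w = wc;                        // extend the current phrase
            } else {
                out.add(dict.get(w));          // emit the code of the longest known phrase
                dict.put(wc, nextCode++);      // learn the new phrase
                w = String.valueOf(c);
            }
        }
        if (!w.isEmpty()) out.add(dict.get(w));
        return out;
    }

    static String decompress(List<Integer> codes) {
        if (codes.isEmpty()) return "";
        Map<Integer, String> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) dict.put(i, String.valueOf((char) i));
        int nextCode = 256;
        String w = dict.get(codes.get(0));
        StringBuilder out = new StringBuilder(w);
        for (int k : codes.subList(1, codes.size())) {
            String entry;
            if (dict.containsKey(k)) entry = dict.get(k);
            else if (k == nextCode) entry = w + w.charAt(0); // code not yet in the dictionary
            else throw new IllegalArgumentException("bad code: " + k);
            out.append(entry);
            dict.put(nextCode++, w + entry.charAt(0));
            w = entry;
        }
        return out.toString();
    }

    // Perfect oracle: uncompressing the compressed string must give back the input.
    static boolean oracle(String input) {
        return decompress(compress(input)).equals(input);
    }

    public static void main(String[] args) {
        String s = "TOBEORNOTTOBEORTOBEORNOT";
        System.out.println(compress(s)); // [84, 79, 66, 69, 79, 82, 78, 79, 84, 256, 258, 260, 265, 259, 261, 263]
        System.out.println(oracle(s));   // true
    }
}
```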

A.4 Sudoku

We consider a Sudoku solver taken from Rosetta Code. The input is a randomly generated grid in which some cells are already filled in. The implementation is 1 class of 87 lines of code, containing 89 numerical perturbation points and 26 Boolean perturbation points.

Correctness Oracle:

The oracle asserts that all Sudoku constraints are satisfied: every cell is filled, no row, column or 3×3 box contains a repeated digit, and every cell already filled in the input problem remains unchanged.
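Here is a hypothetical Java sketch of this oracle. It assumes a 9×9 grid stored as int[][], with 0 marking an empty cell in the input problem.

```java
/** Hypothetical sketch of the Sudoku oracle; 0 marks an empty cell in the puzzle. */
class SudokuOracle {
    static boolean isCorrect(int[][] puzzle, int[][] solution) {
        for (int r = 0; r < 9; r++) {
            for (int c = 0; c < 9; c++) {
                int v = solution[r][c];
                if (v < 1 || v > 9) return false;                         // every cell filled with a digit
                if (puzzle[r][c] != 0 && puzzle[r][c] != v) return false;  // given cells unchanged
            }
        }
        // For each index i, check row i, column i and box i for repeated digits.
        for (int i = 0; i < 9; i++) {
            boolean[] row = new boolean[10], col = new boolean[10], box = new boolean[10];
            for (int j = 0; j < 9; j++) {
                int rv = solution[i][j];
                int cv = solution[j][i];
                int bv = solution[3 * (i / 3) + j / 3][3 * (i % 3) + j % 3];
                if (row[rv] || col[cv] || box[bv]) return false;
                row[rv] = col[cv] = box[bv] = true;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] solution = new int[9][9];
        for (int r = 0; r < 9; r++) {
            for (int c = 0; c < 9; c++) {
                solution[r][c] = (3 * (r % 3) + r / 3 + c) % 9 + 1; // a known valid grid pattern
            }
        }
        int[][] puzzle = new int[9][9];
        puzzle[4][4] = solution[4][4];                               // one given clue
        System.out.println(isCorrect(puzzle, solution));            // true
    }
}
```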

A.5 MD5

The Message Digest 5 (MD5) algorithm is used to hash a string of a given size. We take the implementation from Rosetta Code. It consists of 1 class with 1 method and 91 lines of code, in which we find 164 numerical perturbation points and 11 Boolean perturbation points.

Correctness Oracle:

The oracle checks that the hash is identical to the one computed by the reference implementation.
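The text does not name the reference implementation. In this sketch we assume that the JDK's java.security.MessageDigest plays that role, that strings are encoded as UTF-8, and that digests are compared as hexadecimal strings. The method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of the MD5 oracle, using the JDK as the assumed reference implementation. */
class Md5Oracle {
    // Reference digest as lowercase hexadecimal.
    static String referenceMd5(String message) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(message.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Oracle: the digest produced by the (possibly perturbed) implementation must match exactly.
    static boolean isCorrect(String message, String computedHex) {
        return referenceMd5(message).equalsIgnoreCase(computedHex);
    }

    public static void main(String[] args) {
        System.out.println(referenceMd5("abc")); // 900150983cd24fb0d6963f7d28e17f72
    }
}
```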

A.6 RSA

RSA is a public-key cryptosystem designed by Ron Rivest, Adi Shamir and Leonard Adleman. The implementation we use is a real, production-ready one taken from Bouncy Castle (Footnotes 5 and 6). The project comprises 1494 classes with a total of 241,483 lines of code. We study the RSACoreEngine class, which has 6 methods totaling 203 lines of code, with 73 numerical perturbation points and 19 Boolean perturbation points. Many integer perturbation points are Java BigInteger objects, which we perturb appropriately. The inputs are random strings of 64 bytes.

Correctness Oracle:

The considered scenario is decrypt(crypt(x)). The oracle asserts that the decrypted string is identical to the input string.
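The decrypt(crypt(x)) scenario can be sketched with textbook RSA on java.math.BigInteger. This is not Bouncy Castle's RSACoreEngine, and it uses no padding. It only illustrates the round-trip oracle on a 64-byte random message. Such a message needs a modulus longer than 512 bits, so the sketch uses a modulus of about 1024 bits, an illustrative choice.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

/** Textbook RSA (no padding) used only to illustrate the decrypt(crypt(x)) oracle. */
class RsaRoundTrip {
    final BigInteger n;
    final BigInteger e = BigInteger.valueOf(65537);
    final BigInteger d;

    RsaRoundTrip(int modulusBits, SecureRandom rnd) {
        BigInteger p, q, phi;
        do {
            p = BigInteger.probablePrime(modulusBits / 2, rnd);
            q = BigInteger.probablePrime(modulusBits / 2, rnd);
            phi = p.subtract(BigInteger.ONE).multiply(q.subtract(BigInteger.ONE));
        } while (p.equals(q) || !phi.gcd(e).equals(BigInteger.ONE));
        n = p.multiply(q);
        d = e.modInverse(phi);
    }

    BigInteger crypt(BigInteger m) { return m.modPow(e, n); }

    BigInteger decrypt(BigInteger c) { return c.modPow(d, n); }

    // Oracle: decrypting the encrypted message must give back the original message.
    boolean oracle(byte[] message) {
        BigInteger m = new BigInteger(1, message);   // message bytes as a non-negative integer
        if (m.compareTo(n) >= 0) {
            throw new IllegalArgumentException("message too long for this key");
        }
        return decrypt(crypt(m)).equals(m);
    }

    public static void main(String[] args) {
        SecureRandom rnd = new SecureRandom();
        RsaRoundTrip rsa = new RsaRoundTrip(1024, rnd);
        byte[] message = new byte[64];                // 64 random bytes, as in the study's inputs
        rnd.nextBytes(message);
        System.out.println(rsa.oracle(message));      // true
    }
}
```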

A.7 RC4

RC4 is a stream cipher designed by Ron Rivest. The algorithm is fast and simple, but it is not secure by today's standards. We use Bouncy Castle's class RC4CoreEngine, which has 150 lines of code, with 7 Boolean perturbation points and 112 integer perturbation points.

Correctness Oracle:

The considered scenario is decrypt(crypt(x)). The oracle asserts that the decrypted string is identical to the input string.
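RC4 is symmetric, so crypt and decrypt are the same operation: an XOR with the keystream. The following is our own compact RC4, with key scheduling followed by keystream generation. It is not Bouncy Castle's class and serves only to illustrate the decrypt(crypt(x)) oracle.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Compact RC4 used only to illustrate the round-trip oracle; not Bouncy Castle's class. */
class Rc4RoundTrip {
    // Key scheduling (KSA), then keystream generation XORed with the data (PRGA).
    static byte[] rc4(byte[] key, byte[] data) {
        int[] s = new int[256];
        for (int i = 0; i < 256; i++) s[i] = i;
        for (int i = 0, j = 0; i < 256; i++) {
            j = (j + s[i] + (key[i % key.length] & 0xff)) & 0xff;
            int t = s[i]; s[i] = s[j]; s[j] = t;
        }
        byte[] out = new byte[data.length];
        int i = 0, j = 0;
        for (int k = 0; k < data.length; k++) {
            i = (i + 1) & 0xff;
            j = (j + s[i]) & 0xff;
            int t = s[i]; s[i] = s[j]; s[j] = t;
            out[k] = (byte) (data[k] ^ s[(s[i] + s[j]) & 0xff]);
        }
        return out;
    }

    // Oracle: crypt then decrypt (a fresh key schedule each time) must restore the plaintext.
    static boolean oracle(byte[] key, byte[] plaintext) {
        byte[] ciphertext = rc4(key, plaintext);
        byte[] decrypted = rc4(key, ciphertext);
        return Arrays.equals(plaintext, decrypted);
    }

    public static void main(String[] args) {
        byte[] key = "Key".getBytes(StandardCharsets.US_ASCII);
        byte[] text = "Plaintext".getBytes(StandardCharsets.US_ASCII);
        System.out.println(oracle(key, text)); // true
    }
}
```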

A.8 Canny

A Canny filter detects edges in an image. We use the implementation by Tom Gibara (Footnote 7). It is 1 class of 568 lines of code, with 450 integer perturbation points and 79 Boolean perturbation points.

Correctness Oracle:

The oracle asserts that the detected edges are accurate to the pixel with regard to the result of an unperturbed reference run.

A.9 LCS

We consider the Longest Common Subsequence problem, implemented using dynamic programming (Footnote 8). As input, we use real RNA sequences of two plants, sativa and thaliana, extracted from the mature dataset of miRBase (Footnote 9). This implementation has 43 lines of code, with 9 Boolean perturbation points and 79 integer perturbation points.

Correctness Oracle:

The oracle checks that the output is identical to the output of the unperturbed reference implementation.
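Below is a minimal sketch of a dynamic-programming LCS and of the reference-run oracle. It is not the studied implementation linked in Footnote 8. The tie-breaking rule in the backtracking step is our own choice. When several longest subsequences exist, the returned string may therefore differ from that program's output.

```java
/** Hypothetical sketch: dynamic-programming LCS and the reference-run oracle. */
class LcsSketch {
    static String lcs(String a, String b) {
        int n = a.length(), m = b.length();
        // dp[i][j] = length of an LCS of the prefixes a[0..i) and b[0..j)
        int[][] dp = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                dp[i][j] = a.charAt(i - 1) == b.charAt(j - 1)
                        ? dp[i - 1][j - 1] + 1
                        : Math.max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
        // Walk back from dp[n][m] to recover one longest common subsequence.
        StringBuilder sb = new StringBuilder();
        int i = n, j = m;
        while (i > 0 && j > 0) {
            if (a.charAt(i - 1) == b.charAt(j - 1)) {
                sb.append(a.charAt(i - 1));
                i--;
                j--;
            } else if (dp[i - 1][j] >= dp[i][j - 1]) {
                i--;
            } else {
                j--;
            }
        }
        return sb.reverse().toString();
    }

    // Oracle: the perturbed output must equal the output of an unperturbed reference run.
    static boolean oracle(String a, String b, String perturbedOutput) {
        return lcs(a, b).equals(perturbedOutput);
    }

    public static void main(String[] args) {
        System.out.println(lcs("AGGTAB", "GXTXAYB")); // GTAB
    }
}
```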

A.10 Laguerre

Laguerre is a numerical analysis program that computes the roots of a polynomial equation. The implementation comes from the Apache Commons Mathematics Library (Footnote 10). The class under study is "LaguerreSolver", which is 440 lines long and has 176 integer perturbation points and 25 Boolean perturbation points.

Correctness Oracle:

The oracle checks that the computed solution actually nullifies the equation. Because the computation operates on floating-point numbers, we accept a solution if the polynomial evaluated at that solution lies within ±10⁻⁶ of zero.
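The residual check can be sketched with Horner's rule. We assume real coefficients stored in ascending order of degree, with c[0] as the constant term, and a real candidate root. The class and method names are hypothetical.

```java
/** Hypothetical sketch of the Laguerre oracle: the residual at the computed root must be tiny. */
class PolynomialRootOracle {
    static final double TOLERANCE = 1e-6;

    // Horner's rule for c[0] + c[1] x + ... + c[n] x^n (ascending order assumed).
    static double evaluate(double[] c, double x) {
        double result = 0.0;
        for (int i = c.length - 1; i >= 0; i--) {
            result = result * x + c[i];
        }
        return result;
    }

    // Accept the root if the polynomial evaluated there lies within +/- TOLERANCE of zero.
    static boolean isCorrect(double[] c, double root) {
        return Math.abs(evaluate(c, root)) <= TOLERANCE;
    }

    public static void main(String[] args) {
        double[] c = {-2.0, 0.0, 1.0};                     // x^2 - 2
        System.out.println(isCorrect(c, Math.sqrt(2.0)));   // true
        System.out.println(isCorrect(c, 1.4142));           // false
    }
}
```

For x² - 2, the root Math.sqrt(2) is accepted. The approximation 1.4142 is rejected because its residual is about 3.8 × 10⁻⁵.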

A.11 Linreg

Linreg computes a linear regression using Tikhonov regularization. We take the implementation from the Weka library (Footnote 11). The class under study is "LinearRegression": it has 188 lines of code, with 75 integer perturbation points and 15 Boolean perturbation points. We generate inputs by randomly sampling the coefficients of the equation.

Correctness Oracle:

The oracle checks that the computed coefficients are equal to those obtained from a reference run, up to a precision of 10⁻⁶.
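To illustrate the tolerance-based oracle, the sketch below fits a single-feature model without intercept under Tikhonov (ridge) regularization. In that case the closed form is β = Σxy / (Σx² + λ). The sketch then compares coefficient vectors element by element up to 10⁻⁶. The one-feature setting is a simplification chosen for brevity; Weka's LinearRegression handles multiple attributes. All names are hypothetical.

```java
/** Hypothetical sketch: one-feature ridge fit plus the coefficient-tolerance oracle. */
class RidgeOracleSketch {
    static final double PRECISION = 1e-6;

    // Minimises sum (y - beta x)^2 + lambda beta^2, giving beta = sum(x y) / (sum(x^2) + lambda).
    static double fitSingleFeature(double[] x, double[] y, double lambda) {
        double sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < x.length; i++) {
            sxy += x[i] * y[i];
            sxx += x[i] * x[i];
        }
        return sxy / (sxx + lambda);
    }

    // Oracle: every coefficient must match the reference run up to PRECISION (NaN fails).
    static boolean isCorrect(double[] reference, double[] computed) {
        if (reference.length != computed.length) return false;
        for (int i = 0; i < reference.length; i++) {
            if (!(Math.abs(reference[i] - computed[i]) <= PRECISION)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        double[] y = {2.0, 4.0, 6.0};
        double reference = fitSingleFeature(x, y, 0.0);   // 2.0
        double regularised = fitSingleFeature(x, y, 1e-8);
        System.out.println(isCorrect(new double[] {reference}, new double[] {regularised})); // true
    }
}
```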

About this article

Cite this article

Danglot, B., Preux, P., Baudry, B. et al. Correctness attraction: a study of stability of software behavior under runtime perturbation. Empir Software Eng 23, 2086–2119 (2018). https://doi.org/10.1007/s10664-017-9571-8
