Elsevier

Applied Soft Computing

Volume 13, Issue 1, January 2013, Pages 189-200
Hybrid intelligent systems for predicting software reliability

https://doi.org/10.1016/j.asoc.2012.08.015

Abstract

In this paper, we propose novel recurrent architectures for Genetic Programming (GP) and Group Method of Data Handling (GMDH) to predict software reliability. The effectiveness of the models is compared with that of well-known machine learning techniques viz. Multiple Linear Regression (MLR), Multivariate Adaptive Regression Splines (MARS), Backpropagation Neural Network (BPNN), Counter Propagation Neural Network (CPNN), Dynamic Evolving Neuro-Fuzzy Inference System (DENFIS), TreeNet, GMDH and GP on three datasets taken from literature. Further, we extended our research by developing GP and GMDH based ensemble models to predict software reliability. In the ensemble models, we considered GP and GMDH as constituent models and chose GP, GMDH, BPNN and Average as arbitrators. The results obtained from our experiments indicate that the new recurrent architecture for GP and the ensemble based on GP outperformed all other techniques.

Highlights

► In this paper, we propose novel recurrent architectures for Genetic Programming (GP) and Group Method of Data Handling (GMDH) to predict software reliability.
► Further, we extended our research by developing GP and GMDH based ensemble models, where GP, GMDH, BPNN and Average are taken as arbitrators, to predict software reliability.
► The results obtained from our experiments indicate that the new recurrent architecture for GP and the ensemble based on GP outperformed all other techniques.

Introduction

During the past decade, increasing attention has been focused on computer software. As computing systems have become more numerous, more complex, and more deeply embedded in modern society, there is a pressing need for systematic approaches to software development and software maintenance. Hence, software engineering, which deals with all these aspects, has become an independent area of research and practice. As reported in Ref. [13], “The software crisis is a term used to describe the recurring system development problems where software problems cause the system to be late, over cost, and/or not responsive to user's needs and requirements”. Software engineering has focused on the tools, techniques, and methods necessary to solve the “software crisis.” Papers and reports abound with descriptions of new methodologies and software tools developed in university and corporate laboratories. Much of this discussion has centered on what is sometimes termed a “silver bullet”, i.e. the tool or technique that can solve the software crisis.

Gibbs [13] proposed that such a silver bullet exists and that it is a software engineering project management issue. A properly managed project, in a mature software engineering environment and led by a competent manager, can repeatedly deliver a software system on time, within cost, and to the satisfaction of the user. A good project management environment can compensate when things go wrong: for example, it can adjust the delivery schedule under changing conditions, select appropriate people for the job at hand, provide clear and unambiguous direction, monitor progress and job completion status, and take appropriate action when controlling metrics indicate that plans are not being followed. Thus software reliability plays an important role in software project management.

Software engineering is inherently knowledge intensive. Software processes and products are human centered [30]. In an attempt to pursue fruitful developments it is beneficial to identify sources of such shortcomings. At the same time, they deal with high-level abstract concepts and result in constructs whose functioning is not governed by the laws of physics. There is no direct use of ideas such as continuity, which are very intuitive and helpful in the physical world. In software engineering, small changes to the requirements may result in drastic changes in the total cost of the project. There is no concept of time, because software products do not wear out, in contrast to physical systems, which are subject to such deterioration.

Reliability is probably the most important of the characteristics inherent in the concept of “software quality”. Software reliability is defined as the probability that the software will work without failure for a specified period of time [37]. Software reliability is an important factor related to defects and faults. It differs from hardware reliability in that it reflects design perfection rather than manufacturing perfection. The principal factors that affect software reliability are (i) fault introduction, (ii) fault removal and (iii) the environment. Fault introduction depends primarily on the characteristics of the product and the development process. The characteristics of the development process include the software engineering technologies and tools used, the level of experience of the personnel, the volatility of requirements, and other factors. Failure discovery, in turn, depends on the extent to which the software has been executed and on the operational profile. Because some of the foregoing factors are probabilistic in nature and operate over time, software reliability models have generally been formulated in terms of random processes in execution time.
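
The definition of software reliability given above can be written compactly. As a notational sketch only (the symbols $T$ and $R$ are introduced here for illustration and are not taken from the paper), let $T$ denote the time to the next failure:

```latex
R(t) = \Pr(T > t), \qquad t \ge 0
```

Under this notation, $R(t)$ is the probability that the software operates without failure over a period of length $t$.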

In the past few years, much research has been carried out on software reliability and forecasting, but no single model has been able to capture software characteristics very accurately.

In this paper, we propose recurrent architectures for GP and GMDH, as well as ensemble models that use GP and GMDH as constituents and GP, GMDH, BPNN and Average as arbitrators, for predicting software reliability. We compare their performance with that of some well-known machine learning techniques.

The rest of the paper is organized as follows. Section 2 presents a brief review of the work carried out in the area of software reliability prediction. Section 3 briefly describes the stand-alone machine learning techniques applied in this paper. Section 4 presents the experimental design, and Section 5 presents our proposed methodology. Section 6 presents the results and discussion. Finally, Section 7 concludes the paper.

Literature survey

Given the importance of software reliability in software engineering, its prediction becomes a critical issue. Intelligent and soft computing techniques have dominated this area over the last two decades, as the recently published comprehensive state-of-the-art review [34] confirms.

In recent years, neural networks (NN) have proven to be universal approximators by successfully modeling any non-linear continuous function with arbitrary degree of accuracy [5], [30], [52]. Many papers

Overview of the techniques applied

Here we present a brief overview of the machine learning, soft computing and statistical techniques that are employed in this paper. Since BPNN and MLR are too well known to need an overview, only the remaining techniques are described.

Experimental design

In our experiment, we followed the general time series forecasting model because the software reliability forecasting problem has only one dependent variable and no explanatory variables in the strict sense. The general time series can be presented as $X_t = f(X)$, where $X$ is the vector of lagged variables $\{x_{t-1}, x_{t-2}, \ldots, x_{t-p}\}$. Hence, the key to solving the forecasting problem is to approximate the function $f$. This can be done by iteratively adjusting the weights in the modeling process in the
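
The lagged formulation above can be illustrated with a short sketch that turns a univariate series into input/target pairs for a chosen lag p. The helper name `make_lagged` and the toy series are hypothetical and are not taken from the paper; any of the techniques compared in the paper could then be fitted to the resulting pairs.

```python
def make_lagged(series, p):
    """Build (X, y) where each row of X is [x_{t-1}, ..., x_{t-p}] and y holds x_t.

    Hypothetical helper for illustration; not the authors' code.
    """
    if p < 1 or p >= len(series):
        raise ValueError("lag p must satisfy 1 <= p < len(series)")
    X, y = [], []
    for t in range(p, len(series)):
        # Most recent value first, matching the order x_{t-1}, x_{t-2}, ..., x_{t-p}.
        X.append([series[t - k] for k in range(1, p + 1)])
        y.append(series[t])
    return X, y


# Toy series (assumed values, not data from the paper).
series = [3.0, 5.0, 8.0, 13.0, 21.0, 34.0]
X, y = make_lagged(series, p=2)
print(X[0], y[0])  # [5.0, 3.0] 8.0
```

A series of length n yields n - p training pairs, so a larger lag leaves fewer examples for fitting f.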

Proposed method

An ensemble system is composed of several independently built models called base-level models or constituents. The base-level models are made to differ either by applying different intelligent techniques to a single set of training data points, or by applying a single technique to different subsets of the dataset. In this study, we followed the former approach. The prediction outcome of such a system is based on processing outputs coming from all base-level models that are part of
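
As a minimal sketch of how an arbitrator combines constituent outputs, the code below implements only the simplest case named in the paper, the Average arbitrator. The function name and the prediction values are hypothetical; the GP, GMDH and BPNN arbitrators would instead be models trained on the constituents' outputs, which is not shown here.

```python
def average_arbitrator(constituent_outputs):
    """Combine the predictions of several base-level models by simple averaging.

    constituent_outputs: list of equally long prediction lists, one per model.
    Hypothetical helper for illustration; not the authors' implementation.
    """
    if not constituent_outputs:
        raise ValueError("at least one constituent is required")
    n_points = len(constituent_outputs[0])
    if any(len(out) != n_points for out in constituent_outputs):
        raise ValueError("all constituents must predict the same number of points")
    n_models = len(constituent_outputs)
    # zip(*...) groups the predictions made for the same data point.
    return [sum(point) / n_models for point in zip(*constituent_outputs)]


# Assumed predictions from two constituents (stand-ins for GP and GMDH).
gp_pred = [10.0, 12.0, 15.0]
gmdh_pred = [12.0, 14.0, 13.0]
combined = average_arbitrator([gp_pred, gmdh_pred])
print(combined)  # [11.0, 13.0, 14.0]
```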

Result and discussion

We conducted the experiments with the above-mentioned machine learning techniques on the software reliability datasets taken from Musa [35], [36] and Iyer and Lee [18], which are presented in Tables 1, 2 and 3. We used the open source tool Discipulus [7] for conducting experiments using GP and NeuroShell 2.0 (www.wardsystems.com) for conducting experiments using GMDH algorithms. For each technique, the appropriate parameters are tweaked to get the least NRMSE values computed using Eq. (5)
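
Eq. (5) is not reproduced in this excerpt. The sketch below therefore assumes one common definition of the normalized root mean squared error, in which the sum of squared errors is divided by the sum of squared actual values before taking the square root. The paper's exact normalization may differ, and the function name `nrmse` is a hypothetical choice.

```python
import math


def nrmse(actual, predicted):
    """Normalized RMSE under an assumed definition:
    sqrt(sum((d_i - dhat_i)^2) / sum(d_i^2)).

    This normalization is an assumption; the paper's Eq. (5) is not shown here.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    squared_actual = sum(a ** 2 for a in actual)
    if squared_actual == 0:
        raise ValueError("actual values must not all be zero")
    return math.sqrt(squared_error / squared_actual)


print(nrmse([3.0, 4.0], [3.0, 4.0]))  # 0.0
```

Under this definition a perfect forecast scores 0, and a forecast of all zeros scores 1.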

Conclusions

In this paper, we compare the NRMSE values obtained by different intelligent techniques with those of GP and GMDH in the stand-alone mode. We found that GP and GMDH yielded better results in terms of NRMSE values. Further, we extended our research by developing GP and GMDH based novel recurrent architectures and ensemble models to predict software reliability. In the ensemble models, we considered GP and GMDH as constituent models and chose GP, GMDH, BPNN and Average as arbitrators. We tested the

Acknowledgements

We are very thankful to Mr. Frank Francone for giving us permission to use the Discipulus Tool (Demo version) for the various GP experiments reported in this paper, and to Dr. B. Igor Kuzmanovski, Asst. Professor, Sts. Cyril and Methodius University, Skopje, Republic of Macedonia, for providing us with the MATLAB code for implementing CPNN.

References (56)

  • J.M. Bates et al., The combination of forecasts, Operations Research Quarterly (1969)
  • J.A. Benediktsson et al., Parallel consensual neural networks, IEEE Transactions on Neural Networks (1997)
  • L. Breiman, Bagging predictors, Machine Learning (1996)
  • Discipulus tool:...
  • T. Dohi et al., Optimal software release scheduling based on artificial neural networks, Annals of Software Engineering (1999)
  • S.J. Farlow, Self-Organizing Methods in Modeling: GMDH Type Algorithms (1984)
  • J.H. Friedman, Multivariate adaptive regression splines (with discussion), Annals of Statistics (1991)
  • J.H. Friedman, Stochastic gradient boosting, Computational Statistics and Data Analysis (1999)
  • Y. Freund et al., Experiments with a new boosting algorithm
  • W.W. Gibbs, Software's chronic crisis, Scientific American (1994)
  • D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning (1989)
  • J. Hansen et al., Artificial intelligence and generalized qualitative response models: an empirical test on two audit decision-making domains, Decision Sciences (1992)
  • R. Hecht-Nielsen, Counter propagation networks, Applied Optics (1987)
  • A.G. Ivakhnenko, The GMDH. A rival of stochastic approximation, Soviet Automatic Control (1968)
  • R.K. Iyer et al., Measurement-based analysis of software reliability
  • N. Karunanithi et al., The scaling problem in neural networks for software reliability prediction
  • N. Karunanithi et al., Prediction of software reliability using neural networks, International Symposium on Software Reliability (1991)
  • N. Karunanithi et al., Prediction of software reliability using connectionist models, IEEE Transactions on Software Engineering (1992)