Modeling Software Fault-Detection and Fault-Correction Processes by Considering the Dependencies between Fault Amounts

Abstract: Many NHPP software reliability growth models (SRGMs) have been proposed to assess software reliability during the past 40 years, but most of them treat the fault correction process (FCP) in one of two ways. One is to ignore it, i.e., to assume that faults are removed instantaneously once the failures they cause are detected. This assumption is not realistic, since fault removal usually takes time and the detected faults become more and more difficult to correct as testing progresses. The other is to model the time delay between fault detection and fault correction, where the delay has been assumed to be a constant, a time-dependent function, or a random variable following some distribution. In this paper, some useful approaches to the modeling of dual fault detection and correction processes are discussed. Instead of a fault correction time delay, the dependency between the fault amounts of the two processes is considered. A model is proposed that integrates the fault-detection and fault-correction processes and incorporates a fault introduction rate and a testing coverage rate into the software reliability evaluation. The model parameters are estimated using the least squares estimation (LSE) method. The descriptive and predictive performance of the proposed model and of other existing NHPP SRGMs are investigated on three real data sets using four criteria. The results show that the new model is significantly more effective in yielding better reliability estimation and prediction.


Introduction
Software reliability is viewed as one of the most significant factors in assuring the dependability of safety-critical software systems. Many time-dependent SRGMs have been studied to determine software reliability measures over the past four decades [1][2][3][4]. Researchers have developed different models based on different assumptions. Some models assume that once a failure is detected, the faults causing it are immediately corrected and no new faults are introduced in the process (i.e., perfect debugging) [5]. Other models take imperfect debugging into account [6,7], i.e., faults are not always perfectly removed, and new ones can be introduced as a by-product of the fault repair process. Most existing models, however, assume that faults are repaired instantaneously after being detected. This is not realistic: in practice, detected faults become more and more difficult to correct as testing progresses. It is therefore of great importance to build software reliability models from the viewpoint of the fault correction process, i.e., to give modeling the fault correction process the same priority as modeling the fault detection process.
Schneidewind first modeled the software fault correction process along with the fault detection process by proposing a fault-correction model with a constant time delay after fault detection [8]. Xie and Zhao then extended Schneidewind's idea from a constant time delay to a time-dependent delay function [9]. Later, Schneidewind extended his original model by treating the time delay as a random variable following an exponential distribution [10]. Xie et al. further proposed another distributed correction-time model to provide more flexible modeling of correction processes [11,12], and Peng et al. incorporated a testing effort function and imperfect debugging into the time delay function [13]. Lo and Huang proposed a general framework for modeling software fault detection and correction processes and showed that many existing NHPP-based SRGMs can be covered by the proposed approaches [14]. Shu proposed a model from the viewpoint of the fault amount relationship between the two processes [15]. Additionally, other attempts have been made to model the two processes from different viewpoints, such as Markov chains [16][17][18], finite and infinite server queuing models [19], and a quasi-renewal time-delay fault removal model [20]. Research also suggests that the estimation accuracy of SRGMs can be further improved by considering the influence of real issues arising during the testing process [21,22], such as testing coverage. Testing coverage is a promising indicator of testing completeness and effectiveness, which can help developers evaluate how much test effort has been spent and help customers estimate the confidence of accepting the software product. Many time-dependent testing coverage functions (TCFs) have been proposed using different distributions, such as logarithmic-exponential [23], S-shaped [24], Rayleigh [25], Weibull and logistic [26] and lognormal [27]. Many TCF-based reliability models have been developed to formulate the relationship between testing coverage and the number of detected faults, such as the Rayleigh model [25], the logarithmic-exponential model [23], the beta model, the hyper-exponential model [22] and others [22,24,26].
It is therefore of great importance to model the dual fault detection and correction processes. In contrast to existing research that considers the time dependency between the fault detection and fault correction processes, in this paper we propose a new software reliability model that treats both processes from the viewpoint of fault content, that is, the quantitative dependence between the number of faults detected by the fault detection process and the number of faults corrected by the fault correction process. The fault introduction rate and testing coverage are also considered to improve the accuracy of the derived model. The remainder of this paper is organized as follows. In Section 2, we first give a brief overview of the assumptions of existing fault-detection models and of the fault-correction process, then build a relationship between the numbers of detected and corrected faults, after which we present the proposed model incorporating the fault introduction rate and testing coverage rate, along with several existing SRGMs. In Section 3, we examine the fitting and prediction performance of the proposed model on three sets of software failure data in comparison with other existing SRGMs. Finally, Section 4 gives the conclusions.

Basic Assumptions of Existing NHPP SRGMs
NHPP is used to describe the failure phenomenon during the testing process. The counting process {N(t), t ≥ 0} of an NHPP is given as follows:

Pr{N(t) = n} = ([m(t)]^n / n!) e^{−m(t)}, n = 0, 1, 2, . . . (1)

The mean value function (MVF) m(t) can be expressed as follows:

m(t) = ∫_0^t λ(s) ds, (2)

where λ(s) is the fault intensity function.
Most existing SRGMs based on NHPP have the following basic assumptions concerning the software fault detection process:

1. The occurrence of software failures follows an NHPP;
2. The software failure intensity at any time is proportional to the number of remaining faults present at that time;
3. Detected faults are immediately removed with certainty, and the correction of faults takes only negligible time.
According to the above assumptions, the general NHPP model can be obtained by solving the following equation:

dm(t)/dt = b(t)[a(t) − m(t)], (3)

where a(t) is the total fault content function, m(t) is the mean number of detected faults, and b(t) represents the fault detection rate.
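Since several of the models compared later arise as special cases of Equation (3), a minimal numerical sketch can make its structure concrete. The Python snippet below (an illustration, not part of the original study) integrates dm/dt = b(t)[a(t) − m(t)] with constant a(t) = a and b(t) = b, for which the solution should match the Goel-Okumoto closed form m(t) = a(1 − e^(−bt)); the parameter values are arbitrary.

```python
# A minimal numerical sketch of the general NHPP model (Equation (3)),
# dm/dt = b(t) * (a(t) - m(t)), with the illustrative constant choices
# b(t) = b and a(t) = a; the result should track the Goel-Okumoto
# closed form m(t) = a * (1 - exp(-b * t)).
import numpy as np
from scipy.integrate import solve_ivp

a, b = 100.0, 0.15            # illustrative fault content and detection rate

def dm_dt(t, m):
    return [b * (a - m[0])]   # b(t) * (a(t) - m(t)) with constant a(t), b(t)

t_eval = np.linspace(0.0, 30.0, 61)
sol = solve_ivp(dm_dt, (0.0, 30.0), [0.0], t_eval=t_eval)

closed_form = a * (1.0 - np.exp(-b * t_eval))
print(np.max(np.abs(sol.y[0] - closed_form)))   # small; within solver tolerance
```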

Considering the Fault-Detection Process and Fault-Correction Process Together
Most existing SRGMs focus only on describing the behavior of the fault detection process and assume that faults are fixed instantaneously upon detection. In reality, owing to the complexity of software systems, the testers' incomplete comprehension of the software, and the learning process involved, many detected faults may remain uncorrected for a long time.
Suppose m_c(t) denotes the mean number of corrected faults by time t, and assume that the mean number of faults corrected in the time interval (t, t + ∆t) is proportional to the mean number of detected but not yet corrected faults remaining in the software system. The MVF m_c(t) can then be expressed by the following equation:

dm_c(t)/dt = µ(t)[m(t) − m_c(t)], (4)

where µ(t) is the fault correction rate and m(t) is given by Equation (3).
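For intuition, Equation (4) can be integrated numerically once m(t) is known. The sketch below assumes a constant correction rate µ and a Goel-Okumoto detection MVF purely for illustration; neither choice is prescribed by the paper.

```python
# A sketch of the fault-correction ODE dm_c/dt = mu(t) * (m(t) - m_c(t))
# under the simplifying assumptions of a constant correction rate mu and a
# Goel-Okumoto detection MVF m(t) = a * (1 - exp(-b * t)); both are
# illustrative choices, not fitted values from the paper.
import numpy as np
from scipy.integrate import solve_ivp

a, b, mu = 100.0, 0.15, 0.10

def m(t):
    return a * (1.0 - np.exp(-b * t))        # detected-fault MVF

def dmc_dt(t, mc):
    return [mu * (m(t) - mc[0])]             # proportional to the detected-but-uncorrected backlog

sol = solve_ivp(dmc_dt, (0.0, 40.0), [0.0], t_eval=np.linspace(0.0, 40.0, 81))
# m_c(t) lags m(t); the gap is the backlog of detected but uncorrected faults.
print(m(40.0) - sol.y[0][-1])
```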

The Relationship between m(t) and m c (t)
Here we use data collected from testing a software program (Data Set 1, DS-1) [28] to study the relationship between m(t) and m_c(t). Let r(t) = m_c(t)/m(t). The cumulative numbers of detected and corrected faults are shown in Figure 1, and the values of r(t) are shown in Figure 2. At the beginning of the testing phase, many faults are detected, and most of them are simple and easy to remove, so the difference between the numbers of corrected and detected faults is small; the faults detected later become more complicated and difficult to remove, so the difference grows; eventually the difference shrinks again. Thus, a concave or S-shaped function can be used to model the ratio of the number of corrected faults to the number of detected faults.
Here we use two S-shaped functions to model r(t). From Table 1, we can see that r(t) = 1/(1 + βe^{−bt}) provides the better descriptive power.
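A sketch of how such a ratio function might be fitted is given below. The weekly ratio values are hypothetical stand-ins for the DS-1 ratios (the actual data are in [28]), and scipy's curve_fit is used as a generic least squares fitter.

```python
# A hedged sketch of fitting the S-shaped ratio function
# r(t) = 1 / (1 + beta * exp(-b * t)) by least squares. The weekly ratio
# values below are made-up stand-ins for the DS-1 corrected/detected
# ratios, not the actual data set.
import numpy as np
from scipy.optimize import curve_fit

def r_logistic(t, beta, b):
    return 1.0 / (1.0 + beta * np.exp(-b * t))

t = np.arange(1, 18)                                    # weeks (hypothetical)
ratio = np.array([0.55, 0.60, 0.58, 0.62, 0.66, 0.70,   # hypothetical r(t) values
                  0.72, 0.75, 0.80, 0.83, 0.86, 0.89,
                  0.91, 0.93, 0.95, 0.97, 0.98])

params, _ = curve_fit(r_logistic, t, ratio, p0=[1.0, 0.1])
print("beta = %.4f, b = %.4f" % tuple(params))
```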

A New Model with Imperfect Debugging and Testing Coverage
Here we incorporate testing coverage and the fault introduction rate into the software reliability model.
Suppose c(t) denotes the proportion of the whole code covered by time t. Obviously, c(t) increases with testing time t. When testing starts, c(t) grows quickly as more test cases are executed to examine the software; after a certain point in time, the software becomes stable, less new coverage is needed to detect the residual faults, and c(t) flattens as testing comes to an end. Thus, a concave or S-shaped function is suitable for modeling the testing coverage function. Clearly, 1 − c(t) is the percentage of the software code that has not yet been covered by test cases up to time t. The derivative of the testing coverage function, c′(t), represents the coverage rate. Therefore, c′(t)/(1 − c(t)) can be used to measure the fault detection rate b(t) in Equation (3).
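The following small numeric illustration shows the detection rate implied by a coverage curve via b(t) = c′(t)/(1 − c(t)). The Weibull-type form of c(t) and its parameter values are assumptions made for this example.

```python
# A small numeric illustration of the relation b(t) = c'(t) / (1 - c(t)).
# The Weibull-type coverage function c(t) = A * (1 - exp(-(t/c)**r)) and
# its parameters are assumed forms for illustration only.
import numpy as np

A, r, c = 0.95, 1.5, 10.0            # illustrative coverage parameters

def coverage(t):
    return A * (1.0 - np.exp(-(t / c) ** r))

t = np.linspace(0.1, 40.0, 400)
dt = t[1] - t[0]
c_t = coverage(t)
c_prime = np.gradient(c_t, dt)       # numerical derivative of the coverage curve
b_t = c_prime / (1.0 - c_t)          # detection rate implied by coverage growth

# b(t) rises while coverage grows quickly, then tends to zero as the
# coverage saturates below its ceiling A.
print(b_t[:3], b_t[-3:])
```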
To build a model incorporating the fault-detection and fault-correction processes as well as the fault introduction rate and testing coverage, the following assumptions are proposed for this model:

1. The occurrence of software failures follows an NHPP.
2. The software failure rate at any time depends on both the fault detection rate and the number of remaining faults in the software at that time.
3. The fault detection rate can be expressed as b(t) = c′(t)/(1 − c(t)), where c(t) is the percentage of the code that has been examined up to time t and c′(t), the derivative of the testing coverage function, represents the coverage rate.
4. Faults can be introduced during the debugging phase with a constant fault introduction rate, and the overall fault content function grows linearly with the number of detected faults.
5. m_c(t) denotes the mean number of faults corrected by time t, which is proportional to the mean number of detected but not yet corrected faults remaining in the software system, and r(t) = m_c(t)/m(t) expresses the relationship between m_c(t) and the cumulative number of detected faults m(t).

From Assumption 4, the total fault content function a(t) is a linear function of the expected number of faults detected up to time t. That is,

a(t) = a + αm(t), (5)

where a denotes the initial number of faults present in the software system before testing starts and α > 0. Substituting a(t) from Equation (5) into Equation (3), and solving it under the initial condition that m(t) = 0 at t = 0, we obtain

m(t) = (a/(1 − α)) [1 − ((1 − c(t))/(1 − c(0)))^{1−α}], (6)

where c(0) refers to the testing coverage at t = 0. From Assumption 5, once m(t) is determined, m_c(t) can be obtained. That is,

m_c(t) = r(t)m(t). (7)

Substituting different types of testing coverage functions for c(t) and different ratio functions of the corrected fault number to the detected fault number for r(t) in Equation (7), we can obtain the corresponding MVFs. As mentioned above, the testing coverage function should be a non-negative and non-decreasing function of testing time t, so the following function can be used to model it:

c(t) = A(1 − e^{−(t/c)^r}), (8)

where A denotes the maximum percentage of testing coverage, r is the shape parameter and c is the scale parameter. Clearly, when t = 0, c(0) = 0. According to the results of Section 2.3, the following function is used to describe r(t):

r(t) = 1/(1 + βe^{−bt}). (9)

Substituting Equations (8) and (9) into Equation (7), we obtain the MVF of corrected faults:

m_c(t) = [a/((1 − α)(1 + βe^{−bt}))] [1 − (1 − A(1 − e^{−(t/c)^r}))^{1−α}]. (10)

It can be seen that the fault detection and correction processes, as well as the fault introduction rate and testing coverage, are all integrated into the proposed model.
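To make the closed forms above concrete, the sketch below implements Equations (6)-(10) as reconstructed here, assuming the Weibull-type coverage function with c(0) = 0; all parameter values are illustrative rather than fitted estimates.

```python
# A sketch of the proposed model's MVFs as reconstructed above
# (Equations (6)-(10)): m(t) from the coverage-based detection ODE with
# linear fault introduction, and m_c(t) = r(t) * m(t). Parameter values
# are illustrative, not fitted estimates.
import numpy as np

def coverage(t, A, r, c):
    # Assumed Weibull-type testing coverage function, Equation (8); c(0) = 0.
    return A * (1.0 - np.exp(-(t / c) ** r))

def m_detected(t, a, alpha, A, r, c):
    # Equation (6) with c(0) = 0: m(t) = a/(1-alpha) * [1 - (1 - c(t))**(1-alpha)]
    return a / (1.0 - alpha) * (1.0 - (1.0 - coverage(t, A, r, c)) ** (1.0 - alpha))

def m_corrected(t, a, alpha, A, r, c, beta, b):
    # Equations (7), (9) and (10): m_c(t) = m(t) / (1 + beta * exp(-b * t))
    return m_detected(t, a, alpha, A, r, c) / (1.0 + beta * np.exp(-b * t))

t = np.linspace(0.0, 30.0, 31)
print(m_detected(t, 120, 0.05, 0.95, 1.5, 10.0)[-1],
      m_corrected(t, 120, 0.05, 0.95, 1.5, 10.0, 2.0, 0.3)[-1])
```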

Table 2 lists the existing NHPP models used to depict the MVF of the fault correction process [14] and the fault detection process [24], together with the proposed new model; all of them are used in the comparisons that follow.

Table 2. Software reliability models and their MVFs.

[The table itself is not fully recoverable here; its entries give each model's number, name, type and MVF, and include the Inflection S-shaped model [29] (concave), the Yamada exponential (concave) and Yamada Weibull (concave and S-shaped) models [30], and the P-Z coverage model (2003) [24] (S-shaped and concave).]

Criteria for Models' Goodness-of-Fit Comparison

Here we use three criteria to judge the descriptive performance of the models. The first criterion is the mean square error (MSE), which is calculated as follows:

MSE = (1/(n − N)) ∑_{i=1}^{n} (y_i − m̂(t_i))², (11)

where n represents the number of observations, y_i the total number of faults observed by time t_i, m̂(t_i) the estimated cumulative number of faults up to time t_i, and N the number of parameters in the model. The lower the value of MSE, the better the model performs.
The second criterion is the correlation index of the regression curve equation, R², which can be expressed as follows:

R² = 1 − [∑_{i=1}^{n} (y_i − m̂(t_i))²] / [∑_{i=1}^{n} (y_i − ȳ)²], (12)

where ȳ = (1/n) ∑_{i=1}^{n} y_i. The larger R² is, the better the model explains the variation in the data.
The last criterion is the adjusted R², which can be expressed as follows:

Adjusted R² = 1 − (1 − R²)(n − 1)/(n − M − 1), (13)

where R² denotes the value from Equation (12) and M represents the number of predictors in the model. The larger the value of the adjusted R², the better the model's performance.
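A compact implementation of the three descriptive criteria, under the definitions above, might look as follows; the observed and fitted values are hypothetical.

```python
# A sketch of the three descriptive criteria defined above: MSE (with a
# degrees-of-freedom correction for the N model parameters), R^2, and
# adjusted R^2 (with M predictors). y and y_hat are hypothetical values.
import numpy as np

def mse(y, y_hat, n_params):
    return np.sum((y - y_hat) ** 2) / (len(y) - n_params)

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(y, y_hat, n_predictors):
    r2 = r_squared(y, y_hat)
    n = len(y)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

y = np.array([5., 12., 20., 31., 40., 47., 52., 55.])     # observed cumulative faults (hypothetical)
y_hat = np.array([6., 11., 21., 30., 41., 46., 53., 54.]) # model-fitted values (hypothetical)
print(mse(y, y_hat, 4), r_squared(y, y_hat), adjusted_r_squared(y, y_hat, 4))
```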

Criteria for Models' Predictive Power Comparison
Here we use the SSE criterion to examine the predictive power of the SRGMs. SSE is the sum of squared errors, expressed as follows:

SSE = ∑_{i=m}^{n} (y_i − m̂(t_i))². (14)

Assume that by the end of testing time t_n, a total of y_n faults have been detected. First we use the data points up to time t_{m−1} (t_{m−1} < t_n) to estimate the parameters of m(t); substituting the estimated parameters into the mean value function then yields the predicted cumulative fault number m̂(t_m) at t_m (t_m < t_n), while y_m is the actual number of faults detected by t_m. The procedure is repeated for t_i (i = m + 1, m + 2, . . . , n) until t_n.
Therefore, the smaller the SSE, the better the model's predictive performance.
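A sketch of this rolling prediction procedure is given below. The Goel-Okumoto MVF stands in for an arbitrary SRGM, and the data are synthetic; both are assumptions for illustration only.

```python
# A sketch of the rolling prediction procedure described above: fit on the
# data up to t_{i-1}, predict m(t_i), and accumulate squared errors from
# t_m to t_n. The MVF and data below are illustrative stand-ins.
import numpy as np
from scipy.optimize import curve_fit

def mvf(t, a, b):
    # Goel-Okumoto used as a stand-in MVF; any model's m(t) could be plugged in.
    return a * (1.0 - np.exp(-b * t))

def rolling_sse(t, y, m_start):
    sse = 0.0
    for i in range(m_start, len(t)):
        params, _ = curve_fit(mvf, t[:i], y[:i], p0=[y[-1], 0.1], maxfev=10000)
        sse += (y[i] - mvf(t[i], *params)) ** 2   # one-step-ahead squared error
    return sse

t = np.arange(1.0, 21.0)
y = 100.0 * (1.0 - np.exp(-0.12 * t)) + np.random.default_rng(0).normal(0, 1.5, 20)
print(rolling_sse(t, y, m_start=16))              # predict the last 20% of points
```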

Parameter Estimation Method
Once the analytical expression for m(t) is derived, its parameters can be estimated using the maximum likelihood estimation (MLE) method or the least squares estimation (LSE) method. Although MLE estimates are consistent and asymptotically normally distributed as the sample size increases, they sometimes cannot be obtained, especially when m(t) is too complex. We therefore turn to the LSE method to estimate the models' parameters.
The sum of squared distances is given as follows:

S = ∑_{i=1}^{n} [m(t_i) − y_i]², (15)

where y_i is the cumulative number of faults detected or corrected in time (0, t_i), and all failure data are denoted as pairs (t_i, y_i) (i = 1, 2, . . . , n; 0 < t_1 < t_2 < · · · < t_n). Taking the derivative of Equation (15) with respect to each parameter and setting the results equal to zero, we obtain the following equations for the proposed model:

∂S/∂a = ∂S/∂α = ∂S/∂A = ∂S/∂r = ∂S/∂c = ∂S/∂β = ∂S/∂b = 0. (16)

Solving the above equations simultaneously gives the least squares estimates of all parameters of the proposed model.
As noted, the above Equation (16) is extremely difficult to solve analytically and requires graphical or numerical methods. With the help of MATLAB, calculating the parameters is not a critical problem, although adding parameters to make the software reliability model more complex also makes the work of parameter estimation more difficult.
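As a numerical alternative to solving the normal equations (16) directly, the parameters can be estimated by minimizing S in Equation (15) with a bounded least squares routine. The sketch below does this for the reconstructed m_c(t) of Equation (10); the fault counts and starting values are hypothetical, not DS-1.

```python
# A hedged sketch of LSE for the proposed model: instead of solving the
# normal equations (16) analytically, minimize the squared distance S of
# Equation (15) numerically. Data below are hypothetical corrected-fault
# counts, not the DS-1 data.
import numpy as np
from scipy.optimize import least_squares

def m_corrected(t, a, alpha, A, r, c, beta, b):
    cov = A * (1.0 - np.exp(-(t / c) ** r))                      # Equation (8)
    m = a / (1.0 - alpha) * (1.0 - (1.0 - cov) ** (1.0 - alpha)) # Equation (6)
    return m / (1.0 + beta * np.exp(-b * t))                     # Equations (9), (10)

def residuals(theta, t, y):
    return m_corrected(t, *theta) - y          # the terms summed in Equation (15)

t = np.arange(1.0, 18.0)
y = np.array([2., 5., 9., 14., 20., 26., 32., 38., 43.,
              47., 51., 54., 56., 58., 59., 60., 61.])           # hypothetical counts

theta0 = [80.0, 0.05, 0.9, 1.2, 8.0, 2.0, 0.3]                   # a, alpha, A, r, c, beta, b
fit = least_squares(residuals, theta0, args=(t, y),
                    bounds=([1, 0, 0.1, 0.1, 0.1, 0.01, 0.01],
                            [1e4, 0.99, 1.0, 10, 100, 100, 10]))
print(fit.x)
```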

Middle-Size Software System Data
Here we examine the performance of the proposed model and compare it with several traditional models using data collected from testing a middle-size software system (Data Set 1, DS-1) [11]. The failure data are recorded by week and are shown in Table 3. In contrast to a traditional software reliability data set, this data set includes not only fault-detection data but also fault-correction data. We use all data points to fit the models and obtain the parameter estimates. The model parameters, MSE values, R² values and adjusted R² values are listed in Table 4. All models' fitting results for DS-1 are graphically illustrated in Figure 3. Figure 4 is three-dimensional, with the X, Y and Z coordinates showing the MSE, R² and adjusted R² values, respectively. We can see that the proposed model has the best MSE, R² and adjusted R² values, i.e., the new model fits the real data set better than all the other models.

Tandem Computer Data
Here we examine the models using another set of software testing data, collected from Tandem Computers Release #1 (Data Set 3, DS-3) [5]. The failure data are recorded by week and are shown in Table 7. We use all data points to fit the models and estimate their parameters. This data set, too, contains only the number of detected faults. The model parameters, MSE values, R² values and adjusted R² values for goodness of fit are listed in Table 8.

Comparison of Models' Predictive Power
For the predictive power comparison, we divide each data set into two parts: the first 80% of the data points and the remaining 20%. We use the first 80% to estimate the models' parameters, then use the remaining points to compare the models' predictive power. The SSE values of the proposed model for the three data sets are 37.0169, 236.0253 and 23.0348, respectively, as shown in Table 9. For DS-1, the new model has the smallest SSE value, 37.0169; the other models' SSE values range from 1.87 times larger (the P-Z coverage model's 69.0496) to 22.24 times larger (the Yamada imperfect (1) model's 823.0770). For DS-2, the new model again has the smallest SSE value, 236.0253; the other models' SSE values can be 11.75 times larger (the Inflection S-shaped model's 2.7742 × 10³) and even 1468.78 times larger (the Yamada Rayleigh model's 3.4667 × 10⁵). For DS-3, the proposed model's SSE of 23.0348 is not the best result, but it is only somewhat larger than the smallest SSE value of 6.9655 (given by the delayed S-shaped model) and much smaller than the other models' values, which can be 3.16 times larger (the Yamada Rayleigh model's 73.4246) and even 555.33 times larger (the SRGM-3 model's 1.2792 × 10⁴). Moreover, the delayed S-shaped model provides the best result only for DS-3, not for DS-1 or DS-2. Overall, this indicates that the proposed model gives better predictive power.

Conclusions
In this paper, the problem of modeling the software fault-detection and fault-correction processes together with imperfect debugging and testing coverage has been investigated. From the viewpoint of fault amount, instead of fault correction time delay, a new SRGM addressing fault re-introduction and testing coverage has been put forward by introducing a relationship function between the MVFs of detected and corrected faults. The proposed model was applied to two kinds of data sets: one containing not only the numbers of detected faults but also the numbers of corrected faults, and another containing only the numbers of detected faults. For both kinds of data sets, the proposed model gives significantly better goodness-of-fit and prediction results than other existing NHPP models on three real data sets according to four criteria. It should be noted that, although adding more parameters makes a software reliability model more complicated and the task of parameter estimation more difficult, automating the calculations with software tools keeps this from being a critical problem.

Funding: This research was partly funded by the "National Key Laboratory of Science and Technology on Reliability and Environmental Engineering of China", grant number "WDZC2019601A303".