A Testing Coverage Model Based on NHPP Software Reliability Considering the Software Operating Environment and the Sensitivity Analysis

We have been attempting to evaluate software quality and improve its reliability, and research on software reliability models has been part of this effort. Software is now used in a wide range of fields and environments; hence, quantitative confidence standards must be provided when software is used. We therefore consider testing coverage together with the uncertainty, or randomness, of the operating environment. In this paper, we propose a new testing coverage model based on NHPP software reliability that incorporates the uncertainty of operating environments, and we provide a sensitivity analysis to study the impact of each parameter of the proposed model. We examine the goodness-of-fit of the new testing coverage model and of other existing NHPP models on two datasets. The comparative goodness-of-fit results show that the proposed model performs significantly better than the existing models. In addition, the sensitivity analysis shows that all parameters of the proposed model affect the mean value function.


Introduction
Software combines technologies such as artificial intelligence (AI), the Internet of Things (IoT), and big data, which are important elements of the fourth industrial revolution, to create new forms of value. Software is thus a very important factor in the fourth industrial revolution [1]. Software reliability is defined as the probability that software will run error-free for a given period of time. Developing the new technologies and theories necessary to improve it is difficult and complex. Therefore, the focus of software development is to improve the reliability and stability of software systems. In addition, unpredictable results occur during the software development process. Generally, the software development process consists of four stages: specification, design, coding, and testing [1]. Software faults are detected and corrected in the testing phase, the final stage of software development. Because the number of software faults and the intervals between faults have a significant impact on software stability, software failure prediction is an important area of study for software developers, enterprises, and research institutions. A software reliability model makes it easier to evaluate software reliability using fault data collected in a test or live environment. Furthermore, a software reliability model can measure the number of software faults, the fault interval, and reliability, and the fault detection rate can be estimated and used for various predictions. Meanwhile, testing coverage is one of the most important unsolved issues in the software development process, and it matters both to software developers and to the customers of software products. Testing coverage is a measure that enables software developers to evaluate the quality of the tested software and determine how much additional effort is needed to improve its reliability [2,3]. Testing coverage can provide customers with a quantitative confidence criterion when they plan to buy or use software products [4,5]. In addition, software is used in a variety of operating environments. Therefore, the uncertainty or randomness of the software operating environment must be considered.
In the past, various statistical models have been proposed to evaluate software reliability. Models based on the non-homogeneous Poisson process (NHPP) have proved to be a very successful approach in practical software reliability [6]. On the basis of the NHPP, the mean value function gives the expected number of faults detected up to a certain point in time. Various NHPP software reliability models have been proposed to date. In the early days, most NHPP software reliability models were developed under the assumptions that faults detected in the testing phase are removed immediately with no debugging delay, that no new faults are introduced, and that the field environments in which software systems are used are the same as, or close to, the development-testing environment [7-10]. In the mid-1990s, studies began on software reliability models consistent with a variety of software operating environments, driven by rapid changes in the industrial structure and environment. In the early 2000s, researchers began to explore new approaches such as applying calibration factors to software reliability models to account for the uncertainty of the operating environment [11-13]. Recently, NHPP software reliability models considering the uncertainty of the software operating environment have been proposed [14-19]. In addition, many testing coverage functions have been proposed in terms of different distributions, and software reliability models based on different testing coverage functions have also been developed [2,20-22].
In this paper, we discuss a new testing coverage model based on NHPP software reliability with the uncertainty of operating environments, together with a sensitivity analysis to study the impact of each parameter of the proposed model. We examine the goodness-of-fit of the new testing coverage model and of other existing NHPP models on two datasets. The explicit solution of the mean value function for the new testing coverage model is derived in Section 2. The various criteria for comparative model analysis are discussed in Section 3. Model analysis and results based on two actual datasets are discussed in Section 4. The impact of each parameter of the proposed model based on sensitivity analysis is discussed in Section 5. Finally, Section 6 presents the conclusions and remarks.

Testing Coverage Model based on NHPP Software Reliability
In this study, the basic assumption is that the failure phenomenon during the testing phase can be described by an NHPP, and the counting process N(t) of the NHPP represents the cumulative number of failures up to execution time t:

\[
\Pr\{N(t) = n\} = \frac{[m(t)]^n}{n!}\, e^{-m(t)}, \quad n = 0, 1, 2, \ldots, \qquad m(t) = \int_0^t \lambda(s)\, ds,
\]

where λ(s) is the intensity function.
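As a small numerical illustration of this counting process, the NHPP failure-count probability above can be evaluated directly once the mean value function is known. The sketch below uses an illustrative value of m(t) and assumes nothing beyond the standard NHPP definition.

```python
import math

def nhpp_prob(m_t: float, n: int) -> float:
    """Pr{N(t) = n} for an NHPP whose mean value function equals m_t at time t."""
    return (m_t ** n) * math.exp(-m_t) / math.factorial(n)

# Illustrative: if m(t) = 3 faults are expected by time t, the probability
# of having observed exactly 2 faults by then:
p = nhpp_prob(3.0, 2)
```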

A General Testing Coverage Model based on NHPP Software Reliability
A general mean value function m(t) of the testing coverage models based on NHPP software reliability is obtained from the following differential equation [2]:

\[
\frac{dm(t)}{dt} = \frac{c'(t)}{1 - c(t)} \left[ a(t) - m(t) \right], \tag{1}
\]

where a(t) is the total number of faults detected in the software by time t, c(t) is the testing coverage function, i.e., the percentage of code covered by time t, and c'(t) is the derivative of the testing coverage function.
Solving Equation (1) with given functions a(t) and c'(t)/(1 − c(t)) yields the following mean value function m(t) [2]:

\[
m(t) = e^{-B(t)} \left[ m_0 + \int_{t_0}^{t} a(s)\, \frac{c'(s)}{1 - c(s)}\, e^{B(s)}\, ds \right], \tag{2}
\]

where B(t) = \(\int_{t_0}^{t} \frac{c'(s)}{1 - c(s)}\, ds\) and m(t_0) = m_0 is the marginal condition of Equation (2), with t_0 representing the start time of the testing process.
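Equation (2) can be evaluated numerically for any smooth choice of a(t) and c(t). The sketch below uses SciPy quadrature; the exponential coverage c(t) = 1 − e^{−bt} and constant a(t) = N are illustrative choices only (not the paper's model), under which m(t) reduces to the familiar Goel–Okumoto form N(1 − e^{−bt}).

```python
import numpy as np
from scipy.integrate import quad

def B(t, c, c_prime, t0=0.0):
    """B(t) = integral from t0 to t of c'(s) / (1 - c(s)) ds."""
    val, _ = quad(lambda s: c_prime(s) / (1.0 - c(s)), t0, t)
    return val

def mean_value(t, a, c, c_prime, t0=0.0, m0=0.0):
    """m(t) = e^{-B(t)} [ m0 + integral of a(s) c'(s)/(1-c(s)) e^{B(s)} ds ]."""
    integrand = lambda s: a(s) * c_prime(s) / (1.0 - c(s)) * np.exp(B(s, c, c_prime, t0))
    val, _ = quad(integrand, t0, t)
    return np.exp(-B(t, c, c_prime, t0)) * (m0 + val)

# Illustrative check: a(t) = N = 100, c(t) = 1 - exp(-0.1 t),
# for which m(t) should equal 100 (1 - e^{-0.1 t}).
m5 = mean_value(5.0,
                a=lambda s: 100.0,
                c=lambda s: 1.0 - np.exp(-0.1 * s),
                c_prime=lambda s: 0.1 * np.exp(-0.1 * s))
```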

A General Testing Coverage Model Based on NHPP Software Reliability with the Uncertainty of the Operating Environments
A general mean value function m(t) of the testing coverage models based on NHPP software reliability, using the differential equation and considering the uncertainty of the operating environments, is as follows [14]:

\[
\frac{dm(t)}{dt} = \eta\, \frac{c'(t)}{1 - c(t)} \left[ N - m(t) \right], \tag{3}
\]

where η is a random variable representing the uncertainty of the operating environment. We find the mean value function m(t) as in Equation (4) by applying the random variable η, with probability density function g, to the differential equation of Equation (3) [14]:

\[
m(t) = N \left( 1 - \int_{\eta} e^{-\eta B(t)}\, g(\eta)\, d\eta \right). \tag{4}
\]
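When the density g in Equation (4) is taken to be a gamma density with parameters α and β, as in [14], the integral has the closed form m(t) = N[1 − (β/(β + B(t)))^α]. A minimal sketch, assuming this gamma choice:

```python
def mean_value_uncertain(Bt: float, N: float, alpha: float, beta: float) -> float:
    """m(t) = N [1 - (beta / (beta + B(t)))^alpha] under gamma-distributed eta."""
    return N * (1.0 - (beta / (beta + Bt)) ** alpha)

# At B(t) = 0 no faults are expected yet; as B(t) grows, m(t) approaches N.
```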

A New Testing Coverage Model based on NHPP Software Reliability Considering the Uncertainty of the Operating Environments
In this paper, a new testing coverage model based on NHPP software reliability considering the uncertainty of the operating environments is presented. We apply Equations (3) and (4), which are assumptions of the existing testing coverage model, and add the following assumptions [3,14]: the initial condition of the mean value function m(t) is m(0) = 0; a(t) = N is the expected number of faults that exist in the software before testing; η has a generalized probability density function g with two parameters α and β; and the fault detection rate can be expressed by c'(t)/(1 − c(t)). We can derive the mean value function m(t) from these assumptions and the differential equations [16]; with g taken as the gamma density with parameters α and β, Equation (4) yields

\[
m(t) = N \left[ 1 - \left( \frac{\beta}{\beta + B(t)} \right)^{\alpha} \right]. \tag{5}
\]
In this study, we consider the following testing coverage function c(t), where b is the failure detection rate and d represents the shape factor.
We obtain a new mean value function m(t) for the testing coverage model based on NHPP software reliability with uncertainty in the operating environment that can be used to determine the expected number of software failures detected by time t by substituting the function c(t) above into Equation (5):

Various Criteria for Comparative Model Analysis
We estimate the parameters of the NHPP software reliability models in Table 1 using the least squares estimation (LSE) method. We derived the parameters of the mean value function m(t) using Matlab (2016a, MathWorks, Natick, MA, USA) and R (version 3.3.1) programs based on the LSE method.
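The LSE step can be sketched with SciPy's `curve_fit`, which minimizes the sum of squared differences between the fitted mean value function and the observed cumulative counts. The mean value function below is only illustrative: it uses the gamma-based form with a hypothetical power-type B(t) = (bt)^d (β absorbed into b to keep the sketch identifiable), not the paper's exact model, and `y` is synthetic data, not dataset #1.

```python
import numpy as np
from scipy.optimize import curve_fit

def m(t, N, alpha, b, d):
    # Illustrative mean value function: N [1 - (1 + (b t)^d)^(-alpha)].
    # np.abs(b) keeps the power well-defined if the optimizer tries b < 0.
    return N * (1.0 - (1.0 + (np.abs(b) * t) ** d) ** (-alpha))

t = np.arange(1, 22, dtype=float)     # a 21-week index, as in dataset #1
true = (38.0, 1.5, 0.08, 1.3)         # hypothetical "true" parameters
y = m(t, *true)                       # synthetic cumulative failure counts

# Least squares estimation of the four parameters from (t, y):
params, _ = curve_fit(m, t, y, p0=[40.0, 1.0, 0.1, 1.0], maxfev=20000)
```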

Criteria for Model Comparison
The statistical significance of model comparisons can be confirmed through well-known criteria. We use twelve criteria to estimate the goodness-of-fit of all models and to compare the proposed model with the other models in Table 1.
The mean squared error (MSE) measures the average of the squared errors, that is, the average squared difference between the estimated values and the actual data. The root mean square error (RMSE) is a frequently used measure of the differences between the values predicted by a model or estimator and the values observed. Akaike's information criterion (AIC) is used to compare the capability of each model to maximize the likelihood function (L) while accounting for the degrees of freedom [23]. The variance measures the standard deviation of the prediction bias [19]. The root mean square prediction error (RMSPE) measures how closely the model predicts the observations [19]. The predictive ratio risk (PRR) measures the distance of the model estimates from the actual data relative to the model estimates [1]. The Theil statistic (TS) is the average deviation percentage over all periods with regard to the actual values [24]. The predictive power (PP) measures the distance of the model estimates from the actual data [1]. The sum of absolute errors (SAE) measures the absolute distance of the model from the data [16]. The mean absolute error (MAE) measures the deviation using the absolute distance of the model from the data [24]. The R^2 measures how successfully the fit explains the variation in the data [4]. The adjusted R^2 (Adj R^2) modifies R^2 to account for the number of explanatory terms in a model relative to the number of data points [4]. The smaller the values of the first ten criteria (MSE, RMSE, AIC, variance, RMSPE, PRR, TS, PP, SAE, and MAE), the better the model fit (close to 0). Conversely, the higher the values of the remaining two criteria (R^2 and Adj R^2), the better the model fit (close to 1). These criteria are defined in Appendix A, Table A1.
In Table A1, m(t_i) is the estimated cumulative number of failures at t_i for i = 1, 2, ..., n; y_i is the total number of failures observed at time t_i; n is the total number of observations; and m is the number of unknown parameters in the model.
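With this notation fixed, several of the twelve criteria can be computed directly from the residuals m(t_i) − y_i. The sketch below follows the usual definitions in the software reliability literature (e.g., MSE and MAE divided by n − m degrees of freedom, PRR normalized by the model estimate, PP by the actual data); treat the exact normalizations as assumptions to be checked against Table A1.

```python
import numpy as np

def fit_criteria(y, m_hat, n_params):
    """A subset of the goodness-of-fit criteria, from observed counts y and
    fitted mean values m_hat, with n_params unknown model parameters."""
    y = np.asarray(y, dtype=float)
    m_hat = np.asarray(m_hat, dtype=float)
    n = len(y)
    resid = m_hat - y
    mse = np.sum(resid ** 2) / (n - n_params)     # mean squared error
    sae = np.sum(np.abs(resid))                   # sum of absolute errors
    mae = sae / (n - n_params)                    # mean absolute error
    prr = np.sum((resid / m_hat) ** 2)            # predictive ratio risk
    pp = np.sum((resid / y) ** 2)                 # predictive power
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                    # coefficient of determination
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_params - 1)
    return {"MSE": mse, "SAE": sae, "MAE": mae, "PRR": prr,
            "PP": pp, "R2": r2, "AdjR2": adj_r2}
```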

Distance of the Normalized Criteria
Pham [25] and Li and Pham [4] proposed the distance of the normalized criteria (NCD) method for ranking and selecting the best model among various software reliability models according to the characteristics of a set of criteria. We compare the performance of the software reliability models using the twelve criteria in Section 3.1. Because of the diversity of the criteria, it is difficult to judge model performance from any single one, so a method that integrates them is needed. For 10 of the 12 criteria, goodness-of-fit (similarity to the actual data) improves as the value approaches 0, while for the remaining 2 criteria, R^2 and Adj R^2, a value closer to 1 indicates a better fit. In addition, no weighting of the criteria is applied, because the criteria differ in nature. We have adapted the NCD method to the characteristics of the criteria described in Section 3.1.
The NCD value is defined as follows, where s is the total number of models; C_ij denotes the value of the jth criterion for the ith model, i = 1, 2, ..., s; d denotes the number of criteria for which smaller is better, i.e., MSE, RMSE, AIC, variance, RMSPE, PRR, TS, PP, SAE, and MAE; and f denotes the number of criteria for which larger is better, i.e., R^2 and Adj R^2. Table 1 summarizes the mean value functions for the NHPP software reliability models.
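The exact normalization used in [4,25] should be taken from those references; the sketch below only illustrates the ranking idea, scaling each criterion column to [0, 1], flipping the columns where larger is better (R^2 and Adj R^2) so that 0 is always best, and taking each model's Euclidean distance from the origin. All of this is an assumption for illustration, not the paper's formula.

```python
import numpy as np

def ncd_sketch(criteria, maximize):
    """criteria: s x k matrix (s models, k criteria); maximize: k booleans.
    Returns one distance per model; the smallest distance ranks best."""
    C = np.asarray(criteria, dtype=float)
    span = C.max(axis=0) - C.min(axis=0)
    # Scale each column to [0, 1]; guard against constant columns.
    scaled = (C - C.min(axis=0)) / np.where(span == 0, 1.0, span)
    for j, larger_is_better in enumerate(maximize):
        if larger_is_better:
            scaled[:, j] = 1.0 - scaled[:, j]   # flip so 0 is always best
    return np.sqrt((scaled ** 2).sum(axis=1))
```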

Confidence Interval
We use the following Equation (7) to obtain the confidence interval [1] for the proposed model and for the existing NHPP software reliability models:

\[
\hat{m}(t) \pm z_{\alpha/2} \sqrt{\hat{m}(t)}, \tag{7}
\]

where z_{α/2} is the 100(1 − α/2) percentile of the standard normal distribution. This makes it possible to check whether the value of the mean value function falls within the confidence interval at each time point and how well the interval actually covers the observed values.
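Since the variance of an NHPP count equals its mean, the band in Equation (7) can be computed directly from the fitted mean value function; a minimal sketch:

```python
import numpy as np
from scipy.stats import norm

def confidence_band(m_hat, level=0.95):
    """Lower/upper limits m(t) -/+ z_{alpha/2} sqrt(m(t)) at each time point."""
    z = norm.ppf(1.0 - (1.0 - level) / 2.0)   # e.g. about 1.96 for 95%
    m_hat = np.asarray(m_hat, dtype=float)
    half = z * np.sqrt(m_hat)
    return m_hat - half, m_hat + half

lcl, ucl = confidence_band([4.0, 16.0], level=0.95)
```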

Data and the Result of Model Analysis
We estimate the parameters on two datasets, compute all criteria and the NCD values, and compare the results against each other.

Dataset 1
Dataset #1 was given by [6] and is provided in Table 2. In dataset #1, the week index ranges from 1 week to 21 weeks, and there are 38 cumulative failures at 14 weeks. Detailed information is recorded in [6]. First, in Table 3, we obtained the parameter estimates for all twenty models and the values of all twelve criteria using t = 1, 2, ..., 21 of the week index from dataset #1. As shown in Table 3, the proposed model achieves the best results on the twelve criteria compared to the other models. In addition, as shown in Table 4, the proposed model achieves the best NCD value compared to the other models. Looking at Table 3 in detail, the MSE, RMSE, AIC, variance, RMSPE, PRR, TS, PP, and SAE values for the proposed model are the lowest of all the models, and the R^2 and Adj R^2 values for the proposed model are the largest. First, the MSE value of the proposed model is 2.4340, the lowest compared to the other models; the next lowest, for the YID 1 model, is 2.9149. The RMSE value of the proposed model is 1.5601, the lowest among the models compared; the RMSE value of the YID 1 model is 1.7073. The AIC value of the proposed model is 60.2184, also the lowest among the models compared; the AIC value of the GO model is 62.8309. The variance value of the proposed model is 1.2993, the lowest among the models compared; the variance value of the YID 1 model is 1.5841. The RMSPE value of the proposed model is 1.2997, the lowest among the models compared; the RMSPE value of the YID 1 model is 1.5883. The PRR value of the proposed model is 0.0709, the lowest among the models compared; the PRR value of the Vtub model is 0.4394. The TS value of the proposed model is 4.6588, the lowest among the models compared; the TS value of the YID 1 model is 5.6364. The PP value of the proposed model is 0.0599, the lowest among the models compared; the PP value of the DS model is 0.3575. The SAE value of the proposed model is 13.8037, the lowest among the models compared; the SAE value of the PNZ model is 16.1109. The R^2 and Adj R^2 values of the proposed model are 0.9843 and 0.9744, respectively, the largest among the models compared; the R^2 and Adj R^2 values of the YID 1 model are 0.9970 and 0.9701, respectively. Although both the YID 1 and YID 2 models are slightly better than our proposed model on the MAE criterion (the MAE values of the YID 1 and YID 2 models are 1.4727 and 1.4741, compared with 1.5337 for our proposed model), the performance of our proposed model on all eleven remaining criteria is much better, not only compared with the YID 1 and YID 2 models but with all the models, as shown in Table 3. Figure 1 shows a graph of the mean value functions for all models based on dataset #1.
Figure 2 shows a graph of the relative error values of all models for dataset #1. From Table 4, looking at the NCD values, the proposed model achieves the lowest value among all models: the NCD value of the proposed model is 0.0940763, while the NCD values of the YID 1 and YID 2 models are 0.1245350 and 0.1254775, respectively. Figure 3a shows a 3D plot of the MSE value, model number, and NCD value, and Figure 3b shows a 3D plot of the R^2 value, model number, and NCD value for dataset #1. Table 5 lists the 95% and 99% confidence intervals (lower confidence limit (LCL) and upper confidence limit (UCL)) of the proposed model for dataset #1. Figure 4 shows a graph of the 95% and 99% confidence intervals for the proposed model. The relative error and confidence interval graphs aim to confirm the accuracy of the model and whether the value returned by the mean value function falls within the confidence interval at each time point.


Dataset 2
Dataset #2 was provided by [34] and is shown in Table 6. In dataset #2, the index uses cumulative system days, and there are 33 cumulative failures in 58,633 system days. Detailed information is recorded in [34]. In Table 7, we obtained the parameter estimates of all twenty models and the values of all twelve criteria using t = 1249, 4721, 8786, ..., 58,633 cumulative system days from dataset #2. As shown in Table 7, the proposed model achieves the best results on the twelve criteria compared to the other models. As shown in Table 8, the proposed model also achieves the best NCD value compared to the other models.
Looking at Table 7 in detail, the MSE, RMSE, variance, RMSPE, PRR, TS, PP, SAE, and MAE values for the proposed model are the lowest among all the models compared, and the R^2 and Adj R^2 values for the proposed model are the largest. First, the MSE value of the proposed model is 0.8790, the lowest among all the models compared; the MSE value of the YID 2 model is 0.9229. The RMSE value of the proposed model is 0.9376, the lowest among all the models compared; the RMSE value of the YID 2 model is 0.9607. The variance value of the proposed model is 0.7662, the lowest among all the models compared; the variance value of the CT model is 0.7878. The RMSPE value of the proposed model is 0.7664, the lowest among all the models compared; the RMSPE value of the CT model is 0.7889. The PRR value of the proposed model is 0.0149, the lowest among all the models compared; the PRR value of the CT model is 0.0191. The TS value of the proposed model is 2.8923, the lowest among all the models compared; the TS value of the CT model is 2.9636. The PP value of the proposed model is 0.0149, the lowest among all the models compared; the PP value of the CT model is 0.0184. The SAE value of the proposed model is 7.6843, the lowest among all the models compared; the SAE value of the CT model is 7.8468. The MAE value of the proposed model is 0.9605, the lowest among all the models compared; the MAE value of the CT model is 0.9809. The R^2 and Adj R^2 values of the proposed model are 0.9937 and 0.9891, respectively, the largest among all the models compared; the R^2 and Adj R^2 values of the YID 1 model are 0.9933 and 0.9886, respectively. Although the YID 2 model has a smaller AIC than our proposed model, the performance of our proposed model on all eleven remaining criteria is much better, not only compared with the YID 2 model but with all the models, as shown in Table 7. Figure 5 shows a graph of the mean value functions for all the models based on dataset #2, and Figure 6 shows a graph of the relative error values of all models for dataset #2. Moreover, from Table 8, looking at the NCD values, the proposed model achieves the lowest value among all the models compared: the NCD value of the proposed model is 0.0713073, while the NCD values of the CT and YID 2 models are 0.0724187 and 0.0773360, respectively. Figure 7a shows a 3D plot of the MSE value, model number, and NCD value, and Figure 7b shows a 3D plot of the R^2 value, model number, and NCD value for dataset #2. Table 9 lists the 95% and 99% confidence intervals of the proposed model for dataset #2, and Figure 8 shows a graph of these intervals.


Sensitivity Analysis
In this study, we investigated the effect of each parameter of the proposed model on the mean value function by performing a sensitivity analysis. We conducted the sensitivity analysis by changing one parameter of the model while fixing all the others. We examined how the value of the estimated mean value function changes when the estimated parameter values obtained from dataset #1 and dataset #2 are varied from −20% to +20%. Figure 9 shows the sensitivity analysis for the five parameters of the proposed model based on dataset #1. As can be seen in Figure 9, the cumulative number of detected failures changes with each estimated parameter. Figure 10 shows the sensitivity analysis for the five parameters of the proposed model based on dataset #2. From Figure 10, it can likewise be seen that the cumulative number of detected failures changes with each estimated parameter. Therefore, all the parameters are influential in the proposed model.
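The ±20% one-at-a-time procedure described above can be sketched generically: perturb each estimated parameter over a grid from −20% to +20% while holding the others at their estimates, and record the resulting mean value curves. The mean value function `m_demo` and parameter values below are placeholders for illustration, not the fitted values from the datasets.

```python
import numpy as np

def sensitivity(m, t, params, pct=0.20, steps=5):
    """One-at-a-time sensitivity: for each parameter, return the list of
    mean-value curves obtained by scaling that parameter by 1-pct .. 1+pct."""
    curves = {}
    for name, value in params.items():
        curves[name] = []
        for factor in np.linspace(1.0 - pct, 1.0 + pct, steps):
            p = dict(params)        # copy the estimates
            p[name] = value * factor  # perturb only this parameter
            curves[name].append(m(t, **p))
    return curves

# Placeholder mean value function and estimates for illustration:
m_demo = lambda t, N, b: N * (1.0 - np.exp(-b * t))
out = sensitivity(m_demo, np.array([1.0, 2.0, 3.0]), {"N": 40.0, "b": 0.1})
```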


Conclusions
Software is used in a variety of areas and environments, driven by the needs of different consumers. That is why we must develop high-quality software and improve its reliability as well. In general, however, software is developed in a controlled testing environment. Therefore, we considered the uncertainty or randomness of the software operating environment, together with testing coverage, through which quantitative confidence criteria for software products can be applied. In this paper, we discussed a new testing coverage model based on NHPP software reliability with the uncertainty of operating environments and provided a sensitivity analysis of the impact of each parameter of the proposed model. We used twelve criteria and the NCD value to compare the goodness-of-fit of the proposed model with several existing NHPP software reliability models. The results of the model analysis show that the proposed model achieves significantly better goodness-of-fit than the other models. In addition, we investigated the impact of each parameter of the proposed model on the mean value function by performing a sensitivity analysis.


Figure 1. Mean value functions for all models in Table 1 for dataset #1.

Figure 2. Relative error values for all models in Table 1 for dataset #1.


Figure 4. The 95% and 99% confidence intervals for the proposed model in dataset #1.


Figure 9. Sensitivity analysis of the parameters of the proposed model for dataset #1: (a) parameter b; (b) parameter α; (c) parameter β; (d) parameter N; (e) parameter d.


Figure 10. Sensitivity analysis of the parameters of the proposed model for dataset #2: (a) parameter b; (b) parameter α; (c) parameter β; (d) parameter N; (e) parameter d.

Table 3. Results of model parameter estimation and criteria from dataset #1.

Table 4. Results of criteria and NCD values for comparison with dataset #1.

Table 5. Results of the 95% and 99% confidence intervals for the proposed model for dataset #1.


Table 7. Results of model parameter estimation and criteria from dataset #2.

Table 8. Results of criteria and NCD values for comparison with dataset #2.

Figure 5. Mean value functions for all models in Table 1 for dataset #2.

Figure 6. Relative error values for all models in Table 1 for dataset #2.


Table 9. Results of the 95% and 99% confidence intervals for the proposed model for dataset #2.
