NHPP Software Reliability Model with Inflection Factor of the Fault Detection Rate Considering the Uncertainty of Software Operating Environments and Predictive Analysis

Abstract: Software plays a crucial role in computer systems, and it is used in various environments. Software is developed and tested in a controlled environment, while real-world operating environments may be different. Accordingly, the uncertainty of the operating environment must be considered. Moreover, predicting software failures is an important area of study, not only for software developers, but also for companies and research institutes. A software reliability model can measure and predict the number of software failures, software failure intervals, software reliability, and failure rates. In this paper, we propose a new non-homogeneous Poisson process (NHPP) model with an inflection factor of the fault detection rate function, considering the uncertainty of operating environments, and analyze how the predicted values of the proposed new model differ from those of other models. We compare the proposed model with several existing NHPP software reliability models using real software failure datasets based on ten criteria. The results show that the proposed new model has significantly better goodness-of-fit and predictability than the other models.


Introduction
The core technologies of the fourth industrial revolution, such as artificial intelligence (AI), big data, and the Internet of Things (IoT), are implemented in software, and software is essential as a mediator that creates new value by fusing these technologies in all industries. As the importance and role of software in computer systems keep growing, a fatal software error can cause significant damage. For the effective operation of software, it is imperative to reduce the possibility of software failures and maintain a high level of reliability. Software reliability is defined as the probability that the software will run without failure for a certain period. Developing the skills and theories needed to improve software reliability is therefore vital. However, the development of a software system is a difficult and complex process, and the main focus of software development is consequently on improving the reliability and stability of the software system. The number of software failures and the time interval of each failure have a significant influence on software reliability. Therefore, the prediction of software failures is a research field that is important not only for software developers, but also for companies and research institutes. Software reliability models can be classified according to the phase of the software development cycle in which they are applied.
Before the testing phase, a reliability prediction model is used, which predicts reliability using information such as past data, the language used, the development domain, complexity, and architecture. After the testing phase, a software reliability model is used, which is a mathematical model of software failures, such as the frequency of failures and the failure interval times. Such a model makes it easier to evaluate software reliability using the fault data collected in the test or operating environment. In addition, the model can measure the number of software failures, software failure intervals, and software reliability, and the failure rate can be estimated and variously predicted.
Although various types of software reliability models have been studied, software defects and failures generally do not occur at equal time intervals. Based on this, non-homogeneous Poisson process (NHPP) software reliability models were developed. NHPP models describe software reliability in a mathematically tractable way and are used extensively because of their potential for various applications. Most of the previous NHPP software reliability models were developed based on the assumptions that faults detected in the testing phase are removed immediately with no debugging time delay, that new faults are not introduced, and that software systems used in the field environments are the same as or close to those used in the development-testing environment. Based on this, Goel and Okumoto [1] presented a stochastic model for the software failure phenomenon using an NHPP; this model describes the failure observation phenomenon by an exponential curve. There have also been other software reliability models that describe either S-shaped curves or a mixture of exponential and S-shaped curves [2][3][4]. As the Internet became popular in the mid-1990s, owing to rapid changes in industrial structure and environment, software reliability models for a variety of operating environments began to be studied. In the early 2000s, considering the uncertainty of the operating environment, researchers began to try new approaches such as the application of calibration factors [5][6][7]. Based on this, Teng and Pham [8] generalized the software reliability model considering the uncertainty of the environment and its effects upon software failure rates. Recently, Inoue et al. [9] proposed a software reliability model with the uncertainty of testing environments. Li and Pham [10,11] proposed NHPP software reliability models considering fault removal efficiency and error generation, and the uncertainty of operating environments with imperfect debugging and testing coverage. Song et al.
[12][13][14][15] studied NHPP software reliability models with various fault detection rates considering the uncertainty of operating environments. Zhu and Pham [16] proposed an NHPP software reliability model with a pioneering idea by considering software fault dependency and imperfect fault removal. However, some previous NHPP software reliability models [1][2][3][4][17][18][19][20][21][22][23][24][25] did not take into account the uncertainty of the software operating environment, and others did not consider the learning curve in the fault detection rate function [8][9][10][11][13][14][26][27].
In this paper, we discuss a new model with an inflection factor of the fault detection rate function considering the uncertainty of operating environments, together with a predictive analysis. We examine the goodness-of-fit and the predictability of the new software reliability model and other existing NHPP models based on several datasets. The explicit solution of the mean value function for the new software reliability model is derived in Section 2. Criteria for model comparison, prediction, and selection of the best model are discussed in Section 3. Model analysis and results through numerical examples are discussed in Section 4. Section 5 presents conclusions and remarks. Throughout, N(t) denotes the counting process of an NHPP, so that

Pr{N(t) = n} = ([m(t)]^n / n!) exp(−m(t)), n = 0, 1, 2, 3, . . . .
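As a quick illustration, the Poisson probability above can be evaluated directly; a minimal Python sketch (the function name is ours, not from the paper):

```python
from math import exp, factorial

def nhpp_pmf(n, m_t):
    """P(N(t) = n) for an NHPP whose mean value function at time t equals m_t."""
    return (m_t ** n) / factorial(n) * exp(-m_t)

# With m(t) = 2.0 expected failures by time t, the probability of observing
# exactly zero failures up to t is e^(-2).
p_zero = nhpp_pmf(0, 2.0)
```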

NHPP Software Reliability Modeling
Assuming that m(t) is the mean value function, the relationship between the mean value function m(t) and the intensity function λ(t) is

λ(t) = dm(t)/dt.

A general mean value function m(t) of NHPP software reliability models is obtained from the following differential equation [19]:

dm(t)/dt = b(t)[a(t) − m(t)], (1)

where a(t) is the total fault content function and b(t) is the fault detection rate function. Solving Equation (1) for given functions a(t) and b(t), the following mean value function m(t) is obtained [19]:

m(t) = e^(−B(t)) [m_0 + ∫_{t_0}^{t} a(s) b(s) e^(B(s)) ds], (2)

where B(t) = ∫_{t_0}^{t} b(s) ds, and m(t_0) = m_0 is the marginal condition of Equation (2), with t_0 representing the start time of the testing process.

A New NHPP Software Reliability Model
A general mean value function m(t) of NHPP software reliability models, obtained from the following differential equation considering the uncertainty of operating environments, is given in [26]:

dm(t)/dt = η b(t)[N − m(t)], (3)

where m(t) is the mean value function, b(t) is the fault detection rate function, N is the expected number of faults that exist in the software before testing, and η is a random variable that represents the uncertainty of the system fault detection rate in the operating environments, with a generalized probability density function having two parameters α ≥ 0 and β ≥ 0 [26]. Solving the differential equation with the random variable η and the initial condition m(0) = 0, we find the following mean value function m(t):

m(t) = N [1 − (β / (β + ∫_0^t b(s) ds))^α]. (4)

In this paper, we consider the fault detection rate function b(t) to be as follows:

b(t) = b / (1 + a e^(−bt)), (5)

where b is the failure detection rate and a represents the inflection factor. Substituting the function b(t) of Equation (5) into Equation (4), we obtain a new mean value function m(t) of the NHPP software reliability model subject to the uncertainty of operating environments, which gives the expected number of software failures detected by time t:

m(t) = N [1 − (β / (β + ln((a + e^(bt)) / (1 + a))))^α]. (6)

The advantage of the proposed new model is that it takes into account both the learning curve in the fault detection rate function and the uncertainty of the operating environments.
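The closed-form mean value function derived above, m(t) = N[1 − (β/(β + ln((a + e^(bt))/(1 + a))))^α] (our reconstruction from the stated b(t) and initial condition m(0) = 0), can be evaluated directly; a minimal Python sketch with illustrative parameter values of our choosing:

```python
from math import exp, log

def m(t, N, a, b, alpha, beta):
    """Expected cumulative failures by time t for the proposed model:
    m(t) = N * (1 - (beta / (beta + ln((a + e^(b t)) / (1 + a))))^alpha)."""
    B = log((a + exp(b * t)) / (1.0 + a))   # integral of b(s) ds over [0, t]
    return N * (1.0 - (beta / (beta + B)) ** alpha)

# Sanity properties of a finite-fault NHPP model: m(0) = 0, m is increasing,
# and m(t) approaches N (total expected faults) as t grows.
early, late = m(5.0, 30, 2.0, 0.5, 1.2, 5.0), m(10.0, 30, 2.0, 0.5, 1.2, 5.0)
```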

Parameter Estimation and Models for Comparison
Many NHPP software reliability models use the least squares estimation (LSE) and maximum likelihood estimation (MLE) methods to estimate the parameters. However, if the expression of the mean value function m(t) of the software reliability model is too complicated, an accurate estimate may not be obtained from the MLE method. Here, we derived the parameters of the mean value function m(t) using the Matlab and R programs based on the LSE method. Table 1 summarizes the mean value functions of existing NHPP software reliability models and the proposed new model; among them, NHPP software reliability models 18, 19, and 20 consider the uncertainty of the environment. Table 1. Software reliability models.
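The LSE step can be sketched as follows, here using SciPy's Nelder-Mead minimizer on the proposed model's mean value function. This is a minimal illustration with synthetic weekly counts and starting values of our choosing, not the paper's actual Matlab/R estimation code:

```python
import numpy as np
from scipy.optimize import minimize

def mvf(t, N, a, b, alpha, beta):
    # Proposed model: m(t) = N * (1 - (beta/(beta + ln((a + e^(bt))/(1 + a))))^alpha)
    B = np.log((a + np.exp(b * t)) / (1.0 + a))
    return N * (1.0 - (beta / (beta + B)) ** alpha)

def sse(params, t, y):
    # Least squares objective: sum of squared deviations from the observed
    # cumulative failure counts.
    N, a, b, alpha, beta = params
    if min(N, a, b, alpha, beta) <= 0:       # keep parameters in their valid range
        return 1e12
    return float(np.sum((mvf(t, N, a, b, alpha, beta) - y) ** 2))

t = np.arange(1, 22, dtype=float)            # week index 1..21, as in Datasets #1-#2
y = mvf(t, 30, 2.0, 0.4, 1.1, 4.0)           # synthetic "observed" cumulative failures
x0 = [25.0, 1.0, 0.3, 1.0, 3.0]              # rough starting values
fit = minimize(sse, x0, args=(t, y), method="Nelder-Mead",
               options={"maxiter": 5000, "xatol": 1e-8, "fatol": 1e-10})
# fit.x holds the LSE parameter estimates; fit.fun is the minimized SSE.
```

In practice, the fit can be sensitive to the starting values x0, so several restarts from different initial guesses are advisable for a five-parameter model.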

Criteria for Model Comparison
We use ten criteria to estimate the goodness-of-fit of the proposed model, and use one criterion to compare the predicted values.
(1) Mean squared error (MSE)
The MSE measures the average of the squares of the errors, that is, the average squared difference between the estimated values and the actual data:

MSE = Σ_{i=1}^{n} (m̂(t_i) − y_i)^2 / (n − m).
(2) Root mean square error (RMSE)
The RMSE is a frequently used measure of the differences between the values predicted by a model or an estimator and the values observed; it is the square root of the MSE:

RMSE = √MSE.
(3) Predictive ratio risk (PRR) [22]
The PRR measures the distance of the model estimates from the actual data against the model estimate:

PRR = Σ_{i=1}^{n} ((m̂(t_i) − y_i) / m̂(t_i))^2.
(4) Predictive power (PP) [22]
The PP measures the distance of the model estimates from the actual data:

PP = Σ_{i=1}^{n} ((m̂(t_i) − y_i) / y_i)^2.
(5) Akaike's information criterion (AIC) [28]

AIC = −2 log L + 2m

The AIC compares the capability of each model in terms of maximizing the likelihood function L, while considering the degrees of freedom. For grouped failure-count data with y_0 = 0, the log-likelihood is

log L = Σ_{i=1}^{n} [(y_i − y_{i−1}) log(m(t_i) − m(t_{i−1})) − (m(t_i) − m(t_{i−1})) − log((y_i − y_{i−1})!)].

(6) R-square (R²)
The R² measures how successful the fit is in explaining the variation of the data:

R² = 1 − Σ_{i=1}^{n} (y_i − m̂(t_i))² / Σ_{i=1}^{n} (y_i − ȳ)².

(7) Adjusted R-square (Adj R²) [10]
The Adjusted R² is a modification of R² that adjusts for the number of explanatory terms in a model relative to the number of data points:

Adj R² = 1 − (1 − R²)(n − 1) / (n − m − 1).
(8) Sum of absolute errors (SAE) [13]
The SAE measures the absolute distance of the model estimates from the actual data:

SAE = Σ_{i=1}^{n} |m̂(t_i) − y_i|.
(9) Mean absolute error (MAE)
The MAE measures the deviation using the absolute distance of the model estimates from the actual data:

MAE = Σ_{i=1}^{n} |m̂(t_i) − y_i| / (n − m).
(10) Variance
The variance measures the standard deviation of the prediction bias:

Variance = √( Σ_{i=1}^{n} (y_i − m̂(t_i) − Bias)² / (n − 1) ),

where the Bias is given as

Bias = Σ_{i=1}^{n} (m̂(t_i) − y_i) / n.

(11) Sum of squared errors for predicted value (Pre SSE) [11]

Pre SSE = Σ_{i=k+1}^{n} (m̂(t_i) − y_i)²

We use the data points up to time t_k to estimate the parameters of the mean value function m(t), and then measure the squared error between the estimated value and the actual data after time t_k, obtained by substituting the estimated parameters into the mean value function.
Here, m̂(t_i) is the estimated cumulative number of failures at time t_i for i = 1, 2, ..., n; y_i is the total number of failures observed at time t_i; n is the total number of observations; and m is the number of unknown parameters in the model.
The smaller the value of these nine criteria, i.e., MSE, RMSE, PRR, PP, AIC, SAE, MAE, Variance, and Pre SSE, the better the fit of the model (closer to 0). On the other hand, the higher the value of the two criteria, i.e., R² and Adj R², the better the fit of the model (closer to 1).
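These fit criteria are straightforward to compute once the fitted values m̂(t_i) are available; a minimal Python sketch, assuming the conventional definitions used in the NHPP literature (e.g., MSE and MAE with denominator n − m), with variable names of our choosing:

```python
from math import sqrt

def fit_criteria(y, yhat, m):
    """Goodness-of-fit criteria for n observed cumulative counts y,
    fitted values yhat, and a model with m unknown parameters."""
    n = len(y)
    errs = [yh - yo for yh, yo in zip(yhat, y)]     # m_hat(t_i) - y_i
    sse = sum(e * e for e in errs)
    sae = sum(abs(e) for e in errs)
    ybar = sum(y) / n
    sst = sum((yo - ybar) ** 2 for yo in y)
    bias = sum(errs) / n
    r2 = 1.0 - sse / sst
    return {
        "MSE": sse / (n - m),
        "RMSE": sqrt(sse / (n - m)),
        "PRR": sum((e / yh) ** 2 for e, yh in zip(errs, yhat)),
        "PP": sum((e / yo) ** 2 for e, yo in zip(errs, y)),
        "SAE": sae,
        "MAE": sae / (n - m),
        "R2": r2,
        "AdjR2": 1.0 - (1.0 - r2) * (n - 1) / (n - m - 1),
        "Variance": sqrt(sum((yo - yh - bias) ** 2
                             for yo, yh in zip(y, yhat)) / (n - 1)),
    }
```

Pre SSE follows the same pattern, but sums the squared errors only over the held-out points i = k + 1, ..., n.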

Confidence Interval
It is possible to check whether the value of the mean value function is included in the confidence interval at each time point t_i, and how much of the data the confidence interval actually contains. We use the following Equation (7) to obtain the confidence interval [22] of the proposed new model and of the existing NHPP software reliability models:

m̂(t) ± Z_{α/2} √(m̂(t)), (7)

where Z_{α/2} is the 100(1 − α/2) percentile of the standard normal distribution.
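The confidence interval described above, taken as m̂(t) ± Z_{α/2}√(m̂(t)) (the standard form for an NHPP, since the variance of the count equals its mean m(t)), can be sketched as follows; z = 1.96 is the 97.5th percentile of the standard normal distribution, giving a 95% interval:

```python
from math import sqrt

def confidence_interval(m_hat, z=1.96):
    """Approximate confidence bounds m_hat +/- z * sqrt(m_hat) for an NHPP,
    where the variance of the count equals its mean m_hat."""
    half = z * sqrt(m_hat)
    return m_hat - half, m_hat + half

lo, hi = confidence_interval(25.0)   # e.g. 25 expected failures -> (15.2, 34.8)
```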

Data Information
Datasets #1 and #2 were reported by [22] based on system test data for a telecommunication system. Both the automated and the human-involved tests were executed on multiple test beds. The system records the cumulative number of faults each week. In Datasets #1 and #2, the week index runs from week 1 to 21, and there are 26 and 43 cumulative failures in 21 weeks, respectively. Detailed information can be found in [22]. Datasets #3, #4, and #5 were reported by [22] based on an online communication system. Here as well, the system records the cumulative number of faults each week. In Datasets #3, #4, and #5, the week index runs from week 1 to 12, and there are 26, 55, and 55 cumulative failures in 12 weeks, respectively. Detailed information can be found in [22].

Results of the Estimated Parameters
Tables 2-6 summarize the results of the estimated parameters using the LSE technique and the values of the ten criteria (MSE, RMSE, PRR, PP, AIC, R², Adj R², SAE, MAE, and Variance) for all 21 models in Table 1. First, for the comparison of goodness-of-fit, we obtained the parameter estimates and the criteria of all models using all data points: t = 1, 2, ..., 21 for Datasets #1 and #2, and t = 1, 2, ..., 12 for Datasets #3, #4, and #5. As shown in Tables 2-6, the proposed new model has the best results when comparing the ten criteria with the other models.
As can be seen from Table 2, the MSE, RMSE, PRR, SAE, MAE, and Variance values for the proposed new model are the lowest values compared to all models in Table 1. The MSE value of the proposed new model is 0.5864, which is smaller than the value of MSE of other models. The RMSE value is 0.7658, PRR value is 0.5024, SAE value is 11.3783, MAE value is 0.7111, and Variance value is 0.6903, which are smaller than the corresponding values of other models. The R 2 and Adj R 2 values for the proposed new model are the largest values as compared to all models. The R 2 value of the proposed model is 0.9947, and the Adj R 2 value is 0.9929, which are larger than the corresponding values of other models.
From Table 3, we can see that the MSE, RMSE, PRR, PP, SAE, MAE, and Variance values for the proposed new model are the lowest values in comparison with every model in Table 1. The MSE value of the proposed new model is 0.8470, which is smaller than that of other models. The RMSE value is 0.9203, PRR value is 0.1159, PP value is 0.1355, SAE value is 14.0367, MAE value is 0.8773, and Variance value is 0.8232, which are smaller than the corresponding values of other models. The AIC value is 77.0423, which is the second lowest value. The R 2 and Adj R 2 values for the proposed new model are the largest values compared to all models. The value of R 2 for the proposed model is 0.9970 and the Adj R 2 is 0.9960, which are larger than the respective values of other models.
Figures 1-5 show graphs of the mean value functions for all models based on Datasets #1-#5, respectively. Figures 6-10 show graphs of the 95% confidence interval of the proposed new model, which serve to confirm whether the value of the mean value function is included in the confidence interval at each time point. Figures 11-15 show graphs of the relative error values of all models, which serve to confirm their ability to provide better accuracy.

Prediction Analysis
In this paper, we use Datasets #1 and #2 to compare how the predicted values of each model differ, in order to fulfill the objective of this paper. We compare the goodness-of-fit of all models using the first 75% of each dataset and compare the predicted values of all models using the remaining 25%. For the comparison of goodness-of-fit, we obtained the parameter estimates and the criteria (MSE, RMSE, PRR, PP, AIC, R², Adj R², SAE, MAE, and Variance) for all models when t = 1, 2, ..., 16; for the comparison of the predicted values, we obtained the Pre SSE value of all models when t = 17, 18, ..., 21, for Datasets #1 and #2.
First, as seen in Tables 7 and 8 for the comparison of goodness-of-fit, it is evident that the proposed new model has the best results when comparing the ten criteria with the other models. As seen from Table 7, the MSE, RMSE, PRR, SAE, and Variance values for the proposed new model are the lowest compared with all models in Table 1. The MSE value of the proposed new model is 0.5915, which is smaller than the corresponding value of the other models. The RMSE value is 0.7691, the PRR value is 0.3380, the SAE value is 8.2769, and the Variance value is 0.6591, which are smaller than the corresponding values of the other models. The R² and Adj R² values for the proposed new model are the largest compared with all models; the R² value is 0.9923 and the Adj R² value is 0.9885, which are larger than the respective values of the other models. Second, as seen in Tables 7 and 8 for the comparison of the predicted values, it is evident that the proposed new model has the best results when comparing the Pre SSE criterion with the other models: the Pre SSE value for the proposed new model is the lowest of all models in Table 1, namely 2.6780 in Table 7 and 8.6532 in Table 8, which are smaller than the Pre SSE values of the other models. Figures 16 and 17 show graphs of the goodness-of-fit and prediction of the mean value functions for all models from Datasets #1 and #2, respectively.

Conclusions
Software is used in a variety of environments; however, it is typically developed and tested in a controlled environment. Because the environment in which the software operates varies, the uncertainty of the operating environment must be considered. Therefore, we consider the uncertainty of the operating environment and the learning curve in the fault detection rate function. In this paper, we discussed a new model with an inflection factor of the fault detection rate function considering the uncertainty of operating environments, and analyzed how the predicted values of the proposed new model differ from those of the other models. We provided numerical evidence of goodness-of-fit, predicted the values for all models, and compared the proposed new model with several existing NHPP software reliability models based on eleven criteria (MSE, RMSE, PRR, PP, AIC, R², Adj R², SAE, MAE, Variance, and Pre SSE). As shown in the numerical examples, the results indicate that the proposed new model has significantly better goodness-of-fit and predictability than the other existing models. Future work will involve broader validation of this conclusion based on recent datasets. In addition, we need to apply Bayesian and big-data estimation methods to estimate the parameters, and also need to consider multiple release points.