An NHPP Software Reliability Model with S-Shaped Growth Curve Subject to Random Operating Environments and Optimal Release Time

The failure of a computer system because of a software failure can lead to tremendous losses to society; therefore, software reliability is a critical issue in software development, all the more so as software has become more prevalent. We need to predict the fluctuations in software reliability and reduce the cost of software testing; therefore, a software development process that considers the release time, cost, reliability, and risk is indispensable. We thus need to develop a model that accurately predicts the defects in new software products. In this paper, we propose a new non-homogeneous Poisson process (NHPP) software reliability model with an S-shaped growth curve for use during the software development process, and relate it to a fault detection rate function when considering random operating environments. An explicit mean value function solution for the proposed model is presented. Examples are provided to illustrate the goodness-of-fit of the proposed model, along with several existing NHPP models, based on three sets of failure data collected from software applications. The results show that the proposed model fits the data markedly more closely than the other existing NHPP models. Finally, we propose a model to determine optimal release policies, in which the total software system cost is minimized depending on the given environment.


Introduction
'Software' is a generic term for a computer program and its associated documents. Software is divided into operating systems and application software. As new hardware is developed, its price decreases; thus, hardware is frequently upgraded at low cost, and software becomes the primary cost driver. The failure of a computer system because of a software failure can cause significant losses to society. Therefore, software reliability is a critical issue in software development. This problem requires finding a balance between meeting user requirements and minimizing testing costs. To reduce costs during the software testing stage, the fluctuation of software reliability and the cost of testing must be known during the planning cycle; thus, a software development process that considers the release time, cost, reliability, and risk is indispensable. In addition, it is necessary to develop a model to predict the defects in software products. To estimate reliability metrics, such as the number of residual faults, the failure rate, and the overall reliability of the software, various non-homogeneous Poisson process (NHPP) software reliability models have been developed using a fault intensity rate function and a mean value function within a controlled testing environment. The purpose of many NHPP software reliability models is to obtain an explicit formula for the mean value function, m(t), which is applied to software testing data to make predictions on software failures and reliability in field environments [1]. A few researchers have evaluated a generalized software reliability model that captures the uncertainty of an environment and its effects on the software failure rate, and have developed an NHPP software reliability model that considers the uncertainty of the system fault detection rate per unit of time subject to the operating environment [2][3][4].
Inoue et al. [5] developed a bivariate software reliability growth model that considers the uncertainty of the change in the software failure-occurrence phenomenon at the change-point for improved accuracy. Okamura and Dohi [6] introduced a phase-type software reliability model and developed parameter estimation algorithms using grouped data. Song et al. [7,8] recently developed an NHPP software reliability model that considers a three-parameter fault detection rate, and applied a Weibull fault detection rate function during the software development process. They related the model to the error detection rate function by considering the uncertainty of the operating environment. In addition, Li and Pham [9] proposed a model accounting for the uncertainty of the operating environment under the condition that the fault content function is a linear function of the testing time, and that the fault detection rate is based on the testing coverage.
In this paper, we discuss a new NHPP software reliability model with an S-shaped growth curve applicable to the software development process and relate it to the fault detection rate function when considering random operating environments. We examine the goodness-of-fit of the proposed model and other existing NHPP models based on several sets of software failure data, and then determine the optimal release times that minimize the expected total software cost under given conditions. The explicit solution of the mean value function for the new NHPP software reliability model is derived in Section 2. Criteria for the model comparisons and the selection of the best model are discussed in Section 3. The optimal release policy is discussed in Section 4, and the results of a model analysis and the optimal release times are discussed in Section 5. Finally, Section 6 provides some concluding remarks.

Non-Homogeneous Poisson Process
The software fault detection process has been formulated using a popular counting process. The counting process {N(t), t ≥ 0} is a non-homogeneous Poisson process (NHPP) with an intensity function λ(t) if it satisfies the following conditions:
(I) N(0) = 0.
(II) The process has independent increments.
(III) The expected number of failures in the interval $[t_1, t_2]$, with $t_2 \ge t_1$, is $\int_{t_1}^{t_2} \lambda(t)\,dt$.
Assuming that the software failure/defect process satisfies the NHPP conditions, N(t) (t ≥ 0) represents the cumulative number of failures up to time t, and m(t) is the mean value function. The mean value function m(t) and the intensity function λ(t) satisfy the following relationship:
$$m(t) = \int_0^t \lambda(s)\,ds, \qquad \frac{dm(t)}{dt} = \lambda(t).$$
N(t) follows a Poisson distribution with mean m(t):
$$\Pr\{N(t) = n\} = \frac{[m(t)]^n}{n!}\,e^{-m(t)}, \qquad n = 0, 1, 2, \ldots$$
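A minimal Python sketch of the Poisson law above: by independent increments, the number of failures in $[t_1, t_2]$ is Poisson with mean $m(t_2) - m(t_1)$. The helper names and the delayed S-shaped mean value function with a = 100 and b = 0.2 used in the example are illustrative assumptions, not values from the paper.

```python
import math
from typing import Callable

MeanValueFn = Callable[[float], float]


def increment_pmf(m: MeanValueFn, t1: float, t2: float, k: int) -> float:
    """P{N(t2) - N(t1) = k} for an NHPP with mean value function m.

    By independent increments, N(t2) - N(t1) is Poisson with mean
    m(t2) - m(t1), the integral of the intensity lambda(t) over [t1, t2].
    """
    if t2 < t1 or k < 0:
        raise ValueError("require t2 >= t1 and k >= 0")
    mu = m(t2) - m(t1)
    if mu <= 0.0:
        return 1.0 if k == 0 else 0.0
    # Evaluate mu^k e^{-mu} / k! in log space so that large k does not overflow.
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def delayed_s_shaped(a: float, b: float) -> MeanValueFn:
    """Illustrative mean value function m(t) = a[1 - (1 + b t) e^{-b t}]."""
    return lambda t: a * (1.0 - (1.0 + b * t) * math.exp(-b * t))


if __name__ == "__main__":
    m = delayed_s_shaped(a=100.0, b=0.2)  # hypothetical parameters
    # Probability of exactly 30 failures in [5, 10], and of no failure at all.
    print(increment_pmf(m, 5.0, 10.0, 30), increment_pmf(m, 5.0, 10.0, 0))
```

Working in log space is a design choice for robustness; for small counts the direct formula gives the same result.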

General NHPP Software Reliability Model
Pham et al. [10] formalized the general framework for NHPP-based software reliability models and provided analytical expressions for the mean value function m(t) using differential equations.
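The mean value function m(t) of the general NHPP software reliability model, where different choices of the fault content function a(t) and the fault detection rate function b(t) reflect various assumptions about the software testing process, can be obtained from the following differential equation with the initial condition N(0) = 0:
$$\frac{dm(t)}{dt} = b(t)\,\left[a(t) - m(t)\right]. \tag{1}$$
The general solution of (1) is
$$m(t) = e^{-B(t)}\left[m_0 + \int_{t_0}^{t} a(\tau)\,b(\tau)\,e^{B(\tau)}\,d\tau\right], \tag{2}$$
where $B(t) = \int_{t_0}^{t} b(s)\,ds$, and $m(t_0) = m_0$ is the initial (marginal) condition of (2), with $t_0$ the starting time of the debugging process.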
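As a numerical sanity check that is not part of the original derivation, the sketch below integrates (1) with a classical fourth-order Runge-Kutta scheme and compares the result with the closed form (2) evaluated by Simpson quadrature. The fault content and detection rate functions in the example are hypothetical.

```python
import math
from typing import Callable

Func = Callable[[float], float]


def simpson(f: Func, lo: float, hi: float, n: int = 200) -> float:
    """Composite Simpson's rule on [lo, hi]; n is rounded up to an even number of panels."""
    n += n % 2
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0


def m_closed_form(a: Func, b: Func, t: float, t0: float = 0.0, m0: float = 0.0) -> float:
    """Equation (2): e^{-B(t)} [m0 + int_{t0}^{t} a(tau) b(tau) e^{B(tau)} dtau]."""
    B = lambda s: simpson(b, t0, s, 50)  # B(s) = int_{t0}^{s} b(u) du
    inner = simpson(lambda tau: a(tau) * b(tau) * math.exp(B(tau)), t0, t)
    return math.exp(-B(t)) * (m0 + inner)


def m_runge_kutta(a: Func, b: Func, t: float, t0: float = 0.0, m0: float = 0.0,
                  steps: int = 2000) -> float:
    """Integrate equation (1), dm/dt = b(t)[a(t) - m(t)], from m(t0) = m0 with classical RK4."""
    f = lambda s, m: b(s) * (a(s) - m)
    h = (t - t0) / steps
    s, m = t0, m0
    for _ in range(steps):
        k1 = f(s, m)
        k2 = f(s + h / 2, m + h * k1 / 2)
        k3 = f(s + h / 2, m + h * k2 / 2)
        k4 = f(s + h, m + h * k3)
        m += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        s += h
    return m


if __name__ == "__main__":
    a = lambda t: 100.0 + 5.0 * t  # hypothetical fault content function
    b = lambda t: 0.1 + 0.01 * t   # hypothetical fault detection rate
    print(m_closed_form(a, b, 10.0), m_runge_kutta(a, b, 10.0))
```

With constant a(t) = a, constant b(t) = b, $t_0 = 0$ and $m_0 = 0$, (2) reduces to the exponential (Goel-Okumoto) form $a(1 - e^{-bt})$, which gives a convenient exact reference value for such a check.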

New NHPP Software Reliability Model
Pham [3] formulated a generalized NHPP software reliability model that incorporates uncertainty in the operating environment as follows:
$$\frac{dm(t)}{dt} = \eta\, b(t)\,\left[N - m(t)\right], \tag{3}$$
where η is a random variable that represents the uncertainty of the system fault detection rate in the operating environment, with probability density function g; b(t) is the fault detection rate function, which also represents the average failure rate caused by faults; N is the expected number of faults that exist in the software before testing; and m(t) is the expected number of errors detected by time t (the mean value function). Thus, a generalized mean value function m(t), with the initial condition m(0) = 0, is given by
$$m(t) = \int_{\eta} N\left(1 - e^{-\eta \int_0^t b(x)\,dx}\right) dg(\eta). \tag{4}$$
When the random variable η in (4) follows a gamma distribution with the two parameters α ≥ 0 and β ≥ 0, i.e., $g(x) = \frac{\beta^{\alpha} x^{\alpha - 1} e^{-\beta x}}{\Gamma(\alpha)}$, the mean value function [11] is given by
$$m(t) = N\left[1 - \left(\frac{\beta}{\beta + \int_0^t b(s)\,ds}\right)^{\alpha}\right], \tag{5}$$
where b(t) is the fault detection rate per fault per unit of time.
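To illustrate how (5) follows from (4), the sketch below averages $N(1 - e^{-\eta B(t)})$ over a gamma-distributed η with Simpson's rule and compares the result with the closed form. It assumes a shape parameter α ≥ 1 (so the density is finite at zero) and takes the value $B(t) = \int_0^t b(s)\,ds$ as an input; the numbers in the example are hypothetical.

```python
import math


def gamma_pdf(x: float, alpha: float, beta: float) -> float:
    """Gamma density with shape alpha and rate beta; assumes alpha >= 1 so it is finite at x = 0."""
    if x < 0.0:
        return 0.0
    if x == 0.0:
        return beta if alpha == 1.0 else 0.0
    log_pdf = alpha * math.log(beta) + (alpha - 1.0) * math.log(x) - beta * x - math.lgamma(alpha)
    return math.exp(log_pdf)


def m_mixture(N: float, B_t: float, alpha: float, beta: float,
              upper: float = 60.0, panels: int = 6000) -> float:
    """Equation (4): E_eta[N(1 - exp(-eta B(t)))] by composite Simpson's rule on [0, upper]."""
    h = upper / panels  # panels must be even for Simpson's rule
    total = 0.0
    for i in range(panels + 1):
        x = i * h
        weight = 1.0 if i in (0, panels) else (4.0 if i % 2 else 2.0)
        total += weight * N * (1.0 - math.exp(-x * B_t)) * gamma_pdf(x, alpha, beta)
    return total * h / 3.0


def m_closed_form(N: float, B_t: float, alpha: float, beta: float) -> float:
    """Equation (5): N[1 - (beta / (beta + B(t)))^alpha]."""
    return N * (1.0 - (beta / (beta + B_t)) ** alpha)


if __name__ == "__main__":
    # Hypothetical values: N = 100 faults, B(t) = 1, eta ~ Gamma(shape 2, rate 3).
    print(m_mixture(100.0, 1.0, 2.0, 3.0), m_closed_form(100.0, 1.0, 2.0, 3.0))
```

The agreement reflects the fact that (5) is N times one minus the Laplace transform of the gamma density evaluated at B(t).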
We propose an NHPP software reliability model that includes the random operating environment, using Equations (3) to (5) and the following assumptions [7,8]:
(a) The occurrence of a software failure follows a non-homogeneous Poisson process.
(b) Faults during execution can cause software failure.
(c) The software failure detection rate at any time depends on both the fault detection rate and the number of remaining faults in the software at that time.
(d) Debugging is performed to remove faults immediately when a software failure occurs.
(e) New faults may be introduced into the software system, regardless of whether other faults are removed or not.
(f) The fault detection rate b(t) can be expressed by (6).
(g) The random operating environment is captured by multiplying the unit failure detection rate b(t) by a factor η that represents the uncertainty of the system fault detection rate in the field.

In this paper, we consider the fault detection rate function b(t) to be
$$b(t) = \frac{b^{2}\,t}{b\,t + 1}, \qquad b > 0. \tag{6}$$
Substituting b(t) into (5), and noting that $\int_0^t b(s)\,ds = bt - \ln(1 + bt)$, we obtain a new NHPP software reliability model with an S-shaped growth curve subject to random operating environments, m(t), which can be used to determine the expected number of software failures detected by time t:
$$m(t) = N\left[1 - \left(\frac{\beta}{\beta + bt - \ln(1 + bt)}\right)^{\alpha}\right]. \tag{7}$$
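The quantities in (6) and (7) can be coded directly. The Python sketch below also returns the failure intensity λ(t) = dm(t)/dt, obtained by differentiating (7); the parameter values in the example are hypothetical, not fitted estimates.

```python
import math


def b_rate(t: float, b: float) -> float:
    """Fault detection rate (6): b(t) = b^2 t / (b t + 1)."""
    return b * b * t / (b * t + 1.0)


def cumulative_rate(t: float, b: float) -> float:
    """B(t) = int_0^t b(s) ds = b t - ln(1 + b t) for the rate in (6)."""
    return b * t - math.log1p(b * t)


def m_proposed(t: float, N: float, alpha: float, beta: float, b: float) -> float:
    """Mean value function (7): expected number of failures detected by time t."""
    return N * (1.0 - (beta / (beta + cumulative_rate(t, b))) ** alpha)


def intensity(t: float, N: float, alpha: float, beta: float, b: float) -> float:
    """lambda(t) = dm/dt = N alpha beta^alpha b(t) / (beta + B(t))^(alpha + 1)."""
    B = cumulative_rate(t, b)
    return N * alpha * beta ** alpha * b_rate(t, b) / (beta + B) ** (alpha + 1.0)


if __name__ == "__main__":
    N, alpha, beta, b = 150.0, 2.0, 5.0, 0.3  # hypothetical parameters
    for t in (0.0, 5.0, 10.0, 25.0):
        print(t, round(m_proposed(t, N, alpha, beta, b), 3),
              round(intensity(t, N, alpha, beta, b), 3))
```

Because b(0) = 0, the intensity starts at zero, rises, and then decays, which is what produces the S-shaped growth of m(t).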

Criteria for Model Comparisons
Theoretically, once the analytical expression for the mean value function m(t) is derived, the parameters in m(t) can be estimated using parameter estimation methods such as maximum likelihood estimation (MLE) or least squares estimation (LSE). In practice, however, MLE may not yield accurate estimates, particularly when the mean value function m(t) is too complex. The model parameters in m(t) are therefore estimated using a MATLAB program based on the LSE method. Six common criteria, namely the mean squared error (MSE), Akaike's information criterion (AIC), the predictive ratio risk (PRR), the predictive power (PP), the sum of absolute errors (SAE), and R-square (R²), are used to evaluate the goodness-of-fit of the proposed model and to compare it with the other existing models listed in Table 1. These criteria are described as follows.
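The MSE is
$$\mathrm{MSE} = \frac{\sum_{i=1}^{n}\left(\hat{m}(t_i) - y_i\right)^2}{n - m}.$$
AIC [12] is
$$\mathrm{AIC} = -2\log L + 2m.$$
The PRR [13] is
$$\mathrm{PRR} = \sum_{i=1}^{n}\left(\frac{\hat{m}(t_i) - y_i}{\hat{m}(t_i)}\right)^2.$$
The PP [13] is
$$\mathrm{PP} = \sum_{i=1}^{n}\left(\frac{\hat{m}(t_i) - y_i}{y_i}\right)^2.$$
The SAE [8] is
$$\mathrm{SAE} = \sum_{i=1}^{n}\left|\hat{m}(t_i) - y_i\right|.$$
The correlation index of the regression curve equation (R²) [9] is
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(\hat{m}(t_i) - y_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}.$$
Here, $\hat{m}(t_i)$ is the estimated cumulative number of failures at $t_i$ for i = 1, 2, …, n; $y_i$ is the total number of failures observed at time $t_i$; $\bar{y}$ is the mean of the $y_i$; n is the total number of observations; L is the maximized value of the likelihood function; and m is the number of unknown parameters in the model.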
The MSE measures the distance of the model estimates from the observed data, taking into account the total number of observations and the number of unknown parameters in the model. AIC compares the capability of each model in terms of maximizing the likelihood function (L), while penalizing the number of parameters (degrees of freedom). The PRR measures the distance of the model estimates from the observed data relative to the model estimates. The PP measures the distance of the model estimates from the observed data relative to the observed data. The SAE measures the sum of the absolute distances between the model estimates and the observed data. For five of these criteria, i.e., MSE, AIC, PRR, PP, and SAE, the smaller the value, the better the model fits relative to other models run on the same dataset. On the other hand, R² should be close to 1.
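A minimal sketch of the criteria above, assuming the estimates $\hat{m}(t_i)$ and observations $y_i$ are already available as lists. AIC is omitted because it requires the model's likelihood function, whose form the definitions above do not spell out; PRR and PP also assume that all $\hat{m}(t_i)$ and $y_i$ are positive.

```python
from typing import Dict, Sequence


def fit_criteria(m_hat: Sequence[float], y: Sequence[float], n_params: int) -> Dict[str, float]:
    """MSE, PRR, PP, SAE and R^2 as defined above (AIC is left out)."""
    n = len(y)
    if len(m_hat) != n or n <= n_params:
        raise ValueError("need equal lengths and more observations than parameters")
    resid = [mh - yi for mh, yi in zip(m_hat, y)]  # m_hat(t_i) - y_i
    sse = sum(r * r for r in resid)
    y_bar = sum(y) / n
    return {
        "MSE": sse / (n - n_params),
        "PRR": sum((r / mh) ** 2 for r, mh in zip(resid, m_hat)),
        "PP": sum((r / yi) ** 2 for r, yi in zip(resid, y)),
        "SAE": sum(abs(r) for r in resid),
        "R2": 1.0 - sse / sum((yi - y_bar) ** 2 for yi in y),
    }


if __name__ == "__main__":
    # Toy numbers only, not taken from DS1, DS2, or DS3.
    print(fit_criteria([2.0, 4.0, 6.0], [1.0, 4.0, 8.0], n_params=1))
```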
We use the following expression to obtain the confidence interval [13] of the proposed NHPP software reliability model:
$$\hat{m}(t) \pm z_{\alpha/2}\sqrt{\hat{m}(t)}, \tag{16}$$
where $z_{\alpha/2}$ is the 100(1 − α/2)th percentile of the standard normal distribution. Table 1 summarizes the mean value functions of the proposed new model and several existing NHPP models. Note that models 9 and 10 consider environmental uncertainty.
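The interval (16) relies on the Poisson property Var[N(t)] = m(t) and a normal approximation. A minimal sketch that uses the standard-library `statistics.NormalDist` (Python 3.8 or later) to obtain $z_{\alpha/2}$:

```python
import math
from statistics import NormalDist
from typing import Tuple


def confidence_limits(m_hat: float, level: float = 0.95) -> Tuple[float, float]:
    """Interval (16): m_hat -/+ z_{alpha/2} * sqrt(m_hat), with alpha = 1 - level.

    Uses the Poisson property Var[N(t)] = m(t) and a normal approximation.
    """
    if m_hat < 0.0 or not 0.0 < level < 1.0:
        raise ValueError("need m_hat >= 0 and 0 < level < 1")
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)  # upper alpha/2 point of N(0, 1)
    half_width = z * math.sqrt(m_hat)
    return m_hat - half_width, m_hat + half_width


if __name__ == "__main__":
    print(confidence_limits(100.0))  # 95% limits around a hypothetical estimate of 100
```

For small $\hat{m}(t)$ the lower limit can be negative; clipping it at zero is a common practical choice, but it is not part of (16).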

Optimal Software Release Policy
In this section, we discuss how the software reliability model can be used under varying conditions to determine the optimal software release time, T*, which minimizes the expected total software cost. Many studies have been conducted on the optimal software release time and its related problems [20][21][22][23][24]. The quality of the system normally depends on the testing effort, such as the testing environment, time, tools, and methodologies. If testing is short, the cost of system testing is lower, but consumers face a higher risk, e.g., of buying an unreliable system. A short test phase also leads to higher costs in the operating environment, because it is much more expensive to detect and correct a failure during the operational phase than during the testing phase. In contrast, the longer the testing time, the more faults can be removed, which leads to a more reliable system; however, the testing costs for the system also increase. Therefore, it is very important to determine when to release the system based on testing cost and reliability. Figure 1 shows the system development lifecycle considered in the following cost model: the testing phase before release time T, the testing environment period, the warranty period, and the operational life in the actual field environment, which is usually quite different from the testing environment [24]. The expected total software cost C(T) [24] can be expressed as
$$C(T) = C_0 + C_1 T + C_2\, m(T)\,\mu_y + C_3\left[1 - R(x \mid T)\right] + C_4\left[m(T + T_w) - m(T)\right]\mu_w, \tag{17}$$
where $C_0$ is the set-up cost of testing; $C_1 T$ is the cost of testing; $C_2\, m(T)\,\mu_y$ is the expected cost to remove all errors detected by time T during the testing phase; $C_3[1 - R(x \mid T)]$ is the penalty cost owing to failures that occur after the system release time T; and $C_4[m(T + T_w) - m(T)]\mu_w$ is the expected cost to remove all of the errors that are detected during the warranty period $[T, T + T_w]$. Here, $\mu_y$ and $\mu_w$ are the expected times to remove an error during the testing phase and the warranty period, respectively, and $R(x \mid T) = e^{-[m(T + x) - m(T)]}$ is the probability that no failure occurs in the interval (T, T + x]. The cost required to remove faults during the operating period is higher than during the testing period, and the time needed is much longer. Finally, we aim to find the optimal software release time, T*, that minimizes the expected total cost in the given environment:
$$\text{Minimize } C(T). \tag{18}$$
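A minimal sketch of (17) and (18): the expected total cost is evaluated on a grid of candidate release times, and the grid point with the lowest cost is taken as T*. The grid search is an assumption made for simplicity, since the text does not state how the minimization was carried out. The coefficients $C_2 = 50$, $C_3 = 2000$, $C_4 = 400$, and $T_w = 10$ follow the baseline values quoted in Section 5, while $C_0$, $C_1$, $\mu_y$, $\mu_w$, x, and the mean value function parameters are hypothetical placeholders.

```python
import math
from dataclasses import dataclass
from typing import Callable, Tuple

MeanValueFn = Callable[[float], float]


@dataclass
class CostParams:
    """Coefficients of cost model (17); field names mirror the symbols in the text."""
    C0: float    # set-up cost of testing
    C1: float    # testing cost per unit time
    C2: float    # cost per unit time of removing errors during testing
    C3: float    # penalty cost of failures after release
    C4: float    # cost per unit time of removing errors during the warranty
    mu_y: float  # expected time to remove an error during testing
    mu_w: float  # expected time to remove an error during the warranty
    x: float     # mission length used in R(x | T)
    Tw: float    # warranty period length


def expected_cost(T: float, m: MeanValueFn, p: CostParams) -> float:
    """Expected total software cost C(T) of (17)."""
    reliability = math.exp(-(m(T + p.x) - m(T)))  # R(x | T)
    return (p.C0 + p.C1 * T
            + p.C2 * m(T) * p.mu_y
            + p.C3 * (1.0 - reliability)
            + p.C4 * (m(T + p.Tw) - m(T)) * p.mu_w)


def optimal_release_time(m: MeanValueFn, p: CostParams,
                         t_max: float = 100.0, step: float = 0.01) -> Tuple[float, float]:
    """Solve (18) by brute-force grid search over [0, t_max]; returns (T*, C(T*))."""
    grid = [i * step for i in range(int(round(t_max / step)) + 1)]
    t_star = min(grid, key=lambda T: expected_cost(T, m, p))
    return t_star, expected_cost(t_star, m, p)


def delayed_s_shaped(a: float, b: float) -> MeanValueFn:
    """Mean value function for eta = 1: a[1 - (1 + b t) e^{-b t}]."""
    return lambda t: a * (1.0 - (1.0 + b * t) * math.exp(-b * t))


if __name__ == "__main__":
    # C2, C3, C4 and Tw follow the baseline case; the rest are hypothetical placeholders.
    params = CostParams(C0=100.0, C1=20.0, C2=50.0, C3=2000.0, C4=400.0,
                        mu_y=0.1, mu_w=0.5, x=1.0, Tw=10.0)
    print(optimal_release_time(delayed_s_shaped(140.0, 0.15), params))
```

Passing a different mean value function, for example (7), in place of the delayed S-shaped one gives the cost curve for a random operating environment; a finer step or a local refinement around the grid minimum can be used if more precision in T* is needed.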

Data Information
Dataset #1 (DS1), presented in Table 2, was reported by Musa [25] based on software failure data from a real-time command and control system (RTC&CS), and represents the failures observed during system testing (25 hours of CPU time). The system, which was developed by Bell Laboratories, comprised 21,700 delivered object instructions. Dataset #2 (DS2), shown in Table 3, is the second of three releases of software failure data collected from a large medical record system (MRS) [26] consisting of 188 software components. Each component contains several files. Initially, the software consisted of 173 software components. All three releases added new functionality to the product. Between three and seven new components were added in each of the three releases, for a total of 15 new components. Many other components were modified during each of the three releases as a side effect of the added functionality. Detailed information on the dataset can be found in the report by Stringfellow and Andrews [26].
Dataset #3 (DS3), shown in Table 4, is from one of four major releases of software products at Tandem Computers (TDC) [27]. In total, 100 failures were observed, with the testing time measured in cumulative CPU hours. Detailed information on the dataset can be found in the report by Wood [27].

Model Analysis
Tables 5-7 summarize the estimated parameters of all 10 models in Table 1, obtained using the LSE technique, and the values of the six common criteria: MSE, AIC, PRR, PP, SAE, and R². We computed the six criteria at t = 1, 2, …, 25 for DS1 (Table 2), at t = 1, 2, …, 17 for DS2 (Table 3), and at the cumulative testing CPU hours for DS3 (Table 4). As can be seen in Table 5, the newly proposed model has the lowest MSE and AIC values of all the models, and the second-best PRR, PP, SAE, and R² values. The MSE and AIC values of the newly proposed model are 7.361 and 114.982, respectively, which are markedly lower than the values of the other models. In Table 6, the newly proposed model achieves the best value for every criterion. Its MSE value is 60.623, which is markedly lower than the values of the other models. Its AIC, PRR, PP, and SAE values are 151.156, 0.043, 0.041, and 98.705, respectively, which are also markedly lower than those of the other models. Its R² value is 0.960, the closest to 1 among all the models. In Table 7, the newly proposed model again achieves the best value for every criterion. Its MSE value is 6.336, which is markedly lower than the values of the other models. Its PRR, PP, and SAE values are 0.086, 0.066, and 36.250, respectively, which are also markedly lower than those of the other models. Its R² value is 0.9940, the closest to 1 among all the models. Figures 2-4 show the mean value functions of all 10 models for DS1, DS2, and DS3, respectively. Figures 5-7 show the 95% confidence limits of the newly proposed model for DS1, DS2, and DS3. Tables A1-A3 in Appendix A list the 95% confidence intervals of all 10 NHPP software reliability models for DS1, DS2, and DS3. In addition, the relative error of the proposed software reliability model remains closer to zero than that of the other models (Figures 8-10), which confirms its ability to provide more accurate predictions.
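The paper fits all models with a MATLAB LSE program that is not shown. As a hedged illustration of the same least-squares idea, the Python sketch below fits the two-parameter delayed S-shaped model m(t) = a[1 − (1 + bt)e^{−bt}]: for a fixed b the optimal a has a closed form, so only b needs a one-dimensional search (a coarse grid followed by golden-section refinement). It also computes relative errors, assumed here to be $(\hat{m}(t_i) - y_i)/y_i$. Fitting the four-parameter model (7) would need a multidimensional optimizer instead.

```python
import math
from typing import List, Sequence, Tuple


def shape(t: float, b: float) -> float:
    """g(t) = 1 - (1 + b t) e^{-b t}, so the delayed S-shaped model is m(t) = a g(t)."""
    return 1.0 - (1.0 + b * t) * math.exp(-b * t)


def best_a(t: Sequence[float], y: Sequence[float], b: float) -> float:
    """For fixed b, the least-squares a is sum(y_i g_i) / sum(g_i^2)."""
    g = [shape(ti, b) for ti in t]
    return sum(yi * gi for yi, gi in zip(y, g)) / sum(gi * gi for gi in g)


def sse(t: Sequence[float], y: Sequence[float], b: float) -> float:
    """Sum of squared errors after profiling out a."""
    a = best_a(t, y, b)
    return sum((a * shape(ti, b) - yi) ** 2 for ti, yi in zip(t, y))


def fit_delayed_s_shaped(t: Sequence[float], y: Sequence[float],
                         b_lo: float = 1e-3, b_hi: float = 2.0,
                         grid: int = 200, tol: float = 1e-10) -> Tuple[float, float]:
    """LSE estimates (a, b): coarse grid over b to bracket the minimum, then golden-section search."""
    bs = [b_lo + (b_hi - b_lo) * i / grid for i in range(grid + 1)]
    k = min(range(grid + 1), key=lambda i: sse(t, y, bs[i]))
    lo, hi = bs[max(k - 1, 0)], bs[min(k + 1, grid)]
    r = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio
    c, d = hi - r * (hi - lo), lo + r * (hi - lo)
    while hi - lo > tol:
        if sse(t, y, c) < sse(t, y, d):  # minimum lies in [lo, d]
            hi, d = d, c
            c = hi - r * (hi - lo)
        else:                             # minimum lies in [c, hi]
            lo, c = c, d
            d = lo + r * (hi - lo)
    b = (lo + hi) / 2.0
    return best_a(t, y, b), b


def relative_errors(t: Sequence[float], y: Sequence[float], a: float, b: float) -> List[float]:
    """(m_hat(t_i) - y_i) / y_i; assumes every observed y_i is positive."""
    return [(a * shape(ti, b) - yi) / yi for ti, yi in zip(t, y)]


if __name__ == "__main__":
    times = [float(i) for i in range(1, 26)]
    synthetic = [120.0 * shape(ti, 0.2) for ti in times]  # hypothetical noise-free data
    print(fit_delayed_s_shaped(times, synthetic))
```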

Optimal Software Release Time
Factor η captures the effects of the field environmental factors on the system failure rate, as described in Section 2. System testing is commonly carried out in a controlled environment, where we can use a constant factor η equal to 1. The newly proposed model (7) becomes a delayed S-shaped model when η = 1. Thus, we apply different mean value functions m(t) to the cost model C(T) in (17) under the three conditions described below, using DS1 (Table 2). The parameters of the delayed S-shaped model and the newly proposed model are obtained using the LSE method, as described in Section 5.2.
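A quick numerical check of the η = 1 reduction: when η is gamma-distributed with α = β = k, its mean is 1 and its variance is 1/k, so letting k grow concentrates η at 1. Since $(1 + B/k)^{-k} \to e^{-B}$ and $e^{-(bt - \ln(1 + bt))} = (1 + bt)e^{-bt}$, (7) then approaches the delayed S-shaped model with a = N. The parameter values below are hypothetical.

```python
import math


def m_proposed(t: float, N: float, alpha: float, beta: float, b: float) -> float:
    """Equation (7), computed as N[1 - exp(-alpha ln(1 + B/beta))] for numerical stability."""
    B = b * t - math.log1p(b * t)  # int_0^t b(s) ds for the rate in (6)
    return N * (1.0 - math.exp(-alpha * math.log1p(B / beta)))


def m_delayed_s(t: float, a: float, b: float) -> float:
    """Delayed S-shaped model a[1 - (1 + b t) e^{-b t}] (the eta = 1 case)."""
    return a * (1.0 - (1.0 + b * t) * math.exp(-b * t))


def max_gap(k: float, N: float = 100.0, b: float = 0.2) -> float:
    """Largest |(7) - delayed S-shaped| over t = 0..50 when alpha = beta = k (E[eta] = 1, Var[eta] = 1/k)."""
    return max(abs(m_proposed(t, N, k, k, b) - m_delayed_s(t, N, b)) for t in range(51))


if __name__ == "__main__":
    for k in (1e2, 1e4, 1e6):
        print(k, max_gap(k))  # the gap shrinks roughly in proportion to 1/k
```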
(1) The expected total software cost with a controlled environmental factor (η = 1) is given by (17) with
$$m(T) = a\left[1 - (1 + bT)\,e^{-bT}\right].$$
(2) The expected total software cost with a random operating environmental factor (η = f(x)) is given by (17) with
$$m(T) = N\left[1 - \left(\frac{\beta}{\beta + bT - \ln(1 + bT)}\right)^{\alpha}\right].$$
(3) The expected total software cost that spans the testing environment (η = 1) and the field environment.
For the baseline case, the cost coefficients include $C_2 = 50$, $C_3 = 2000$, $C_4 = 400$, and a warranty period $T_w = 10$ h. The results of the baseline case are listed in Table 8; the expected total costs for the three conditions above are 1338.70, 2398.24, and 2263.33, respectively. For the second condition, both the expected total cost and the optimal release time are high. The expected total cost is lowest for the first condition, and the optimal release time is shortest for the third condition. To study the impact of different coefficients on the expected total cost and the optimal release time, we vary some of the coefficients and compare the results with the baseline case. First, we evaluate the impact of the warranty period by changing the warranty time and comparing the optimal release times for each condition: we change $T_w$ from 10 h to 2, 5, and 15 h, while the values of the other parameters remain unchanged. Regardless of the warranty period, the optimal release time is shortest for the third condition, and the expected total cost is lowest for the first condition. Figure 11 shows the expected total cost for the baseline case, and Figures 12-14 show the expected total cost subject to the warranty period for the three conditions.
Next, we vary the cost coefficients $C_2$, $C_3$, and $C_4$, while $C_0$ and $C_1$ remain unchanged, because larger values of $C_0$ and $C_1$ would certainly increase the expected total cost. When we change $C_2$ from 50 to 25 and 100, the optimal release time changes significantly only for the second condition. As can be seen from Table 9, the optimal release time T* is 37.5 when $C_2$ is 25, and 29.1 when $C_2$ is 100. When we change $C_3$ from 2000 to 500 and 4000, the optimal release time changes significantly only for the first condition. As Table 10 shows, T* is 16.5 when $C_3$ is 500, and 14.6 when $C_3$ is 4000. When we change $C_4$ from 400 to 200 and 1000, the optimal release time changes for all of the conditions. As can be seen from Table 11, for the first condition T* is 14.3 when $C_4$ is 200 and 16.3 when $C_4$ is 1000; for the second condition, T* is 20.0 when $C_4$ is 200 and 61.0 when $C_4$ is 1000; and for the third condition, T* is 11.6 when $C_4$ is 200 and 12.8 when $C_4$ is 1000. Thus, the optimal release time varies much more under the second condition than under the other conditions. As a result, we can confirm that the cost model of the first condition does not reflect the influence of the operating environment, and that the cost model of the second condition does not reflect the influence of the test environment. Figure 15 shows the expected total cost according to the cost coefficient $C_2$ for the second condition, and Figures 16-18 show the expected total cost according to the cost coefficient $C_4$ for the three conditions.

Conclusions
Existing well-known NHPP software reliability models have been developed for a testing environment. However, a testing environment differs from an actual operating environment, so we considered random operating environments. In this paper, we discussed a new NHPP software reliability model with an S-shaped growth curve that accounts for the randomness of an actual operating environment. Tables 5-7 summarize the estimated parameters of all ten models, obtained using the LSE technique, and the six common criteria (MSE, AIC, PRR, PP, SAE, and R²) for the DS1, DS2, and DS3 datasets. As can be seen from Tables 5-7, the newly proposed model displays a better overall fit than all of the other models, particularly in the case of DS2. In addition, we provided optimal release policies for various environments to determine when the total software system cost is minimized. Using a cost model for a given environment is beneficial, as it provides a means for determining when to stop the software testing process. In this paper, faults are assumed to be removed immediately when a software failure is detected, and the correction process is assumed not to introduce new faults. Revisiting these assumptions is a worthwhile direction for future work, and we hope to present new results on this aspect in the near future.

Figure 1. System cost model infrastructure.

Figure 2. Mean value function of the ten models for DS1.

Figure 3. Mean value function of the ten models for DS2.

Figure 4. Mean value function of the ten models for DS3.

Figure 5. 95% confidence limits of the newly proposed model for DS1.

Figure 6. 95% confidence limits of the newly proposed model for DS2.

Figure 7. 95% confidence limits of the newly proposed model for DS3.

Figure 8. Relative error of the ten models for DS1.

Figure 9. Relative error of the ten models for DS2.

Figure 10. Relative error of the ten models for DS3.

Figure 11. Expected total cost for the baseline case.

Figure 12. Expected total cost subject to the warranty period for the 1st condition.

Figure 13. Expected total cost subject to the warranty period for the 2nd condition.

Figure 14. Expected total cost subject to the warranty period for the 3rd condition.

Figure 15. Expected total cost according to cost coefficient C2 for the 2nd condition.

Figure 16. Expected total cost according to cost coefficient C4 for the 1st condition.

Figure 17. Expected total cost according to cost coefficient C4 for the 2nd condition.

Figure 18. Expected total cost according to cost coefficient C4 for the 3rd condition.

Table 6. Model parameter estimation and comparison criteria from MRS data set (DS2).

Table 7. Model parameter estimation and comparison criteria from TDC data set (DS3).

Table 8. Optimal release time T* subject to the warranty period.

Table 9. Optimal release time T* according to cost coefficient C2.

Table 10. Optimal release time T* according to cost coefficient C3.

Table 11. Optimal release time T* according to cost coefficient C4.