Study of a New Software Reliability Growth Model under Uncertain Operating Environments and Dependent Failures

The coronavirus disease (COVID-19) outbreak has prompted various industries to embark on digital transformation efforts, with software playing a critical role. Ensuring the reliability of software is of the utmost importance given its widespread use across multiple industries. For example, software has extensive applications in areas such as transportation, aviation, and military systems, where reliability problems can result in personal injuries and significant financial losses. Numerous studies have focused on software reliability. In particular, the software reliability growth model has served as a prominent tool for measuring software reliability. Previous studies have often assumed that the testing environment is representative of the operating environment and that software failures occur independently. However, the testing and operating environments can differ, and software failures can sometimes occur dependently. In this study, we propose a new model that assumes uncertain operating environments and dependent failures. In other words, the proposed model takes a wider range of environments into account. The numerical examples in this study demonstrate that the goodness of fit of the new model is significantly better than that of existing SRGMs. Additionally, we show how the sequential probability ratio test (SPRT) based on the new model can be used to assess the reliability of the dataset.


Introduction
A software reliability growth model (SRGM) is employed to assess the reliability and quality of software products. This enables consumers to evaluate products by referring to reliability information, and developers can efficiently manage development plans based on reliability considerations. For instance, using the mean value function m(t), it is possible to predict the number of failures at a future time point t. Additionally, it can be used to establish policies for determining the optimal release timing for selling products. In other words, the SRGM is used as a tool for predicting the number of failures in future time periods, predicting product reliability, determining release policies, and estimating development costs. A software reliability growth model is represented by a mean value function m(t) that exhibits unique characteristics. The form of m(t) varies depending on the assumed environments (such as the development, testing, and operating phases). Although there have been numerous studies on software reliability models, it is generally observed that software defects and failures do not occur at regular time intervals. In response to this, existing SRGMs predominantly adopt a nonhomogeneous Poisson process (NHPP) framework. NHPP SRGMs provide a mathematical framework for handling software reliability and are widely utilized due to their versatility in various applications. Previous studies have considered either uncertain operating environments or fault dependency, but not both. In this study, we develop a model that takes both assumptions into account. Subsequently, we evaluate the performance of each model using real datasets. Numerical examples demonstrate that the proposed model outperforms other models that account solely for uncertain operating environments or dependent failures. This provides a more accurate prediction of the number of failures.
Additionally, we demonstrate the effectiveness of the SPRT by utilizing optimal assumption cases based on our proposed model. This allows testers to determine when to stop testing based on the software reliability.
In Section 2, we provide the basic background of NHPP SRGMs and introduce the existing NHPP SRGM models as well as the model proposed in this paper. The SPRT procedure is outlined in Section 3. Section 4 presents the datasets and criteria used in this numerical study. We compare the fit of each model to the datasets and apply the SPRT. In Section 5, we discuss the results of the numerical example. Finally, Section 6 presents the conclusions of this study.
It characterizes the cumulative number of failures, denoted as N(t) (t ≥ 0), up to a given execution time t. The mean value function m(t) represents the expected cumulative number of failures at time t and is obtained by integrating the intensity function λ(t) from 0 to t, that is, m(t) = ∫₀ᵗ λ(s) ds. The reliability function based on the NHPP can be expressed using m(t) as follows [29]. The reliability function R(t) is defined as the probability that no failure occurs in the time interval (0, t), given by R(t) = P{N(t) = 0} = e^{−m(t)} (Equation (3)). Equation (3) is the probability that a software error does not occur in the interval (0, t). If t + x is given, then the software reliability can be expressed as the conditional probability R(x|t) = e^{−[m(t+x) − m(t)]} (Equation (4)).
Here, R(x|t) is the probability that a software error does not occur in the interval (t, t + x), where t ≥ 0 and x > 0. The density function of x is given by f(x) = λ(x) e^{−m(x)}, where λ(x) = ∂/∂x [m(x)].
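As an illustration of Equations (3) and (4), the sketch below evaluates R(x|t) = e^{−[m(t+x) − m(t)]} using the well-known Goel–Okumoto mean value function m(t) = a(1 − e^{−bt}) as a stand-in; the parameter values are illustrative assumptions, not fitted values from this paper.

```python
import math

def goel_okumoto_m(t, a=55.0, b=0.3):
    """Goel-Okumoto mean value function m(t) = a(1 - e^{-bt}); a, b are illustrative."""
    return a * (1.0 - math.exp(-b * t))

def reliability(x, t, m=goel_okumoto_m):
    """R(x|t) = exp(-(m(t+x) - m(t))): probability of no failure in (t, t+x]."""
    return math.exp(-(m(t + x) - m(t)))

# Reliability over a fixed horizon x improves as testing time t grows,
# because fewer of the a expected faults remain undetected.
print(round(reliability(0.1, 1.0), 4))
print(round(reliability(0.1, 10.0), 4))
```

Note that R(x|t) depends on m only through the increment m(t+x) − m(t), so any mean value function from Table 1 can be substituted for the stand-in here.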

Existing SRGMs
The mean value function m(t) of the NHPP SRGM is obtained by solving a differential equation. The form of the mean value function depends on the assumptions and specific environments being studied. The commonly used differential equation is as follows [30]:

∂m(t)/∂t = b(t)[a(t) − m(t)],

where a(t) represents the expected number of initial faults plus the errors newly introduced by the start of the testing period, and b(t) represents the failure detection rate per fault. This paper presents a specific software reliability model that considers the uncertainty of the operating environment, based on the work by Pham [26].
The NHPP SRGM is typically characterized by a differential equation that is widely recognized in the field. To account for the uncertain operating conditions considered in this study, the mean value function of the proposed model is obtained from the following [31]:

∂m(t)/∂t = η b(t)[a(t) − m(t)],   (7)

where η is a random variable. To model the uncertain operating environment, this equation uses the random variable η, which follows a gamma distribution, η ∼ Γ(α, β).

Proposed Model
Most existing NHPP SRGMs assume that testing and operating environments are the same and that failures occur independently. However, software failures sometimes occur dependently, and the operating environments may differ from the testing environments. For example, if an error occurs in a particular class within a program code, then it may cause errors in other classes that reference the affected class. Conflicts between program codes due to background processes can potentially impact other codes. These situations lead to the occurrence of dependent failures. Furthermore, it is difficult for testers to construct a testing environment that covers all operating environments. The operating environment is the environment in which consumers use the software, including hardware specifications (CPU, GPU, RAM, etc.), operating systems (Windows, macOS, Linux, etc.), and the various programs running concurrently in the background. The proposed model considers dependent failures and uncertain operating environments. Quantifying the operating environments numerically is difficult, so in Equation (7) the uncertain operating environments are represented by the random variable η. The assumption of dependent failure occurrences is represented by the parameters of the gamma distribution followed by η, which will be discussed in detail when explaining Equation (10).
In this paper, we propose a model that incorporates both uncertain operating environments and dependent failures. The inclusion of the latter is motivated by the need to consider situations in which failures can propagate from one component to another. With the functions a(t) = N and b(t) = c/(1 + α exp(−bt)), we can obtain the mean value function m(t) from Equation (7), as given in Equation (10). The proposed model has five parameters, namely b, c, α, β, and N. In Equation (10), the parameters α and β are also the parameters of the gamma density g(η) = β^α η^{α−1} e^{−βη} / Γ(α), where η represents the uncertain operating environments in Equation (7).
The assumption of dependent failures in the proposed model arises from the interdependence of model parameters. Specifically, the values of α and β in Equation (10) depend on the probability distribution of η, which characterizes the uncertain operating environments. As these parameters appear in Equation (7), which expresses the failure detection rate, the assumption of dependent failures is a natural consequence of the model design. Therefore, the correlation between model parameters is a crucial factor that underlies the assumption of dependent failures. Table 1 lists the mean value functions for the existing NHPP SRGMs and the proposed model. Each model is referred to by abbreviations of its characteristics or author names. DPF1 and DPF2 assume dependent failures, whereas the others assume independent failures. VTUB assumes uncertain operating environments, whereas the proposed model (NEW) assumes dependent failures and uncertain operating environments.
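Equation (10) is not reproduced in this excerpt. As a rough sketch only: under the standard Pham-type solution for an uncertain-environment model with a(t) = N and η ∼ Γ(α, β), the mean value function takes the form m(t) = N[1 − (β/(β + B(t)))^α], where B(t) = ∫₀ᵗ b(s) ds, and with b(t) = c/(1 + α e^{−bt}) the integral has the closed form B(t) = (c/b) ln((e^{bt} + α)/(1 + α)). The code below implements this assumed form; the parameter values are illustrative, not fitted values from the paper.

```python
import math

def B(t, b, c, alpha):
    """Closed form of the integral of b(s) = c / (1 + alpha * e^{-bs}) over [0, t]."""
    return (c / b) * math.log((math.exp(b * t) + alpha) / (1.0 + alpha))

def m_proposed(t, b, c, alpha, beta, N):
    """Assumed Pham-type mean value function under eta ~ Gamma(alpha, beta):
    m(t) = N * (1 - (beta / (beta + B(t)))**alpha).  Note that alpha and beta
    appear both in the gamma density and (alpha) in b(t), reflecting the
    parameter sharing that the paper links to dependent failures."""
    return N * (1.0 - (beta / (beta + B(t, b, c, alpha))) ** alpha)

# Illustrative parameters only (not the estimates from Tables 5 and 6).
params = dict(b=0.5, c=0.4, alpha=2.0, beta=3.0, N=60.0)
print(round(m_proposed(12.0, **params), 2))
```

As expected of a mean value function, m(0) = 0, m(t) increases monotonically, and m(t) approaches N as t grows.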

Sequential Probability Ratio Test
Wald's SPRT is widely used as a hypothesis-testing technique [27]. At each time point, it tests the ratio of the probabilities of two hypotheses, p0 and p1, against predetermined threshold values. The SPRT algorithm is iterative and requires additional data collection and testing while the probability ratio falls within the continuation region bounded by the thresholds A and B. Equation (11) expresses the relationship between p0 and p1 and the thresholds A and B.
where A and B are constants used to determine the acceptance and rejection of the null hypothesis H0. If p1/p0 ≥ A, then H0 is rejected. If p1/p0 ≤ B, then H0 is accepted.
Moreover, A and B depend on α and β, as shown in Equations (11) and (12). Here, α and β are type 1 and type 2 errors, respectively. In other words, α is the producer's risk, and β is the consumer's risk.
The values of A and B depend on the prespecified risk probabilities α and β, which are typically set to 0.05 or 0.1. The upper line that determines rejection is denoted N_U(t), and the lower line that determines acceptance is denoted N_L(t); they are represented as follows, where a, b1, and b2 are given below. Figure 1 shows the reliable region of the SPRT. If the data value (blue dot) at a certain time point lies within the reliable region, then it is labeled "Continue". If the value is outside the region, then a conclusion of "Reject" or "Accept" is made.
Stieber [28] applied the SPRT to estimate the reliability of NHPP SRGMs by redefining the probability ratios p0 and p1 of Equation (11) in terms of the mean value function m(t). Here, p0 and p1 are expressed as follows: The constant B of Equation (11) appears on the left side of Equation (17), whereas the constant A of Equation (11) appears on the right side. In addition, the quantity compared against these bounds in Equation (17) is the observed number of failures N(t).

Numerical Example
In this section, we fit the proposed model and existing models to actual data, estimate the criteria, and compare their goodness of fit. We also apply the sequential probability ratio test (SPRT) to evaluate the reliability of the dataset. First, we fit each model (mean value function) to the dataset and estimate its parameters using the least-squares estimation (LSE) method. Then, we calculate the criteria using the estimated parameter values (m(t)) and compare the goodness of fit. Finally, we construct an equidistant scale for the parameter set of the proposed model, determine the SPRT threshold based on this parameter set, and examine the results of applying the SPRT to Dataset 1.
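As a hedged illustration of the LSE step described above, the following sketch fits the simple Goel–Okumoto mean value function m(t) = a(1 − e^{−bt}) to a small hypothetical failure dataset by minimizing the sum of squared errors over a coarse parameter grid. The grid search stands in for a proper numerical optimizer; neither the data nor the model is taken from the paper.

```python
import math

# Hypothetical cumulative failure counts per week (illustrative, not Dataset 1).
weeks = list(range(1, 13))
y = [11, 20, 27, 33, 38, 42, 45, 48, 50, 52, 53, 55]

def m_go(t, a, b):
    """Goel-Okumoto mean value function, used here as a simple stand-in model."""
    return a * (1.0 - math.exp(-b * t))

def sse(a, b):
    """Least-squares objective: sum of squared errors between m(t_i) and y_i."""
    return sum((m_go(t, a, b) - yi) ** 2 for t, yi in zip(weeks, y))

# Coarse grid search in place of a numerical optimizer (e.g., scipy's curve_fit).
best = min(
    ((a, b) for a in range(40, 101) for b in [i / 100 for i in range(5, 51)]),
    key=lambda p: sse(*p),
)
print("estimated (a, b):", best, "SSE:", round(sse(*best), 2))
```

The fitted m(t) can then be plugged into the criteria of Table 4 to compare models on the same dataset.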

Datasets
We employed two datasets to compare the goodness of fit of the different models [29]. The first dataset ( Table 2) was collected by ABC Software Company. The project team comprised a unit manager, one user interface software engineer, and ten software engineers/testers. The dataset was observed over a period of 12 weeks (the unit of time in the table is weeks), and 55 failures were observed during this time. The second dataset (Table 3) was collected from a real-time command and control system developed by Bell Laboratories. The failure data corresponds to the observed failures during system testing, and 136 failures were recorded within a period of 25 h.

Criteria
Different criteria have been suggested for evaluating how well a model fits the data [9]. This study discusses 10 criteria (MSE, PRR, PP, SAE, R², AIC, PRV, RMSPE, MAE, and MEOP) to compare the proposed model with 10 existing NHPP SRGMs. Table 4 presents the criteria used to evaluate the goodness of fit of the NHPP SRGMs and the proposed model. These criteria measure the distance or error between the predicted number of failures based on the mean value function of the model, denoted as m(t_i), and the actual observed data, denoted as y_i. The number of data points is represented as n, and the number of parameters in the model is represented as m. The shorter the distance between the predicted and actual values, the better the mean value function of the model is at predicting the number of failures in the dataset.

Per Table 4, Akaike's information criterion (AIC) [40] is computed as −2logL + 2m, the root-mean-square prediction error (RMSPE) [41] as √(Bias² + PRV²), and the mean absolute error (MAE) [42] as the mean absolute deviation between predicted and actual values. The criteria used in the evaluation include the following: The MSE measures the distance between the predicted and actual values while accounting for both the number of data points and the number of parameters in the model. The PRR measures the distance between the predicted and actual values relative to the value predicted by the model. The PP measures the distance between the predicted and actual values relative to the actual data. The SAE measures the total absolute distance between the predicted and actual values.
The coefficient of determination (R 2 ) is a measure of the regression fit. It represents the proportion of the regression sum of squares to the total sum of squares in the model. The closer the value is to 1, the better the fit of the model.
The AIC is a statistical measure that evaluates the ability of a model to fit the data. The likelihood function (L) of the model is maximized, and the AIC is adjusted for the number of parameters in the model. Typically, a model with more parameters has a better fit; however, the AIC prevents overfitting by penalizing models with too many parameters.
Specifically, the AIC is calculated as −2 times the log-likelihood (logL) plus a penalty term 2m that grows with the number of parameters in the model. For grouped failure data (t_i, y_i), i = 1, …, n, the likelihood function (L) and the log-likelihood function (logL) of an NHPP model take the standard form

L = ∏_{i=1}^{n} { [m(t_i) − m(t_{i−1})]^{(y_i − y_{i−1})} / (y_i − y_{i−1})! } e^{−[m(t_i) − m(t_{i−1})]},

logL = ∑_{i=1}^{n} { (y_i − y_{i−1}) log[m(t_i) − m(t_{i−1})] − [m(t_i) − m(t_{i−1})] − log[(y_i − y_{i−1})!] }.

The PRV calculates the standard deviation of the prediction bias; a smaller value indicates a better model fit. The bias is defined as the mean prediction error, Bias = (1/n) ∑_{i=1}^{n} [m(t_i) − y_i].
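The sketch below computes several of the Table 4 criteria from their textbook definitions. Since the paper's exact Table 4 formulas are not reproduced in this excerpt, treat the denominators (e.g., n − m for the MSE and n − 1 for the PRV) as assumptions; the data are toy numbers.

```python
import math

def criteria(y, m_hat, num_params):
    """Standard goodness-of-fit criteria for an SRGM (textbook definitions;
    the paper's Table 4 may differ in minor details such as denominators)."""
    n = len(y)
    resid = [mi - yi for mi, yi in zip(m_hat, y)]
    mse = sum(r * r for r in resid) / (n - num_params)   # penalizes model size
    sae = sum(abs(r) for r in resid)                      # total absolute error
    y_bar = sum(y) / n
    r2 = 1.0 - sum(r * r for r in resid) / sum((yi - y_bar) ** 2 for yi in y)
    bias = sum(resid) / n                                 # mean prediction error
    prv = math.sqrt(sum((r - bias) ** 2 for r in resid) / (n - 1))
    rmspe = math.sqrt(bias ** 2 + prv ** 2)
    return {"MSE": mse, "SAE": sae, "R2": r2, "PRV": prv, "RMSPE": rmspe}

# Toy example: actual vs. predicted cumulative failures (illustrative numbers).
y = [4, 9, 15, 22, 28]
m_hat = [5.1, 9.8, 14.6, 21.2, 28.9]
print({k: round(v, 3) for k, v in criteria(y, m_hat, num_params=2).items()})
```

Lower MSE, SAE, PRV, and RMSPE indicate a better fit, while R² closer to 1 indicates a better fit.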

Results of Goodness of Fit
Tables 5 and 6 present the estimated parameters of the models, obtained through least-squares estimation. The results indicate that the proposed model performs better than the other models in predicting the cumulative number of failures in the datasets. The MSE, PRR, and PP are particularly commonly used criteria. Figures 2 and 3 show the top three models for these criteria (MSE, PRR, and PP) in Tables 7 and 8. In Figure 2, the goodness of fit of the proposed model for Dataset 1 is better than that of the DPF1 and DPF2 models, which assume only dependent failures. Similarly, Figure 3 shows that the goodness of fit of the proposed model for Dataset 2 is better than that of the VTUB model, which assumes only uncertain operating environments. Thus, the proposed model, which considers both dependent failures and uncertain operating environments, is a reasonable approach for studying software reliability.

Results of SPRT
As the model proposed herein is the best fit for the datasets, we propose a method for measuring reliability by applying the SPRT based on the proposed model. Dataset 1 is used in this study. To test reliability, the SPRT is used on individual parameters or a set of parameters. For the proposed model, applying the SPRT to the parameters α and β can lead to sensitivity issues and potentially skew the SPRT results. Therefore, the SPRT is applied specifically to parameters b, N, and c.
Equation (20) shows the null and alternative hypotheses, m0(t) and m1(t), created based on the interval scale of the parameter groups. The parameters (b0, α, β, N0, and c0) define m0(t), whereas m1(t) is defined by (b1, α, β, N1, and c1). The values of b0 and b1 are calculated as b̂ − δ and b̂ + δ, respectively, where δ is set as a percentage of the estimated parameter value. For instance, when δ is taken to be 1% of each parameter value, b0 is computed as b̂ − 0.01 × b̂, and b1 as b̂ + 0.01 × b̂. Similarly, percentage values are used to determine the interval scales (N0, N1, c0, and c1) for N and c. The values m0(t) and m1(t) in Equation (20) are substituted into Equation (17), and the conclusion "Continue" is reached if N(t) satisfies Equation (17). If N(t) is smaller than the left term of Equation (17), then the conclusion is "Accept"; if N(t) is greater than the right term, the conclusion is "Reject".
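The interval-scale construction for (b0, b1), (N0, N1), and (c0, c1) described above can be sketched as follows; the parameter estimates are placeholders, not the fitted values from Tables 5 and 6.

```python
def hypothesis_bounds(estimate, delta_pct):
    """Build (theta_0, theta_1) = (theta_hat - d, theta_hat + d), where
    d = delta_pct * theta_hat, as described for the parameters b, N, and c."""
    d = delta_pct * estimate
    return estimate - d, estimate + d

# Placeholder estimates (illustrative only).
b_hat, N_hat, c_hat = 0.5, 60.0, 0.4
for name, est in [("b", b_hat), ("N", N_hat), ("c", c_hat)]:
    lo, hi = hypothesis_bounds(est, delta_pct=0.01)   # delta = 1% of the estimate
    print(name, round(lo, 4), round(hi, 4))
```

Larger δ widens the gap between m0(t) and m1(t), which widens the "Accept"/"Reject" regions of the SPRT and narrows the "Continue" region, consistent with the case comparison in Table 9.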
To compare the SPRT results, various cases of δ are considered in this study; Table 9 presents 30 cases of δ values for b, N, and c. Tables 10-15 show the SPRT results for Dataset 1. The SPRT results from case 1 to case 20 are "Continue", which indicates that data should be collected for the next time point and the test repeated there. From case 21 to case 30, the results are "Reject" at t = 6, which indicates that data collection should stop and the reliability be rejected. A result of "Accept" would indicate that data collection should stop and the reliability be accepted. As the value of δ increases, the acceptance and rejection regions widen, and as δ decreases, the "Continue" region widens. Therefore, determining an appropriate level of δ is important for the SPRT. Using the result from Section 2.1, we can estimate the reliability function R(x|t) of Dataset 1, where x is given as 0.1; Figure 4 shows the results. In Figure 4, it can be observed that the reliability decreases sharply until just before time point 5. This is estimated to be due to the rapid increase in the number of failures in Dataset 1, as indicated in Table 2. The SPRT results (cases 21-30) concluded with the rejection of product reliability at the 6th time point, which aligns with the substantial number of failures in Dataset 1 up to that point. Although Dataset 1 was tested for 12 weeks, according to the SPRT results, testing should be discontinued at the 6th week and efforts made to improve reliability.

Discussion
Most NHPP SRGMs assume that the testing environment is the same as the o environment or that software failures occur independently. In this study, we p new NHPP SRGM that assumes uncertain operating environments and depend ures. The results of numerical examples demonstrate the superiority of the p model over the models that consider only uncertain operating environments (V only dependent failures (DPF 1 and DPF 2). Thus, the proposed model estimates t ber of failures better than the existing NHPP SRGMs. This study also demonstra we can estimate software reliability using the proposed model by applying the S the value of δ increases, the "Continue" region becomes narrower, and the "Ac ject" regions become wider. Therefore, it is important to choose an appropriate le and further research on this matter is needed. Wood [43] explained that SRGM used to predict the number of failures and provide software reliability to consum study illustrates that the proposed model can be utilized in real environments.

Conclusions
This study had two objectives. First, we proposed a model that considered pendent failures and uncertain operating environments. The results of the nume amples demonstrated that the proposed model exhibited a significantly better fit models that considered only dependent failures (DPF 1 and DPF 2) or uncertain o environments (VTUB).
Second, by leveraging the proposed model, we introduced a method for a software reliability through the application of the SPRT. Specifically, although th was actually tested for 12 weeks, according to the results of the SPRT, testing wa tinued at the 6th week, and it was concluded that measures should be taken to the reliability. From the dataset and the values of the reliability function, it was o that the number of failures in the observed dataset until a certain time point wa than the number of failures after that point. In other words, even with a limited this study achieved the goal of an early reliability assessment by applying the SP ther studies that link the SPRT with future software release policies will contribu ficient development planning processes.

Discussion
Most NHPP SRGMs assume that the testing environment is the same as the operating environment or that software failures occur independently. In this study, we propose a new NHPP SRGM that assumes uncertain operating environments and dependent failures. The results of numerical examples demonstrate the superiority of the proposed model over the models that consider only uncertain operating environments (VTUB) or only dependent failures (DPF 1 and DPF 2). Thus, the proposed model estimates the number of failures better than the existing NHPP SRGMs. This study also demonstrates how we can estimate software reliability using the proposed model by applying the SPRT. As the value of δ increases, the "Continue" region becomes narrower, and the "Accept/Reject" regions become wider. Therefore, it is important to choose an appropriate level of δ, and further research on this matter is needed. Wood [43] explained that SRGMs can be used to predict the number of failures and provide software reliability information to consumers. This study illustrates that the proposed model can be utilized in real environments.

Conclusions
This study had two objectives. First, we proposed a model that considered both dependent failures and uncertain operating environments. The results of the numerical examples demonstrated that the proposed model exhibited a significantly better fit than the models that considered only dependent failures (DPF 1 and DPF 2) or uncertain operating environments (VTUB).
Second, by leveraging the proposed model, we introduced a method for assessing software reliability through the application of the SPRT. Specifically, although the dataset was actually tested for 12 weeks, according to the results of the SPRT, testing was discontinued at the 6th week, and it was concluded that measures should be taken to improve the reliability. From the dataset and the values of the reliability function, it was observed that the number of failures in the observed dataset until a certain time point was higher than the number of failures after that point. In other words, even with a limited dataset, this study achieved the goal of an early reliability assessment by applying the SPRT. Further studies that link the SPRT with future software release policies will contribute to efficient development planning processes.