A Software Reliability Model with Dependent Failure and Optimal Release Time

In the past, because computer programs were restricted to performing only simple functions, the dependence on software was not large, resulting in relatively small losses after a failure. However, with the development of the software market, the dependence on software has increased considerably, and software failures can cause significant social and economic losses. Software reliability studies were previously conducted under the assumption that software failures occur independently. However, as software systems become extremely large and complex, software failures are increasingly interdependent. Therefore, in this study, a software reliability model is developed under the assumption that software failures occur in a dependent manner. We derive the software reliability model through the number of software failures and a fault detection rate assuming point symmetry. The proposed model shows good performance compared with 21 previously developed software reliability models using three datasets and 11 criteria. In addition, to find the optimal release time, a cost model using the developed software reliability model is presented. To determine this release time, the four parameters constituting the software reliability model were each changed in 10% increments, and the resulting changes in the cost model and the optimal release time were compared. As the change in parameter a increases, the value of the cost model C(T) increases, whereas the optimal release time T* decreases. As the values of parameters b and h increase, the cost model C(T) increases and the release time T* decreases; in addition, the change in parameter h has a very slight effect on the optimal release time compared to parameter b. As the value of parameter c increases, the cost model C(T) and release time T* increase together.
Based on this, it is found that parameter a causes by far the largest change in the minimum of the cost model compared with changes in the other parameters, and parameter b has the greatest influence on determining the optimal release time.


Introduction
Software, one of the main components of a computer, plays an important role in the operation of physical devices. Software was originally developed to perform extremely small or simple functions. Currently, however, embedded systems that perform multiple functions are being developed. With the rapid development of the software market, technology has also advanced, and software is now used in all fields. Recently, the Internet of Things (IoT), based on the combination of various software, has been commercialized. Furthermore, AIoT (Artificial Intelligence of Things), which combines AI (Artificial Intelligence) with IoT, is developing [1]. This means that software has become a very important part not only of industry but also of our daily lives.
A software failure is caused by various faults (coding or system errors, etc.). In the past, software failures caused relatively small losses because the degree of dependence on software was not as large. However, today the degree of dependence on software is extremely high, and thus software failures can cause significant social and economic losses. Therefore, we measure software reliability, which indicates the ability of a software program to avoid failure for a set period of time, that is, how long the software can be used without such a failure.
Early research on software reliability was conducted based on the assumption that software failures occur independently. Goel and Okumoto proposed the GO model, which is the most basic non-homogeneous Poisson process (NHPP) software reliability growth model [2]. The Hossain, Dahiya, Goel and Okumoto (HDGO) model further extended the GO model [3]. Yamada et al., Ohba, and Zhang et al. [4][5][6] proposed NHPP S-shaped curve models in which the cumulative number of software failures follows an S curve. In addition, Yamada et al. [7] proposed a new model in which the test effort invested during the testing phase is reflected in the software reliability model; it extends the previously developed models to account even for the resources consumed during testing. Furthermore, Yamada et al. [8] developed a software reliability model with a constant fault detection rate b(t) = b, assuming imperfect debugging, in which faults detected during the test phase are corrected and removed.
The models developed by extending this approach involve generalized imperfect-debugging fault detection rate models, in which the fault detection rate b(t) is not a constant but a function of time [9][10][11][12][13][14]. This line of work began with the assumption that the error causing a failure is eliminated immediately, so that a new error can be generated [9], and progressed to models in which, during the fault removal process, new faults are generated with a constant probability regardless of whether removal succeeds [13,14]. In addition, because each software program runs in a different operating environment, comparisons are difficult to achieve. Therefore, in [15][16][17], software reliability models were developed considering uncertain factors in the operating environment. Currently, research using non-parametric methods such as deep learning or machine learning is also being conducted [18][19][20][21].
Recently, finding the optimal model for reliability prediction has become an important concern. Through a combination of the analytic hierarchy process (AHP), hesitant fuzzy (HF) sets, and the technique for order of preference by similarity to ideal solution (TOPSIS), Sahu et al., Ogundoyin et al., and Rafi et al. [22][23][24] identified the most suitable software reliability models. However, software failures often occur in a dependent manner because developed software is composed of extremely complex structures [25]. Here, a dependent failure means that one failure affects other failures or increases the failure probability of other equipment [26,27]. There are two main types of dependent failure: a common cause failure, in which several pieces of software fail simultaneously due to a common cause, and a cascading failure, in which a failing part of the system affects other software as well. A software reliability model assuming dependent failures was developed from the number of software failures and the fault detection rate, which have a dependency relationship, in a software reliability model assuming imperfect debugging [28]. In addition, Lee et al. [29] presented a model that assumes that if past software failures are not corrected well, they continue to have an effect.
In this study, a new software reliability model is developed under the assumption that software failures occur in a dependent manner, and it is suitable for a general environment. We show the superiority of the newly developed dependent software reliability model through a comparison under various criteria. In addition, determining the optimal release time of the developed software is also important. If the test period is long, the software will be reliable, but the software development cost will increase; if the test period is short, the reliability of the product may decrease. Therefore, it is important to find a balance between time to market and minimum cost, taking into account the installation cost, test cost, and error removal cost, among others. We propose a cost model that combines the proposed software reliability model and the cost coefficients [30][31][32][33]. In addition, among the various parameters of the proposed model, we identify the parameter that has a significant influence on predicting the number of cumulative failures through the variation of the cost model with respect to changes in the parameters [34,35]. Section 2 introduces the new dependent software reliability model and its mathematical derivation. Section 3 introduces the data and criteria, as well as the numerical results. Section 4 describes the optimal release time, and finally, Section 5 presents our conclusions.

New Dependent Software Reliability Model
Software reliability refers to the probability that the software will not cause a system failure for a certain period of time under certain conditions. In other words, it evaluates "how long the software can be used without failure". The reliability function used to evaluate this is

R(t) = P(T > t) = ∫_t^∞ f(s) ds,  (1)

which denotes the probability of the software operating without failure over a specific time t. Here, the probability density function f(t) treats the software failure time, or lifetime, as a random variable T. When measuring the reliability function R(t), it is assumed that T follows an exponential distribution with parameter λ. In addition, it is assumed that the number of failures occurring in a given unit time follows a Poisson distribution with parameter λ. When λ is a constant, this is the most basic form and is called a homogeneous Poisson process. Extending this process, many researchers adopt a model in which λ is an intensity function λ(t) that changes with time rather than a single constant, i.e., a non-homogeneous Poisson process (NHPP) rather than a homogeneous Poisson process.
In Equation (2), N(t) follows a Poisson probability mass function with the time-dependent parameter m(t):

P{N(t) = n} = [m(t)]^n e^(−m(t)) / n!,  n = 0, 1, 2, ….  (2)

Here, m(t) is the mean value function, which is the integral of λ(t) from 0 to t:

m(t) = ∫_0^t λ(s) ds.  (3)

The λ(t) is the intensity function indicating the instantaneous number of failures at time t.
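As a quick numerical illustration of Equations (2) and (3), the sketch below (a minimal Python example, not taken from the paper) evaluates the NHPP counting probability for a given value of the mean value function; the constant-intensity case m(t) = λt recovers the homogeneous Poisson process.

```python
import math

def nhpp_pmf(n, m_t):
    """P{N(t) = n} for a counting process with mean value function value m_t."""
    return m_t ** n * math.exp(-m_t) / math.factorial(n)

# Homogeneous special case: constant intensity lambda_ = 0.5, so m(t) = 0.5 * t
lambda_, t = 0.5, 4.0
m_t = lambda_ * t                      # Equation (3) with lambda(s) = 0.5
probs = [nhpp_pmf(n, m_t) for n in range(50)]
```

Summing the probabilities over n confirms a proper distribution for any fixed t, which is the defining property used throughout the NHPP framework.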
A general class of NHPP software reliability models was proposed to summarize the existing NHPP models, where m(t) is calculated using the relationship between the number of failures a(t) at each time point and a fault detection rate b(t) assuming point symmetry in Equation (4). Various software reliability models have been developed based on the assumption that software failures occur independently. However, software failures occur not only independently but also dependently: if a failure is not completely fixed, it continues to affect the next failure. In addition, as the system becomes more complex, failures become interdependent because of the dependent combination of several pieces of software. Therefore, in this study, we assume that each failure is dependent on other failures. The mean value function m(t) of the NHPP software reliability model is obtained from the differential equation in Equation (5), in which m(t) is multiplied once more to express the assumption that the failures occurring from 0 to t affect further failures. We assume that a(t) is the number of software failures at each time point and b(t) is the fault detection rate, where parameter a is the expected number of faults, α is the increasing rate of the number of faults, b is the shape parameter, and c is the scale parameter. When time t changes, the behavior of the fault detection rate b(t) for different values of parameters b and c is shown in Figure 1: the curves for b = 1 are blue and those for b = 1.5 are red, while c = 1 is drawn as a dashed line and c = 2 as a dotted line. It can be seen that the larger b is, the larger b(t) is.
Solving the differential equation by substituting a(t) and b(t) into Equations (5) and (6) for m(t), we obtain Equation (7), in which Li_s denotes the polylogarithm with s = 2 and h is the number of initial failures. The integral in Equation (8) is calculated through integration by substitution: with u = c + e^(bx) and du = b e^(bx) dx, it reduces to Equation (9).
Substituting the result of the substitution integration into Equation (8), the final m(t) is given by Equation (10).
This can be presented as a general model of dependent failure occurrence in the software reliability model. When t = 0, m(0) = ah/(a + h). Table 1 shows the mean value function m(t) of the existing software reliability models and of the model proposed in this study. In models 19-22, failures are assumed to occur in a dependent manner.
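Since the closed-form Equations (5)-(10) are not reproduced in the text above, the behavior of a dependent mean value function can still be sketched numerically. The block below is an illustration under explicit assumptions: a logistic-type dependent ODE m'(t) = b(t)·m(t)·(1 − m(t)/a) and a fault detection rate b(t) = b·e^(bt)/(c + e^(bt)) are assumed forms; only the boundary condition m(0) = ah/(a + h) and the approximate dataset-1 parameter values are taken from the text.

```python
import math

# ASSUMED logistic-type dependent ODE and ASSUMED b(t); only m(0) = a*h/(a+h)
# and the rough parameter values come from the paper's text.

def b_rate(t, b, c):
    return b * math.exp(b * t) / (c + math.exp(b * t))

def mean_value(T, a, b, c, h, steps=2000):
    """Integrate the assumed dependent ODE from 0 to T with classical RK4."""
    m = a * h / (a + h)                # boundary condition stated in the text
    dt = T / steps
    t = 0.0
    f = lambda t, m: b_rate(t, b, c) * m * (1.0 - m / a)
    for _ in range(steps):
        k1 = f(t, m)
        k2 = f(t + dt / 2, m + dt * k1 / 2)
        k3 = f(t + dt / 2, m + dt * k2 / 2)
        k4 = f(t + dt, m + dt * k3)
        m += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
    return m

# Values close to the dataset-1 estimates reported in Section 3
vals = [mean_value(T, a=80.09, b=0.0723, c=15.93, h=9.82) for T in range(13)]
```

The resulting curve starts at ah/(a + h), increases monotonically, and saturates below a, the qualitative shape expected of the dependent mean value function.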

Data Information
Datasets 1 and 2 are derived from the online communication system (OCS) of ABC Software Co. and use data accumulated over a 12-week period. Datasets 1 and 2 show that the cumulative numbers of failures at t = 1, 2, · · · , 12 are 14, 17, · · · , 81 and 11, 17, · · · , 81, respectively [14]. Dataset 3 is the test data of a medical record system consisting of 188 software titles, with data for one of three releases; the cumulative number of failures is 90, 107, · · · , 204 for t = 1, 2, · · · , 17 [36]. Table 2 shows the accumulated failure data for datasets 1, 2, and 3. We compare the fit between the software reliability models with the two failure datasets obtained from the OCS and one dataset from Lee et al. [29], whose model (DPF) showed good performance among the dependent models.

Criteria
This study compares various independent and dependent software reliability models and the proposed model introduced in Table 1 using 11 criteria. Based on the difference between the actual observed values and the estimated values, we identify the better model by comparing criteria that also reflect the number of parameters used in each model.
First, the mean squared error (MSE) is defined as the sum of squared distances between the estimated values and the actual values, adjusted for the number of parameters and the number of observations [37]:

MSE = Σ_{i=1}^{n} (m(t_i) − y_i)^2 / (n − m),

where m(t_i) is the estimated value of the model m(t), y_i is the actual observed value, n is the number of observations, and m is the number of parameters in each model. Second, the mean absolute error (MAE) is the sum of absolute differences between the estimated number of failures and the actual values, likewise adjusted for the number of parameters and observations [38]:

MAE = Σ_{i=1}^{n} |m(t_i) − y_i| / (n − m).
Third, Adj_R^2 is the adjusted coefficient of determination of the regression equation and measures explanatory power while accounting for the number of parameters [39]:

Adj_R^2 = 1 − [Σ_{i=1}^{n} (y_i − m(t_i))^2 / (n − m)] / [Σ_{i=1}^{n} (y_i − ȳ)^2 / (n − 1)].
Fourth, the predictive ratio risk (PRR) divides the distance from the actual value to the estimated value by the estimated value [40]:

PRR = Σ_{i=1}^{n} ((m(t_i) − y_i) / m(t_i))^2.

Fifth, the predictive power (PP) divides the distance from the actual value to the estimated value by the actual value [41]:

PP = Σ_{i=1}^{n} ((m(t_i) − y_i) / y_i)^2.
Sixth, Akaike's information criterion (AIC) compares models through likelihood maximization; it is derived by minimizing the Kullback-Leibler divergence between the probability distribution of the model and that of the data [42]:

AIC = −2 log L + 2m,

where L is the maximized likelihood of the model.
Seventh, the predicted relative variation (PRV) is the standard deviation of the prediction bias [43]:

PRV = sqrt( Σ_{i=1}^{n} (Bias_i − Bias)^2 / (n − 1) ),

where the bias is Bias = Σ_{i=1}^{n} (m(t_i) − y_i) / n. Eighth, the root mean square prediction error (RMSPE) estimates how closely the model predicts the observations [44]:

RMSPE = sqrt( Bias^2 + PRV^2 ).

Ninth, the mean error of prediction (MEOP) sums the absolute deviations between the actual data and the estimated curve [38]:

MEOP = Σ_{i=1}^{n} |y_i − m(t_i)| / (n − m + 1).

Tenth, the Theil statistic (TS) is the average percentage of deviation over all periods with respect to the actual values; the closer the Theil statistic is to zero, the better the prediction capability of the model [45]:

TS = 100 · sqrt( Σ_{i=1}^{n} (y_i − m(t_i))^2 / Σ_{i=1}^{n} y_i^2 ) %.

Eleventh, the final criterion accounts for the tradeoff between the uncertainty in the model and the number of parameters by slightly increasing the penalty each time a parameter is added when the sample is considerably small [46].
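The criteria above can be computed compactly in code. The sketch below implements MSE, MAE, Adj_R^2, PRR, PP, PRV, RMSPE, MEOP, and TS in the standard forms commonly used in the SRGM literature; since the extraction omits the paper's displayed equations, these exact forms should be treated as assumptions.

```python
import math

def criteria(y, yhat, m):
    """Standard SRGM comparison criteria (assumed forms); y = observed
    cumulative failures, yhat = model estimates m(t_i), m = #parameters."""
    n = len(y)
    resid = [yh - yi for yi, yh in zip(y, yhat)]        # m(t_i) - y_i
    sse = sum(r * r for r in resid)
    ybar = sum(y) / n
    bias = sum(resid) / n
    prv = math.sqrt(sum((r - bias) ** 2 for r in resid) / (n - 1))
    return {
        "MSE": sse / (n - m),
        "MAE": sum(abs(r) for r in resid) / (n - m),
        "Adj_R2": 1 - (sse / (n - m)) / (sum((yi - ybar) ** 2 for yi in y) / (n - 1)),
        "PRR": sum((r / yh) ** 2 for r, yh in zip(resid, yhat)),
        "PP": sum((r / yi) ** 2 for r, yi in zip(resid, y)),
        "PRV": prv,
        "RMSPE": math.sqrt(bias ** 2 + prv ** 2),
        "MEOP": sum(abs(r) for r in resid) / (n - m + 1),
        "TS": 100 * math.sqrt(sse / sum(yi ** 2 for yi in y)),
    }
```

Given fitted values for any model in Table 1 and the observed failure counts in Table 2, a single call returns all nine error-based criteria at once (AIC additionally requires the maximized likelihood).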
Based on the above criteria, we compared the proposed model with the existing NHPP software reliability models. When Adj_R^2 is closer to 1 and the other 10 criteria are closer to 0, the fit is better. Using R and MATLAB, the parameters of each model were estimated by the least squares estimation (LSE) method, and the goodness of fit was calculated to compare the models. LSE estimates the parameters from the difference between the mean value functions in Table 1 and the actual numbers of failures in Table 2, minimizing LSE = Σ_{t=1}^{n} (y_t − m(t))^2 [47]. Table 3 shows the estimated parameter values of each model obtained using dataset 1; the parameters of the proposed model are â = 80.0907, b̂ = 0.07231, ĉ = 15.9288, and ĥ = 9.8182. Table 4 shows the results of calculating the criteria of each model using the parameters obtained through dataset 1. The proposed model yields the smallest values of the criteria, and its Adj_R^2 of 0.9723 is the closest to 1. The second-best model is DPF, and Vtub is the third-best-fitting model.

Table 5. Parameter estimation of each model from dataset 2.
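The paper performs its LSE fits in R and MATLAB. As an illustration of the same principle, the sketch below fits the two-parameter GO model m(t) = a(1 − e^(−bt)) (used here instead of the proposed model, whose closed form is not reproduced in this extraction) to hypothetical cumulative failure data by a brute-force grid search over the LSE objective.

```python
import math

def go_mean(t, a, b):
    """GO model mean value function m(t) = a * (1 - e^(-b*t))."""
    return a * (1.0 - math.exp(-b * t))

def lse_fit(ts, ys):
    """Grid-search LSE: minimize sum_t (y_t - m(t))^2 over (a, b)."""
    best = None
    for a in range(80, 121):                        # candidate a values
        for b in [0.01 * k for k in range(1, 31)]:  # candidate b values
            sse = sum((y - go_mean(t, a, b)) ** 2 for t, y in zip(ts, ys))
            if best is None or sse < best[0]:
                best = (sse, a, b)
    return best[1], best[2]

# Hypothetical noise-free data generated from a = 100, b = 0.1
ts = list(range(1, 13))
ys = [go_mean(t, 100, 0.1) for t in ts]
a_hat, b_hat = lse_fit(ts, ys)  # recovers the generating parameters
```

In practice a continuous optimizer (e.g., nonlinear least squares) replaces the grid, but the objective being minimized is the same LSE sum as in the paper.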

Table 7 shows the estimated values for the parameters of each model obtained using dataset 3. Each parameter of the proposed model is represented by â = 194.7684, b̂ = 0.3062, ĉ = 307.0805, and ĥ = 135.5641. Figure 4 shows the results of calculating the criteria of each model using these parameters.

Optimal Release Time
When releasing software, it is very important to find the optimal release time. To do so, we need to find the time that minimizes the cost. We apply the m(t) proposed in Section 2 to the cost model to find the optimal point between time to market and minimum cost. The optimal time is suggested based on a cost model that reflects the software installation cost, software test cost, operation cost, software removal cost, and the risk cost incurred when a software failure occurs. Figure 5 describes the software field environment, from installation onward, underlying the software cost model. The expected software cost model follows Equation (22) [30,31].
where C0 is the installation cost for system testing, C1 is the system test cost per unit time, C2 is the error removal cost per unit time during the test phase, and C3 is the penalty cost owing to a system failure. In addition, x represents the time the software is used. In the cost model equation, R(x|T) follows Equation (23) [32,33].
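Since this extraction omits the bodies of Equations (22) and (23), a commonly used form of such an expected cost model in the cited literature is C(T) = C0 + C1·T + C2·m(T) + C3·(1 − R(x|T)) with R(x|T) = exp(−[m(T + x) − m(T)]). The sketch below minimizes this assumed form over T on a grid, using a hypothetical GO-type m(t) and illustrative coefficient values rather than the paper's proposed m(t) and baseline.

```python
import math

# ASSUMED forms: the paper's exact Equations (22)-(23) may differ.
def m(t, a=30.0, b=0.15):                  # hypothetical GO mean value function
    return a * (1.0 - math.exp(-b * t))

def reliability(x, T):
    """Assumed R(x|T) = exp(-[m(T + x) - m(T)]): no failure in (T, T + x]."""
    return math.exp(-(m(T + x) - m(T)))

def cost(T, c0=500, c1=20, c2=30, c3=5000, x=5.0):
    return c0 + c1 * T + c2 * m(T) + c3 * (1.0 - reliability(x, T))

# Grid search for the release time T* minimizing the expected total cost
grid = [0.5 * k for k in range(121)]       # T in [0, 60]
t_opt = min(grid, key=cost)
```

The cost is high at T = 0 (large residual failure risk), falls as testing removes faults, and eventually rises again as test cost per unit time dominates, so an interior minimum T* exists.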

In this section, we propose a cost model using dataset 1 based on the proposed software reliability model and find the optimal point between time to market and minimum cost by changing the cost coefficients from C0 to C3.

Results of the Optimal Release Time
For the parameters of the cost model, the values of a, b, c, and h calculated through the numerical examples described in Section 3 were used. The cost coefficients of the cost model are varied to find the values that yield the optimal release time with the lowest cost. The baseline values of the cost coefficients are as follows, where the baseline denotes the reference values for examining changes in the cost coefficients. The total cost obtained at the reference values is 4888.856, and the optimal release time T at this point is 18.3. Table 9 varies the cost coefficients around the reference values, records the minimum cost C(T) and the optimal release time T*, and examines the resulting trend to find the most suitable release time T*. When x = 2, the smallest total cost is 4886.985 at T* = 18.2. When x = 4, the smallest total cost is 4888.735 at T* = 18.3. When x = 6, the smallest total cost is 4888.856 at T* = 18.3. When x is 8 or 10, the smallest total cost is 4888.863 at T* = 18.3.

Table 9. Optimal release time of expected total cost according to baseline.

Here, C0 is the setup cost; as it increases, the total cost, which is directly proportional to it, increases as well, so the lower the setup cost, the lower the total cost. Table 10 compares the changes when the coefficient C0 is 300, 500, and 700. The higher the value, the higher the total cost, whereas the optimal time does not change. Therefore, C0 does not help determine the optimal release point. However, because a setup cost is required for system stabilization, the C0 cost coefficient is set to 500. Figure 6 shows a graph of the results according to the change in C0.

Table 10. Optimal release time of expected total cost according to C0.

Table 11 compares the changes when the coefficient C1 is 10, 20, and 30. The results show that when C1 is 10, the total cost is minimal at approximately T = 18.9 to 19.0; when C1 is 20, at 18.2 to 18.3; and when C1 is 30, at approximately 17.8 to 17.9. As the cost coefficient C1 increases, the optimal release time gradually moves earlier. Figure 7 shows a graph of the results according to the changes in C1.

Table 12 compares the changes when the coefficient C2 is 30, 40, 50, and 60. The optimal release time remains at 18.2 to 18.3 as C2 changes. Figure 8 shows a graph of the results according to the change in C2.

Table 13 compares the changes when the coefficient C3 is 5000, 7000, 10,000, and 15,000. The results show that when C3 is 5000, the total cost is minimal at approximately 18.2 to 18.3; when it is 7000, at 18.5 to 18.6; when it is 10,000, at approximately 18.9 to 19.0; and when it is 15,000, at approximately 19.2 to 19.3. This indicates that the optimal release time gradually increases as the cost coefficient C3 increases. Figure 9 shows a graph of the results according to the changes in C3.
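The qualitative trend reported for C1 (a larger per-unit test cost pulls the optimal release time earlier) can be reproduced with a commonly used expected cost model C(T) = C0 + C1·T + C2·m(T) + C3·(1 − R(x|T)). The block below uses a hypothetical GO-type m(t) and illustrative coefficients; the paper's exact model and values may differ.

```python
import math

# ASSUMED cost model with a hypothetical GO-type mean value function;
# the paper's exact Equations (22)-(23) and coefficients may differ.
def m(t, a=30.0, b=0.15):
    return a * (1.0 - math.exp(-b * t))

def cost(T, c1, c0=500, c2=30, c3=5000, x=5.0):
    r = math.exp(-(m(T + x) - m(T)))       # assumed R(x|T)
    return c0 + c1 * T + c2 * m(T) + c3 * (1.0 - r)

def t_star(c1):
    grid = [0.5 * k for k in range(121)]   # T in [0, 60]
    return min(grid, key=lambda T: cost(T, c1))

release_times = {c1: t_star(c1) for c1 in (10, 20, 30)}
# Larger C1 -> earlier optimal release, matching the Table 11 trend.
```

Raising C1 raises the marginal cost of each additional unit of testing time, so the cost minimum shifts toward an earlier release, exactly the direction seen in Table 11.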

Results of Variation in Cost Model for Changes in Parameter
In this section, we check whether the optimal release time is affected by changes in the cost model arising from changes in the parameters of the proposed model. The parameters a, b, c, and h of the proposed model are varied from −20% to 20% in 10% increments, and the coefficients of the cost model are fixed at the baseline values in Section 4.1. The minimum cost is then calculated for each parameter change, and the corresponding release time is derived. In Table 14, the 0% case is the same as the value suggested in Table 9, obtained by substituting the parameter estimates described in Section 3 and the cost-model coefficients proposed in Section 4. From Table 14 and Figures 10-13, the value of the cost model C(T) increases as the change in parameter a increases, whereas the optimal release time T* decreases. As the values of parameters b and h increase, the cost model C(T) increases and the release time T* decreases; in addition, the change in parameter h has a very slight effect on the optimal release time compared to parameter b. As the value of parameter c increases, the cost model C(T) and release time T* increase together. Based on this, parameter a causes by far the largest change in the minimum of the cost model compared with the other parameters, and parameter b has the greatest influence on determining the optimal release time.
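The strong effect of the detection-rate parameter b on T* can be illustrated by the same kind of sensitivity sweep: perturb b by ±10% and ±20% around a hypothetical baseline and recompute the optimal release time. The cost model form, the GO-type m(t), and all numeric values below are assumptions for illustration, not the paper's.

```python
import math

def t_star(b, a=30.0, c0=500, c1=20, c2=30, c3=5000, x=5.0):
    """Optimal release time under the ASSUMED cost model
    C(T) = C0 + C1*T + C2*m(T) + C3*(1 - R(x|T)) with a GO-type m(t)."""
    def m(t):
        return a * (1.0 - math.exp(-b * t))
    def cost(T):
        r = math.exp(-(m(T + x) - m(T)))   # assumed R(x|T)
        return c0 + c1 * T + c2 * m(T) + c3 * (1.0 - r)
    grid = [0.5 * k for k in range(121)]   # T in [0, 60]
    return min(grid, key=cost)

base_b = 0.15
t_by_change = {pct: t_star(base_b * (1 + pct / 100)) for pct in (-20, -10, 0, 10, 20)}
# A larger detection-rate parameter b pulls the optimal release time earlier.
```

Because a larger b means faults are detected faster, the required reliability is reached sooner and T* moves earlier monotonically across the sweep, mirroring the sensitivity reported in Table 14.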



Conclusions
In this study, a new software reliability model was developed under the assumption that software failures occur in a dependent manner. We used three datasets for our evaluations. On the first and second datasets, the proposed model showed the best fit, and on the third dataset it showed better results than many previously proposed models. The proposed model also outperformed DP1, DP2, and DPF, which are previously developed software-dependent failure occurrence models.
In addition, based on the proposed model, the optimal release time according to the change in each cost coefficient was suggested, and the total cost was analyzed accordingly. When the test cost per unit time C1 was increased, the total cost increased and the optimal release time gradually moved earlier; the appropriate release time was obtained with C1 set to 20. In the proposed model, the fault detection rate parameter b was found to be the most important parameter for determining the optimal release time.
In the past, studies were conducted assuming independence of software failures; however, in a real environment, the software execution environment is extremely diverse and complex. Therefore, it is necessary to develop a model that assumes dependent failure occurrence and to propose a model that considers the actual operating environment. We plan to conduct a study using machine learning and deep learning for the proposed software-dependent failure occurrence in future work.