A Bayesian Pipe Failure Prediction for Optimizing Pipe Renewal Time in Water Distribution Networks

Abstract: The sustainable management of a water supply system requires methodologies to monitor, repair, or replace aging infrastructure and, more importantly, to assess the condition of the networks and predict their behavior over time. The water distribution network is one of the essential civil infrastructure systems; therefore, the effective maintenance and renewal of its physical assets are essential. This article develops a pipe failure prediction to optimize pipe renewal time and investigates the most appropriate parameters for predicting pipe failure in that optimization. In particular, the non-homogeneous Poisson process (NHPP) with the Markov chain Monte Carlo (MCMC) approach is presented for Bayesian inference, while maximum likelihood (ML) is applied for frequentist inference as a comparison method. It is concluded that the two estimations are relatively appropriate for predicting failures, but the MCMC estimation is closer to the total observed data. Based on life-cycle cost (LCC) analysis, the MCMC estimation generates flatter LCC curves and lower LCC values than the ML estimation, which affects decision making on optimum pipe renewal in water distribution networks.


Introduction
The water distribution network (WDN) is one of the essential public infrastructure systems. It requires frequent inspection, occasional maintenance, and swift repairs throughout its service life to maintain its performance levels [1]. Therefore, it is essential to maintain effective maintenance and repair plans and to renew the system's physical assets, particularly in regions where widespread failure is possible. A complex WDN is susceptible to failure, which may result in the collapse of the pipeline, the failure of neighboring utilities such as underground electrical lines, the disruption of traffic and local economic activity, and even fatalities. Environmental variables, the age of the assets, and component deterioration may cause system failures. These faults can occur at any time and anywhere, and they have a variety of threatening consequences for other infrastructure systems, such as transportation networks and building foundations [2].
The sensitivity of WDNs to failure is the driving force behind the deployment of asset management, which aims to achieve an optimal price and service level for customers. The most expensive component of the WDN is the pipeline utility. Pipe failure is affected by the fundamental properties of the utility, such as pipe material, diameter, length, age, and previous failure history. Establishing a correlation between the failure rate and these variables is essential for assessing network status and preventing catastrophic failure. In the WDN, decision makers must implement infrastructure asset management to ensure that infrastructure performance achieves service level, risk management, and cost management in the context of the asset life-cycle at the most cost-effective level. If asset management is a

Literature Review
Pipe failure prediction is required for the development of preventative techniques in infrastructure management. Shamir and Howard's [6] article on establishing the optimal pipe renewal time is a crucial source for determining the appropriate renewal interval. It comprised a deterministic model for optimizing the economic effectiveness of repair versus renovation times. Since then, deterministic, statistical, and machine learning methods have been applied in the study of pipe performance and the resulting lifetime. Lauer [7] presented an excellent general strategy for network maintenance with insight into failure prediction. Rostum [8] offered a thorough statistical study of the pipe failure process based on the NHPP model. Kleiner et al. [9] incorporated economies of scale into a model of pipe renewal, while Fuchs-Hanusch et al. [10] modified the pipe life-cycle equation by integrating leak detection costs that increase with pipe age. Scholten et al. [11] developed a failure model using the exponential-Weibull distribution, which was also used in multi-criteria decision analysis to rank various long-term rehabilitation alternatives. Amaitik et al. [12] supported using neural networks for failure prediction and pipe renovation, whereas Kabir [13] utilized a Bayesian framework to accomplish the same goal. Motiee et al. [14] examined four alternative regression models, Di Nardo et al. [15] applied fractal theory to evaluate the robustness of pipe failures, and Kutylowska [16] predicted pipe failure rates with support vector machines. Research on the Weibull proportional hazard model (WPHM) can be found in the works of Le Gat and Eisenbeis [17].
Giraldo-Gonzales and Martinez [6] examined several statistical models, such as Poisson regression, linear regression, and evolutionary polynomial regression (EPR), to predict the number of pipe group failures. These models are recommended because of their explicit polynomial expressions, which offer a decent correlation between covariates and dependent variables. Linear regression is an extension of regression analysis that includes covariates as explanatory variables in the prediction equation. In the linear regression model, whether the value of the covariate increases or decreases, the value of the dependent variable changes at a constant rate. However, the fundamental disadvantage of statistical models is their dependence on the availability of comprehensive data. Probabilistic statistical models require the development of a time-dependent failure rate function, in which the time until the next failure varies based on the conditions of the previous failure. Based on previous research, this can be achieved by combining the time-dependent Poisson model and the Markov process with the Bayesian principle. Atique and Attoh-Okine [18] constructed a pipe failure model using Bayesian inference and the Copula parameter to describe a bivariate distribution by sampling the distribution, including the dependent variable. Lin and Yuan [19] created an NHPP model and presented a two-scale process with two time variables, applying Markov chain Monte Carlo (MCMC).
Asset management in urban water utilities remains important because its technological, economic, and environmental ramifications are substantial and numerous. Specifically, providing accurate estimates for the service life of pipes is a crucial component of the asset management problem. Mailhot et al. [20] proposed another optimization-based, cost-focused rehabilitation planning technique, in which an indicator of the pipe's structural integrity is used to calculate the best renewal criterion. Hong et al. [21] proposed an analytical approach for optimal pipe renewal based on the annual cost as a proportion of the overall cost, minimizing the total expected cost over a set service life. A mathematical model developed by Luong and Nagarur [22] can assist in determining whether to repair or replace the pipe and how to deploy maintenance expenditures most effectively. The objective function of that optimization formulation is the total system availability over the long term. Grigg [23] offered a risk-based approach to pipe renewal to avoid utilizing a suboptimal budget. Lansey et al. [24], Kim and Mays [25], and Shin et al. [8] established more cost-minimization-focused models. Other optimization models integrate system cost and reliability as competing objectives. Dandy and Engelhardt [26] developed a trade-off curve between reliability and cost for the effective pipe renewal option.
Notably, several researchers have implemented life-cycle cost (LCC) analysis in WDN. This powerful notion highlights the analytical tools that assist decision makers in making the most cost-effective choices among the options provided to them at various life-cycle stages and, therefore, with varying costs. Shamir and Howard [27] constructed an exponential relationship between a pipe failure rate and its age to calculate the pipe renewal interval that minimizes total repair and renewal costs. Lee et al. [28] classified every network object to give an inventory-based technique for the LCC analysis of a WDN. This methodology was created to assist decision makers in determining when and which WDN components require repair.
Marzouk and Osama [29] developed a method to assist decision makers in WDN management with their short-term and long-term plans. Four objective functions were considered: the risk index, the infrastructure condition, the level of asset service, and the LCC. The failure probability was simulated using a fuzzy Monte Carlo simulation. The research found that economic variables significantly influenced asset failure results, whereas pipe size influenced the overall failure consequence index. Jayaram and Srinivasan [28] suggested an innovative multi-objective approach for lowering LCC and improving network performance. Roshani and Filion [30] created the OptiNET model to optimize renewal time and pipe diameter while reducing LCC. Capital and operating expenses were evaluated as the objective function for determining the optimal renewal age. According to Frangopol and Soliman [31], LCC analysis could significantly reduce long-term expenditures while improving the resilience and sustainability of the infrastructure. Based on the LCC assessment, Ghobadi et al. [32] presented a new pipe renewal scheduling strategy to smooth the investment time series for a large-scale WDN.

Study Area and Data Sources
This article uses the Malang City Water Network, East Java Province, Indonesia, as a case study. As the second biggest urban water supply in East Java Province, the network serves an area of 110 km². Service coverage currently reaches 98% of the population of the city, approximately 680,000 citizens, through 171,000 customer junctions. The quantity of the water supply is 1525 L per second. The pipes in the study area consist of galvanized iron (GI), asbestos-cement (AC), polyvinyl chloride (PVC), and high-density polyethylene (HDPE), with the attributes listed in Table 1. The main distribution network consists of PVC pipes, while AC and GI pipes are gradually being replaced with HDPE. The pipe failure database was created in 2012 as a recommendation from the cooperation program between the Malang City Public Water Works Company and AusAID for urban water development. The Malang City Public Water Works Company conducts reliability evaluations every 20 years on each type of pipe material, based on the availability of sufficient pipe failure data. At this point, decision makers need a reliability analysis for the PVC pipes, which have reached the age of 20 years, or 30% of the planned age for network operations. In this regard, the article's scope is limited to analyzing PVC distribution network pipes. Although the PVC pipes were installed in 2002, the available data begin in 2012 and end in 2021. Table 2 shows the annual failures of PVC pipes in the study area.

Counting Process
The counting process can be used to describe a sequence of events or an uncertain process of events. A random variable, N(t), is the number of failure occurrences in the time interval (0, t). The process {N(t), t ≥ 0} is a stochastic process, specifically referred to as a counting process, if [33]: if s < t, then N(s) ≤ N(t).
Based on their characteristics, systems can be divided into repairable and nonrepairable systems. In a repairable system, the failed component can be restored through repair, and it is unnecessary to replace the entire system. In a nonrepairable system, the failed component must be replaced with a new component because it cannot be repaired. In the WDN, pipe networks can be described as repairable components. In a repairable system, after a minimal repair is conducted, the failed system functions in the same condition as it was at the time of the last failure. The minimal repair assumes that the repair time is very short and that only a small proportion of the system elements is replaced in the repair process [34].

Poisson Process
When analyzing a repairable system, it is important to focus on the characteristics of the system's pattern of successive failures. If the system demonstrates a trend (i.e., a tendency for failures to occur more frequently or less frequently), it is evident that nonstationary approaches must be employed. The non-homogeneous Poisson process (NHPP) is the most frequently used model to describe a trend in repairable systems [34].
If a component fails with an intensity function λ and N(t) is the number of failures in the time interval (0, t), the number of failures follows a Poisson distribution, which can be written as [35]:
P(N(t) = n) = ((λt)^n / n!) exp(−λt), n = 0, 1, 2, . . . (1)
The NHPP is a Poisson process whose intensity function varies with time. In the WDN, the NHPP is a simple model that can be applied to systems with decreasing or increasing failure rates. The failure events depend on a specific time interval, the observations are discrete counts, and the events are mutually independent. The intensity function of the Poisson regression is as follows [6]:
λ = exp(β0 + Σ βi xi) (2)
where βi denotes the pipe failure parameter to be estimated and xi represents the pipe failure variable.
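As a minimal sketch of the two expressions in this section, the following Python snippet evaluates the Poisson probability of observing n failures for a constant intensity and a log-linear intensity function. The function names, coefficients, and covariate values are illustrative assumptions and are not taken from the article.

```python
import math

def poisson_pmf(n, lam, t=1.0):
    """P(N(t) = n) for a Poisson process with constant intensity lam over (0, t)."""
    mean = lam * t
    # Work on the log scale so that large counts do not overflow.
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))

def log_linear_intensity(betas, xs):
    """lambda = exp(beta_0 + sum_i beta_i * x_i); betas[0] is the intercept."""
    return math.exp(betas[0] + sum(b * x for b, x in zip(betas[1:], xs)))

# Hypothetical coefficients and covariates, for illustration only.
lam = log_linear_intensity([0.5, -0.01, 0.05], [100.0, 20.0])
probs = [poisson_pmf(n, lam, t=1.0) for n in range(50)]
print(f"intensity = {lam:.4f}, P(N=0) = {probs[0]:.4f}, sum = {sum(probs):.6f}")
```

The probabilities over n = 0, 1, 2, . . . should sum to one, which gives a quick sanity check on the implementation.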

Bayesian Inference
Conceptually, the Bayesian method is based on Bayes' theorem, in which the posterior distribution combines the prior distribution with the likelihood function of the observed data. The posterior distribution p(λ|x) can be determined using Bayes' theorem as [36]:
p(λ|x) = p(x|λ) p(λ) / p(x) (3)
where p(x|λ) is the likelihood function, which contains the information in the sample data. Meanwhile, p(λ) is the prior distribution function of the parameters and p(x) is a constant density function. The likelihood function is a representation of the data condition, while the choice of the prior distribution depends more on the researcher's subjectivity, based on specific considerations. The specification of the prior distribution is important in Bayesian inference because the prior distribution affects the inference on the posterior distribution. Determining the prior distribution is therefore the key to Bayesian inference and the most important step in describing it [36].
After the prior distribution is specified, obtaining the posterior distribution from the likelihood function and the prior distribution requires an analytical process or numerical integration, which is complicated to solve. In the Bayesian method, this can be solved using Markov chain Monte Carlo (MCMC). Through the MCMC method, it is possible to generate a sample from any posterior density function p(λ|x) and then use the sample to calculate the expected value of the posterior [37].
The important point in using MCMC is that, if the simulation algorithm is implemented correctly, the Markov chain will converge to the target distribution. The implementation of the MCMC method for Bayesian inference requires an appropriate sampling algorithm to obtain a sample from a distribution. The algorithms developed for numerical processing in the MCMC method include the Gibbs sampling algorithm [37].
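The idea of Gibbs sampling can be illustrated with a toy example. The sketch below is not the article's model: it assumes a constant annual failure rate with a hierarchical gamma prior, chosen only because both full conditionals are gamma distributions that can be sampled directly. The hyperparameters a, c, and d, the function name, and the use of the observed annual failures from Table 5 as data are assumptions made for illustration.

```python
import random

# Observed annual PVC failures, 2012-2021 (first data column of Table 5).
failures = [567, 569, 372, 459, 680, 380, 471, 595, 621, 688]

def gibbs_poisson_rate(counts, a=1.0, c=1.0, d=1.0, n_iter=5000, burn_in=1000, seed=1):
    """Gibbs sampler for: counts ~ Poisson(lam), lam ~ Gamma(a, rate=b), b ~ Gamma(c, rate=d)."""
    rng = random.Random(seed)
    n, total = len(counts), sum(counts)
    b = 1.0  # arbitrary starting value for the hyperparameter
    draws = []
    for it in range(n_iter):
        # Full conditional of lam: Gamma(shape=a + sum(x), rate=b + n).
        # random.gammavariate takes a scale parameter, hence the reciprocal.
        lam = rng.gammavariate(a + total, 1.0 / (b + n))
        # Full conditional of b: Gamma(shape=a + c, rate=lam + d).
        b = rng.gammavariate(a + c, 1.0 / (lam + d))
        if it >= burn_in:
            draws.append(lam)
    return draws

draws = gibbs_poisson_rate(failures)
posterior_mean = sum(draws) / len(draws)
print(f"posterior mean annual failure rate: {posterior_mean:.1f}")
```

With a weak prior such as this one, the posterior mean should sit close to the sample mean of the annual counts. A regression model with covariates generally has no closed-form conditionals and needs a more elaborate sampler.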

Maximum Likelihood Estimation
This article compares the Bayesian methodology with a frequentist inference method, maximum likelihood (ML). L(θ;t) denotes the likelihood function when covariates are present. The likelihood measures how probable the observed failure times are under given parameter values. Data are available for m independent processes with the same intensity function λ(t). Process i is monitored across the interval (ai, bi), and ni events are recorded at the times tij, where j = 1, 2, . . . , ni and i = 1, 2, . . . , m. The likelihood function for all m processes is given in [38], and the maximum likelihood estimate for n failure events observed at failure times t is expressed as in [35].
The likelihood equations must be derived specifically for a given distribution and estimation problem. Frequently, the mathematics is not simple, especially if confidence intervals for the parameters are required, and the numerical estimation is typically not trivial. Apart from a few cases where simple maximum likelihood formulas exist, it is better to rely on high-quality statistical tools to produce maximum likelihood estimates. Fortunately, the prevalence of high-quality maximum likelihood software is growing. For small samples, maximum likelihood estimates might be significantly biased, and the optimality properties might not apply [39].
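To make the numerical side of ML estimation concrete, the following sketch fits a Poisson regression with a single covariate by Newton-Raphson iteration. It is an illustration under stated assumptions, not the article's procedure: the covariate is simply the year index, the data are the observed annual failures from Table 5, and the function name is hypothetical.

```python
import math

def fit_poisson_regression(x, y, n_iter=25):
    """Newton-Raphson MLE for y_i ~ Poisson(exp(b0 + b1 * x_i))."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the constant-rate fit
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix [[h00, h01], [h01, h11]].
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} * score (2x2 inverse written out).
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

years = list(range(10))  # 0 = 2012, ..., 9 = 2021
failures = [567, 569, 372, 459, 680, 380, 471, 595, 621, 688]
b0, b1 = fit_poisson_regression(years, failures)
fitted = [math.exp(b0 + b1 * t) for t in years]
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, fitted total = {sum(fitted):.1f}")
```

For a Poisson model with an intercept and log link, the fitted counts sum to the observed total at the ML estimate, which is a convenient convergence check.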

Life-Cycle Cost (LCC)
Life-cycle cost (LCC) is one of the most important factors to consider when determining the most cost-effective system solution. This article applies LCC evaluation to find the optimum time to replace the pipe. The LCC of a pipe network is the total of all costs incurred throughout its lifetime. The costs of procurement, operation, replacement, and disposal determine the LCC for each pipe attribute. This article calculates the LCC following [32] as a function of the pipe diameter D (mm) and the pipe renewal interval t (year), from the initial cost CI (USD/km/year) and the running cost CR (USD/failure). In this article, the CI only includes the procurement cost and the CR only includes the repair cost. The CI is derived from CP, the cost of pipe procurement (USD/km) [32]. The CR is obtained by multiplying the pipe repair cost Cr (USD/failure) by the average pipe failure rate Fr (failure/km/year) throughout the renewal interval, where Fr is determined from the failure analysis [32].
Determining the LCC for each pipe feature enables the most cost-effective prediction of pipe age. Figure 1 shows an illustration of the LCC curve plotted against pipe age, where operating costs increase while initial costs decrease. The lifetime with the lowest total cost is the most cost-effective, and the lowest LCC for a particular pipe is referred to as the optimum LCC. The LCC curve provides the advantage of anticipating the ideal economic renewal time (t*) and displays the variation in total costs around this point. As shown in Figure 1, the renewal period should be adjusted within a time interval centered on the ideal point (t*) of the annual investment time series [32].
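The trade-off behind the LCC curve can be sketched numerically. The snippet below is an illustration only and does not reproduce the formulas of [32]: it assumes that the annualised initial cost is the procurement cost divided by the renewal interval, that the running cost is the repair cost times the average failure rate over the interval, and that the failure rate grows exponentially with age. All cost values, rate parameters, and function names are hypothetical.

```python
import math

def mean_failure_rate(t, base_rate, growth):
    """Average of Fr(a) = base_rate * exp(growth * a) over pipe ages 0..t (failures/km/year)."""
    return base_rate * (math.exp(growth * t) - 1.0) / (growth * t)

def lcc(t, cp, cr, base_rate, growth):
    """Annualised life-cycle cost (USD/km/year) for a renewal interval of t years."""
    ci = cp / t  # procurement cost spread over the renewal interval
    running = cr * mean_failure_rate(t, base_rate, growth)  # repair cost per km per year
    return ci + running

# Hypothetical inputs, chosen only to produce a U-shaped curve.
CP, CR_PER_FAILURE = 50_000.0, 500.0  # USD/km and USD/failure
BASE_RATE, GROWTH = 0.2, 0.05         # failures/km/year at age 0, growth per year

candidates = range(1, 101)
t_star = min(candidates, key=lambda t: lcc(t, CP, CR_PER_FAILURE, BASE_RATE, GROWTH))
print(f"optimum renewal interval: {t_star} years, "
      f"LCC = {lcc(t_star, CP, CR_PER_FAILURE, BASE_RATE, GROWTH):.0f} USD/km/year")
```

A flatter curve around t* means that renewing a few years earlier or later changes the annualised cost only slightly, which is why the shape of the curve matters for scheduling as well as the location of its minimum.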

Pipe Failure Intensity
The analysis of pipe deterioration in the WDN, which is assumed to be a repairable system, focuses on observing the characteristics of failure patterns that occur successively. In this article, the modeling of pipe deterioration uses the NHPP as a non-stationary approach. The counting process is applied to analyze the number of failures in an interval of failure time. Using an alpha level of 5%, the chi-squared test for homogeneity yields a chi-square value of 162.03 and a p-value of 2.2 × 10⁻¹⁶. Based on this result, homogeneity is rejected and the failure process is treated as a non-homogeneous Poisson process. The monthly failure intensity is shown in Figure 2.
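The homogeneity check can be sketched as follows. This is a minimal illustration, assuming a Pearson chi-square test of equal expected counts per period. It uses the observed annual failures listed in Table 5 rather than the monthly series analysed in the article, so it is not expected to reproduce the reported statistic of 162.03. The function name and the tabulated critical value (16.919 for 9 degrees of freedom at the 5% level) are introduced here for illustration.

```python
# Observed annual PVC failures, 2012-2021 (first data column of Table 5).
failures = [567, 569, 372, 459, 680, 380, 471, 595, 621, 688]

def chi_square_homogeneity(counts):
    """Pearson statistic for H0: the same expected count in every period."""
    expected = sum(counts) / len(counts)
    statistic = sum((c - expected) ** 2 / expected for c in counts)
    return statistic, len(counts) - 1  # statistic and degrees of freedom

CRITICAL_5PCT_DF9 = 16.919  # tabulated chi-square quantile, alpha = 0.05, 9 d.o.f.

stat, dof = chi_square_homogeneity(failures)
reject_h0 = stat > CRITICAL_5PCT_DF9
print(f"chi-square = {stat:.2f} on {dof} d.o.f., reject homogeneity: {reject_h0}")
```

A statistic far above the critical value indicates that the failure counts vary more between periods than a constant-intensity Poisson process would produce.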



Parameter Estimation of Bayesian Inference
In Bayesian inference, a directed acyclic graph (DAG) can graphically represent the relationship between the data and the prior distribution of parameters. The DAG in Figure 3 depicts the relationship between the data, model parameters, and parameter values. In the graph, a single line represents a stochastic relationship. The box-shaped node represents constant parameters or data, whereas the elliptical node represents stochastically changing parameters or logical structural linkages. This article adopts a pseudo prior for the prior distribution parameters so that the parameter estimation process iterates rapidly and complies with the Markov chain properties. The model parameters are estimated from the posterior distribution using the MCMC method with the Gibbs sampling algorithm. The results must be irreducible, aperiodic, and recurrent, as evidenced by the autocorrelation plot, the history plot, and the kernel density plot, respectively.
The model is examined by identifying the significant contribution of each predictor variable according to whether the zero value falls within the credible interval of its posterior distribution. The form of the λ equation developed from Equation (2) with predictor variables is as follows:
λ = exp(β0 + β1D + β2A) (10)
where D represents the variable of pipe diameter (in mm) and A represents the variable of pipe age (in years). Table 3 shows the MCMC estimation results for each parameter β0, β1, and β2.
The findings revealed a positive association between pipe failure rate and pipe age, as well as a negative relationship between pipe failure rate and pipe diameter.
In Figure 4, the autocorrelation plot shows that the correlation between the generated sample values is in the posterior distribution area.
Figure 5 depicts the estimated kernel density functions. The solid lines in the DAG depict these functions, respectively. Furthermore, the parameter estimate generated from this replication is calculated as the arithmetic mean of the density function. The kernel density plot in Figure 5 shows that the posterior density of each model parameter is bell-shaped, following the distribution pattern of the model parameters. This indicates that the Markov chain is convergent, which is consistent with the results of previous research [40].
Figure 6 depicts the history plot of the MCMC simulation process. According to the figure, the history plot is stationary and random, which signifies that all of the generated samples fall within a specific domain interval. It is dense and contains all conceivable parameter values, so the history plot is judged to be irreducible. This graphically depicts the chains' rapid convergence and, as a result, supports the usefulness of the MCMC technique. Iriawan and Yasmirullah [40] noted that using the MCMC approach increases the parameter estimation accuracy and provides the empirical standard error that is utilized to evaluate variability.


Parameter Estimation of Frequentist Inference
This article uses maximum likelihood (ML), a frequentist inference method, as a comparison with MCMC. Table 4 shows the ML estimation results for each parameter β0, β1, and β2 with the standard error (SE). By substituting the estimated parameters into Equation (10), the intensity function of ML is as follows:
λ = exp(6.88818 − 0.01489D + 0.051891A)
Comparing the two equations generated from MCMC and ML shows that, although the parameters have different values, they have the same direction of relationship with pipe failures. The relationship between pipe diameter and pipe failure is negative, while the age of the pipe has a positive relationship. This result is consistent with previous pipe failure analysis findings by several researchers [41,42].
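The signs of the ML coefficients can be read as multiplicative effects on the intensity. The following sketch evaluates the ML intensity function stated in this section; the example diameter of 100 mm and age of 10 years, as well as the function name, are illustrative assumptions.

```python
import math

# ML coefficients as reported in the intensity function of this section.
B0, B_DIAMETER, B_AGE = 6.88818, -0.01489, 0.051891

def ml_intensity(diameter_mm, age_years):
    """lambda = exp(b0 + b1 * D + b2 * A) with the ML estimates."""
    return math.exp(B0 + B_DIAMETER * diameter_mm + B_AGE * age_years)

# Multiplicative effect of one extra year of age, and of one extra mm of diameter.
age_factor = ml_intensity(100, 11) / ml_intensity(100, 10)
diameter_factor = ml_intensity(101, 10) / ml_intensity(100, 10)
print(f"per-year factor: {age_factor:.4f}, per-mm factor: {diameter_factor:.4f}")
```

Because the model is log-linear, these factors do not depend on the chosen diameter or age: each additional year multiplies the intensity by exp(0.051891), roughly a 5.3% increase, and each additional millimetre of diameter multiplies it by exp(−0.01489), roughly a 1.5% decrease.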
According to the two equations generated by MCMC and ML, the diameter of the pipe correlates negatively with the number of failures, which is consistent with earlier studies [4,11,30]. The methods used by Giraldo-Gonzales et al. [8] also revealed a negative association between diameter and rate of failure. This is because the deterioration process of pipes varies depending on the material. Construction methods, corrosion processes, and climatic variables can all have an impact on the link between pipe age, diameter, and failure rate. Many other elements, such as water pressure, rainfall near the pipe, and soil type, might influence the occurrence of pipe faults [3]. Pipes with a smaller diameter typically fail at a higher rate than larger ones. The increased failure rate of small-diameter pipes is mostly due to their reduced resistance to soil movement and corrosion as a result of reduced wall thickness.
Pipe deterioration due to age is a well-known phenomenon, and older pipelines fail at a higher rate. The failure rate is projected to rise in the months following installation, then fall for several years before rising with the pipe age [43]. Due to the production phase, on-site placement and operating practices, and external circumstances, the relationship between the age of a pipe and its failure rate is unknown. However, this could be explained by the fact that some pipes are older than when the first pipe failures were recorded [44]. Other authors have noted that the models only consider quantitative factors as the cause of this result. Changes in material quality and material strength may lead to age-dependent differences in pipe performance; however, these variables are not evaluated [6]. Other researchers [45–47] discovered variations in the fitted values of the variables between materials. The observed failure occurrences were influenced by the specified parameters and the age of the pipe. It is important to realize that, as the pipeline ages, the failure rate will increase, and the model structure may need to be modified. This is in line with the observation of Xu et al. [48] that no reliable model is applicable to every case.

Pipe Failure Analysis
In the next step, we perform a pipeline failure analysis based on the pipe failure model. Table 5 shows the predicted annual pipe failure values compared to the observed values. From the analysis results, the model shows a lower number of failures in the early years, a larger number of failures in the middle years, and a lower number of failures in the later years. The model tends to smooth out the peaks in the number of failures indicated by the observation data. On the other hand, the numbers of failures from the MCMC and ML estimations are not significantly different, although the MCMC estimation is closer to the observed data.

Table 5. Predicted annual pipe failures compared with observed values.
Year     Observed   MCMC   ML
2012     567        501    505
2013     569        491    490
2014     372        484    478
2015     459        581    593
2016     680        530    527
2017     380        467    449
2018     471        561    556
2019     595        526    511
2020     621        575    566
2021     688        681    690
Total    5402       5396   5365

We use a graph depicting the number of expected failures and observed failures for each year to evaluate the model's accuracy. Figures 7 and 8 illustrate the pipe failure plot over time. The total numbers of predicted and observed failures are comparable, and it is instructive to observe the model's performance year by year. In this evaluation, cumulative and annual charts are practical visual assessment techniques. Annual plots can be utilized to compare model outcomes to actual failures visually. This plot can also depict the inter-failure time trend of the networks [49].
The next step compares the observed and predicted failures in a pair plot, in which the predicted failures are plotted against the observed failures. For an appropriate model, the points scatter closely around the Y=X line. Figure 9 shows that both models produce predicted values that do not differ much from the observed values, as indicated by the distribution of the predicted values around the Y=X line.
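The comparison between observed and predicted counts can also be checked numerically. The following is a minimal sketch using the yearly values reported in Table 5; the assignment of the two prediction columns to MCMC and ML is an assumption inferred from the surrounding text, and the helper names (`mean_squared_error`, `total_gap`) are illustrative rather than taken from the study.

```python
# Yearly failure counts transcribed from Table 5 (2012-2021).
# ASSUMPTION: column order is observed, MCMC, ML (inferred from the text).
observed = [567, 569, 372, 459, 680, 380, 471, 595, 621, 688]
pred_mcmc = [501, 491, 484, 581, 530, 467, 561, 526, 575, 681]
pred_ml = [505, 490, 478, 593, 527, 449, 556, 511, 566, 690]


def mean_squared_error(actual, predicted):
    """Average squared difference between paired yearly counts."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def total_gap(actual, predicted):
    """Absolute difference between total observed and total predicted failures."""
    return abs(sum(actual) - sum(predicted))


mse_mcmc = mean_squared_error(observed, pred_mcmc)
mse_ml = mean_squared_error(observed, pred_ml)

# Residuals show how far each point lies from the Y=X line of a pair plot.
residuals_mcmc = [p - a for a, p in zip(observed, pred_mcmc)]
residuals_ml = [p - a for a, p in zip(observed, pred_ml)]

print("MSE (MCMC column):", mse_mcmc)
print("MSE (ML column):  ", mse_ml)
print("Total gap (MCMC column):", total_gap(observed, pred_mcmc))
print("Total gap (ML column):  ", total_gap(observed, pred_ml))
```

Because the yearly values in the table are rounded, the column sums computed here may differ by one failure from the totals printed in the table.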
Another method for validating pipe failure prediction is to run tests on each quartile of the pipe failures. In this test, the predicted data for a period are validated using observational data from the same quartile. If a model is applied to a pipeline life phase, it should be able to identify the pipelines in the highest failure quartile. If the pipes predicted to have the highest likelihood of failure are also those for which the observational data show the most failures, the model should be judged valid for the pipe's service life [49]. The quartile results are displayed in Tables 6 and 7, which demonstrate that both models are relatively appropriate for predicting failure in each quartile. The predicted failures tend to be consistent with the observed failures. Tables 6 and 7 also show that most of the failures occurred in the first quartile, with the fewest occurring in the second. It is critical to highlight that the failure rate examines the group of pipes rather than the failure history of each individual pipe [6]. The group of pipes in the first quartile has the highest priority for renewal. Previous research indicates that prediction models developed with unbalanced data sets are more prone to errors in the anticipated number of failures, which can reduce the prediction accuracy. The solution to this problem is to extend the length of the data series by accumulating failure records over a longer period of time [50]. The water utility manager must monitor the network and take appropriate precautions, for example, maintenance or renewal after inspection. When there is no information about variables other than pipe features, the described method for predicting pipe failure is reasonably applicable. Selecting the proper group size depending on the data attributes is recommended [51].
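One possible reading of the quartile test can be sketched as follows: pipe groups are ranked by predicted failures, the ranking is split into four equal parts, and predicted and observed totals are compared per quartile. The group identifiers and counts below are hypothetical illustration data, not values from Tables 6 and 7, and `quartile_summary` is an assumed helper name.

```python
# HYPOTHETICAL pipe groups: (group id, predicted failures, observed failures).
groups = [
    ("G1", 42.0, 45), ("G2", 35.5, 30), ("G3", 28.1, 33), ("G4", 21.7, 19),
    ("G5", 15.2, 12), ("G6", 12.9, 16), ("G7", 9.4, 8), ("G8", 5.0, 6),
]


def quartile_summary(groups):
    """Rank groups by predicted failures (highest first), split the ranking
    into four equal-sized quartiles, and total predicted/observed per quartile."""
    ranked = sorted(groups, key=lambda g: g[1], reverse=True)
    n = len(ranked)
    summary = []
    for q in range(4):
        chunk = ranked[q * n // 4:(q + 1) * n // 4]
        summary.append({
            "quartile": q + 1,
            "predicted": sum(g[1] for g in chunk),
            "observed": sum(g[2] for g in chunk),
        })
    return summary


summary = quartile_summary(groups)
for row in summary:
    print(row)

# The check passes when the quartile ranked highest by the model is also
# the quartile holding the most observed failures.
top_observed = max(summary, key=lambda r: r["observed"])["quartile"]
print("Quartile with most observed failures:", top_observed)
```

Under this reading, a model is considered useful for prioritization when the first quartile by prediction is also the first quartile by observation, which is the group the text assigns the highest renewal priority.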
The prediction validation, according to Ramirez et al. [7], should be conducted by examining and projecting the previous data and then comparing the earlier years of the projection with the more recent failure data. Scheidegger et al. [52] showed that there is no preferred model structure. The following points should be considered when selecting the most appropriate failure model: the appropriate probability function should be selected based on the data properties, the model estimate must correlate with the original inquiry, and the failure model assumptions must be consistent with operator experience [53].

Life-Cycle Cost (LCC) Analysis
The LCC analysis can support the WDN authority in deciding on an efficient and cost-effective relocation strategy [25]. Figure 10 shows the LCC analysis from the MCMC estimation. Although the initial cost (CI) is different for each diameter, the trend decreases and flattens as the pipe age approaches 50 years. Above a pipe age of 30 years, the running cost (CR) dominates the total cost, depending on the pipe failure rate from the pipe failure analysis results. The higher the pipe failure rate, the further the intersection point of the CI and CR curves shifts to the left. The optimal value of the LCC behaves similarly: the higher the failure rate, the more the optimal point shifts to a shorter age. Based on the lowest LCC curve value for each diameter, the optimal pipe renewal time is obtained at the age of 35 years.
Figure 11 depicts the LCC analysis from the ML estimation and shows that the lowest LCC value is in the lifetime range of 25 to 35 years. At diameters of 63 mm and 90 mm, the lowest LCC is at the age of 35 years. For the pipe diameter of 110 mm, the lowest LCC is at the age of 30 years, and for the pipe diameter of 150 mm, it falls further to about 25 years. As the diameter increases, the optimum lifetime decreases. The LCC analysis therefore shows a different trend from the MCMC parameter estimation results. However, the ML estimation results for pipe diameters of 63 mm and 90 mm give optimum pipe ages that are nearly identical to the MCMC estimation results. For the same pipe age, the ML estimation results show a higher LCC value, and this difference becomes more significant as the time variable increases, even though the analysis in both models is conducted with the same values of the cost components. The difference, in this case, is the pipe failure rate from the estimation results.
Overall, the ML estimation produces higher LCC values and a shorter optimum pipe lifetime than the MCMC estimation.
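The interaction between CI, CR, and the failure rate can be illustrated with a simplified sketch. It is not the cost model used in this study: it ignores discounting (NPV), spreads a hypothetical renewal cost evenly over the service life, and assumes a power-law failure intensity a·t^b. All parameter values and function names are assumptions chosen only to show that a higher failure rate moves the optimum to a shorter age.

```python
def lcc_curve(renewal_cost, repair_cost, a, b, max_age=50):
    """Return {age: annualised cost} for candidate renewal ages 1..max_age.

    ASSUMED simplified model (no discounting):
      CI(T) = renewal_cost / T                      initial cost per year of life
      CR(T) = repair_cost * sum_{t<=T} a*t**b / T   average yearly repair cost
    """
    curve = {}
    cumulative_failures = 0.0
    for age in range(1, max_age + 1):
        cumulative_failures += a * age ** b  # expected failures in year `age`
        ci = renewal_cost / age
        cr = repair_cost * cumulative_failures / age
        curve[age] = ci + cr
    return curve


def optimal_age(curve):
    """Renewal age with the lowest cost on the curve (t*)."""
    return min(curve, key=curve.get)


# Two HYPOTHETICAL failure-rate scenarios with identical cost components.
low_rate = lcc_curve(renewal_cost=10_000, repair_cost=100, a=0.004, b=2)
high_rate = lcc_curve(renewal_cost=10_000, repair_cost=100, a=0.008, b=2)

t_low = optimal_age(low_rate)
t_high = optimal_age(high_rate)
print("Optimum age, lower failure rate: ", t_low)
print("Optimum age, higher failure rate:", t_high)
```

With identical cost components, the scenario with the higher failure intensity yields both a shorter optimum age and a higher cost at that optimum, which mirrors the qualitative difference described between the two estimation methods.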
If the total cost curve flattens around the optimal age, the decision makers do not need to schedule renewal exactly at that age (t*). If the LCC curve is uneven and deviates significantly on both sides of the optimal point, renewal should be undertaken as near as is feasible to the optimal age. Modifying the renewal interval will affect the overall system life. Figure 12 depicts LCC curves, with each representing a different pipe diameter. With MCMC estimation, the smaller the pipe diameter, the flatter the LCC curve tends to be. Optimal pipe renewal decisions for pipes with diameters of 63 mm, 90 mm, and 110 mm can therefore be more relaxed than for pipes with a diameter of 150 mm. Overall, the LCC curves of the MCMC estimation are flatter than those of the ML estimation at the same diameter. The ML estimation does not give the decision makers much choice in modifying the renewal interval. Figure 12 demonstrates that the LCC curves of the ML estimation are more "pessimistic" than those of the MCMC estimation. This is reflected in the uneven cost values around point t* on the LCC curves, which lead decision makers to replace the pipe sooner to prevent the possibility of more significant losses.
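The effect of curve flatness on scheduling freedom around t* can be made concrete with a small sketch. The function below returns the range of renewal ages whose cost stays within a chosen tolerance of the minimum; the tolerance of 5% and the two quadratic curves are hypothetical and serve only to contrast a flat curve with an uneven one.

```python
def renewal_window(curve, tolerance=0.05):
    """Return (earliest, latest) age whose cost is within `tolerance`
    (as a fraction) of the minimum cost on the curve."""
    best = min(curve.values())
    ages = [age for age, cost in curve.items() if cost <= best * (1 + tolerance)]
    return min(ages), max(ages)


# HYPOTHETICAL LCC curves, both with their minimum at age 35.
flat = {age: 100 + 0.02 * (age - 35) ** 2 for age in range(20, 51)}
steep = {age: 100 + 0.50 * (age - 35) ** 2 for age in range(20, 51)}

flat_window = renewal_window(flat)
steep_window = renewal_window(steep)
print("Flat curve window: ", flat_window)
print("Steep curve window:", steep_window)
```

A wide window corresponds to the relaxed renewal decisions described for the flatter curves, while a narrow window indicates that renewal should be scheduled close to t*.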

In the LCC analysis, there is a clear trade-off between two constraints: increasing the restriction on the renewal time may reduce the budget limit, and setting a time restriction for renewal affects the ability to adhere to budgetary constraints. It is important to determine how different budgets and time constraints will affect the target functions. The combination of the highest budget constraint and the lowest renewal limit leads to the lowest-cost scheduling plan with the lowest LCC [32].
Regarding diameter, Figure 12 shows that the pipe failure rate increases as the diameter of the pipe increases, even though the diameter has a negative effect on the number of failures in the model equations produced by the two estimation methods. However, Asnaashari et al. [54] and Harvey et al. [55] found that pipe diameter is less important than other factors, such as failure history and year of construction. In this case, the equations generated by both estimation methods are more strongly and positively affected by the age of the pipe. At the initial age of the pipe, the diameter still has a large influence, whereas at a later age, the effect of diameter on the number of failures is less significant. This affects the optimum pipe renewal time. When the diameter parameter has an effect that is greater than or at least equal to that of the age parameter, a larger diameter gives a longer renewal time. In the LCC curves of the MCMC estimation, the tendency for longer renewal times for larger diameters is barely visible, and the renewal time is the same for all diameters. The LCC curves of the ML estimation show different results: the curves tend to be more uneven, and the renewal time for 110 mm and 150 mm is shorter than for 63 mm and 90 mm.

Conclusions
This article aims to determine pipe failure prediction to optimize WDN renewal. The research methodology investigates the most appropriate parameters for predicting pipe failure for optimization. The non-homogeneous Poisson process (NHPP) with Bayesian inference is used to predict pipe failure numbers, and Bayesian inference is compared with frequentist inference. The Markov chain Monte Carlo (MCMC) approach is presented for Bayesian inference, while maximum likelihood (ML) is used for frequentist inference. We investigate how failure prediction can provide a cost-effective pipe renewal strategy and determine the most economical renewal lifetime. The model calculates the most cost-effective life-cycle costs based on the net present value (NPV) of renewal costs, repair costs, and predicted failures.
The counting process is applied to analyze the number of failures in an interval of failure time. Based on the homogeneity examination, the failure intensity is inhomogeneous. The two equations generated from the MCMC and ML estimation show that, although the parameters have different values, they tend to have the same relationship with the pipe failures: pipe diameter has a negative relationship with pipe failure, whereas pipe age has a positive relationship. This result is consistent with earlier pipe failure analysis findings by several researchers. The failure prediction model tends to give a mid-range estimate of the peak numbers of failures indicated by the observation data. The numbers of failures from the MCMC and ML estimation are not significantly different, although MCMC yields a higher total number of failures, which is closer to the observed data. The total numbers of predicted and observed failures are comparable, and it is interesting to observe how the model performs against the recorded failure events. There are some apparent errors between the observed values and those predicted by MCMC and ML. In this respect, the MCMC predictions are better than the ML predictions because MCMC has a lower mean squared error (MSE) value.
This article also proposes visual assessment techniques for evaluating the model with respect to the resulting errors. In this evaluation, cumulative and annual charts are practical visual assessment techniques that can be used to compare model outcomes with actual failures. In the pair plot validation, both models produce predicted values that do not differ much from the observed values, as indicated by the distribution of the predicted values around the Y=X line. The quartile tables demonstrate that the models are relatively appropriate for predicting failure in each quartile, and the predicted failures tend to be consistent with the observed failures. The quartile results show that most of the failures occurred in the first quartile and the fewest in the second. It is critical to highlight that the failure rate examines the group of pipes rather than the failure history of each individual pipe. The group of pipes in the first quartile has the highest priority for renewal. Previous research indicates that prediction models developed with unbalanced data sets are more prone to errors in the anticipated number of failures, which can reduce the prediction accuracy. The solution to this problem is to extend the length of the data series by accumulating failure records over a longer period of time. Based on the analysis, the recommended model is appropriate for predicting failure with a data series of ten years. Pipe failure predictions should be more accurate in situations with more than ten years of data.
The LCC analysis of the MCMC estimation leads to several observations. Although the initial cost (CI) is different for each diameter, the trend decreases and flattens as the pipe age approaches 50 years. Above a pipe age of 30 years, the running cost (CR) dominates the total cost, depending on the pipe failure rate from the pipe failure analysis results. The higher the pipe failure rate, the further the intersection point of the CI and CR curves shifts to the left. The optimal value of the LCC behaves similarly: the higher the failure rate, the more the optimal point shifts to a shorter age. Based on the lowest LCC curve value for each diameter, the optimal pipe renewal time is obtained at the age of 35 years. The LCC analysis of the ML estimation shows that the lowest LCC value is in the pipe age range of 25 to 35 years. At diameters of 63 mm and 90 mm, the lowest LCC is at the age of 35 years. For the pipe diameter of 110 mm, the lowest LCC is at the age of 30 years, and for the pipe diameter of 150 mm, it falls further to about 25 years. As the diameter increases, the optimum pipe lifetime decreases. The LCC analysis therefore shows a different trend from the MCMC parameter estimation results. However, the optimum pipe lifetimes from the ML estimation for pipe diameters of 63 mm and 90 mm are nearly identical to the MCMC estimation results. For the same pipe age, the ML estimation results show a higher LCC value, and this difference becomes more significant as the time variable increases, even though the analysis in both models is conducted with the same values of the cost components. The difference, in this case, is the pipe failure rate from the estimation results. The MCMC estimation results generally produce a lower LCC and a slightly longer optimum pipe life than the ML estimation. This will affect the decision making regarding pipe renewal in WDNs.