1. Introduction
Water distribution networks inevitably undergo a degradation process, leading to multiple anomalies, such as leaks, bursts, blockages, entrapped air, unsealed valves and water contamination [1]. Numerous utilities face the urgent need to replace their most degraded assets to meet existing and future requirements. This challenge is more pressing in developed countries, where infrastructures were built decades ago and, often, little investment has been made in their renovation [2]. The assessment of the physical condition of water pipes is an important process of infrastructure asset management (IAM), which corresponds to a set of processes that water utilities can apply to balance three dimensions of analysis in the long term: cost, risk and performance [3]. Efficient IAM aims to provide an asset management plan that minimizes life-cycle costs by lowering renewal costs, identifies the measures needed for optimized resource allocation and prioritizes the necessary interventions [4].
A comprehensive IAM methodology applicable at the three decision levels of asset management (strategic, tactical and operational) is needed [5]. Condition assessment is integrated into IAM in the process of asset performance evaluation. Condition assessment methods for water distribution pipes can be classified as direct or indirect [6]. Direct condition assessment methods require the inspection of the pipe and can be intrusive (i.e., requiring access to the interior of the pipe, with or without interruption of the supply service) or non-intrusive (i.e., not requiring access to the interior of the pipe, but often requiring excavation) [7]. Indirect condition assessment methods collect and analyze the assets’ physical characteristics (e.g., pipe material, diameter, installation date), failure history (e.g., number of bursts per year, burst location) and previous inspection information to estimate the current physical condition [8,9]. These methods can be classified into two categories, environmental (i.e., involving the surveying of soil and water chemistry) and operational (i.e., involving leak/burst history, pipe material embedment, coating or linings calculation), and aim to analyze trends and changes in the system to provide information on the condition of assets [10,11]. The main advantage of indirect condition assessment methods over direct methods is that inspection is not necessary in most cases. As a result, service interruptions and the associated costs are avoided [11].
The development of indirect condition assessment methods is often based on the known relationships between deterioration and the factors that affect the physical condition of buried pipes. These factors can be divided as follows [12]: (i) physical factors, related to intrinsic pipe properties, such as pipe material, age, nominal diameter, wall thickness, pipe lining and coating, type of joints and manufacturing process; (ii) environmental factors, related to the environment in which the pipe is located, such as soil pH, groundwater level, pipe location in relation to traffic, seismic activity, stray electric currents, trench backfill material, pipe bedding, installation practices and underground disturbances; and (iii) operational factors, associated with the operating conditions of the pipe, such as water quality, water pressure, backflow potential, flow velocity and operation and maintenance (O&M) practices. A burst in a water distribution pipe, if not caused by third parties, can be attributed to pipe deterioration and interpreted as a sign of decreased physical condition. A burst can result from a single influencing factor or from a combination of factors. While the factors that contribute to the deterioration of water distribution pipes are known, the relative influence of each factor is still widely debated [13]. As a result, even though research has established the existence of failure mechanisms, the exact relationship between factors remains uncertain and is the focus of several studies.
Several algorithms have been developed to assess and predict the physical condition of water distribution pipes. These range from basic techniques that rely on limited data to highly precise prediction models leveraging the latest advances in machine learning. The use of key performance indicators to assess asset condition is one of the most common approaches in Portugal, especially the Infrastructure Value Index (IVI). This index was first proposed in [14] and is defined as the ratio between the current value of the infrastructure and its replacement cost:

$$IVI(t) = \frac{\sum_{i=1}^{n} rc_{i,t}\,\dfrac{rul_{i,t}}{eul_{i}}}{\sum_{i=1}^{n} rc_{i,t}}$$

in which $IVI(t)$ is the infrastructure value index at time $t$, $rc_{i,t}$ is the replacement cost of asset $i$ at time $t$, $rul_{i,t}$ is the residual useful life of asset $i$ at time $t$, $eul_{i}$ is the expected useful life of asset $i$ and $n$ is the total number of assets. This index aims at assisting the long-term planning of urban water assets; however, it can also represent the current age and the intervention needs of a given infrastructure. When IVI is calculated at the asset level, the indicator becomes the ratio of useful life (RUL) of each pipe, i.e., the ratio between its residual useful life and its expected useful life (equivalently, one minus the ratio between the asset age and its useful life).
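The index can be computed directly from an asset register. The sketch below is illustrative rather than the tool used in this study: it assumes a hypothetical `Asset` record holding $rc_{i,t}$, $rul_{i,t}$ and $eul_{i}$ for each pipe, and it treats pipes past their expected useful life as having zero residual life.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    """Hypothetical asset record; field names are illustrative."""
    replacement_cost: float      # rc_{i,t}
    residual_useful_life: float  # rul_{i,t}, in years
    expected_useful_life: float  # eul_i, in years

    def rul_ratio(self) -> float:
        # Asset-level IVI, i.e. the RUL of a single pipe.
        # Assumption: assets past their expected useful life count as zero
        # residual life rather than a negative value.
        return max(self.residual_useful_life, 0.0) / self.expected_useful_life


def infrastructure_value_index(assets: list[Asset]) -> float:
    """IVI(t): replacement-cost-weighted average of rul_{i,t} / eul_i."""
    total_cost = sum(a.replacement_cost for a in assets)
    if total_cost <= 0:
        raise ValueError("total replacement cost must be positive")
    weighted = sum(a.replacement_cost * a.rul_ratio() for a in assets)
    return weighted / total_cost


# Two hypothetical pipes: one halfway through its life, one fully consumed.
pipes = [Asset(100_000, 25, 50), Asset(50_000, 0, 40)]
print(round(infrastructure_value_index(pipes), 3))  # 0.333
```

Because IVI is a cost-weighted average of the asset-level ratios, evaluating it for a single asset returns that pipe's RUL.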
Other algorithms include deterministic, heuristic and machine learning techniques. Deterministic techniques are commonly used when a non-random relationship between variables is assumed. A well-known study that developed deterministic algorithms is [15], which estimated the remaining useful life of gray cast iron water pipes using a total of 29 parameters. These algorithms are easy to apply and understand; however, they are inappropriate in scenarios where the risk of an event is a relevant decision-making criterion, e.g., seismic risk and vulnerability studies [16]. Heuristic techniques are more common when the influencing factors are not well understood, as they allow algorithms to be developed from the subjective opinions of experienced field engineers and experts [17]. Ref. [18] proposed a pipe condition ranking method using fuzzy PROMETHEE II; the model was applied to eight pipe samples and ranked the pipes from highest to lowest breakage risk. These algorithms can be developed with limited or no pipe data; however, their results are highly influenced by the opinions of the expert groups. Finally, newer algorithms based on machine learning techniques have been proposed, including generalized linear models, deep learning, decision trees, random forests, XGBoost, AdaBoost and support vector machines [19]. Ref. [20] studied different machine learning algorithms, such as random forests, decision trees, neural networks and support vector machines, to accurately locate water leaks in water distribution systems.
However, these algorithms are very challenging to implement in water distribution systems, given the limited infrastructure data available. In Portugal, 51 out of 136 water utilities have no formal asset inventory, including information on pipe location and historical data, and only 20 water utilities report having any information on the condition of their assets [21]. Overall, there is an urgent need to bridge the gap between condition assessment algorithms and the resources (financial, human and technological) available in water utilities. Thus, the main objective of this research is to develop a methodology to assess the physical condition of water distribution pipes, based on three different algorithms, without the need for visual inspection or service interruption. The proposed methodology accounts for the knowledge gaps faced by Portuguese water utilities, namely limited infrastructure knowledge and the limitations of the indirect condition assessment method currently in use (i.e., IVI). Three algorithms, namely heuristic, linear regression and support vector regression, are developed, compared and applied to assess the physical condition of the pipes of a Portuguese water distribution network.
The paper is organized in six sections, including the current one. Section 2 presents the proposed methodology to assess the physical condition of water distribution pipes, describing the three algorithms. Section 3 presents the case study to which the methodology is applied. Section 4 presents the condition assessment results, and Section 5 compares and discusses them. Lastly, Section 6 presents the main conclusions of this research and highlights foreseen future work.
2. Proposed Methodology
2.1. General Framework
The proposed methodology for assessing the physical condition of water distribution pipes is a three-step procedure:
(1) Data collection and validation;
(2) Algorithm development;
(3) Condition assessment.
This methodology allows water utilities to efficiently assess the physical condition of their water distribution networks without inspections or extensive field surveys, making the process less resource-intensive. Its application also aims to contribute to better resource allocation and investment planning, thereby enhancing the physical sustainability and integrity of water distribution networks.
The first step of the proposed methodology aims at collecting the factors that influence the physical condition of water pipes, which can be divided into three main categories: physical, operational and environmental. The failure history, including the type, date and location of bursts, must also be collected to assess the physical condition of water pipes without inspection. All collected data must be validated and checked for inconsistencies (e.g., an incorrect installation year or a material that is inappropriate for the nominal diameter). The most relevant factors that influence pipe deterioration, as identified in the literature, are summarized in Table 1.
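The consistency checks mentioned above can be automated as simple rules over the pipe inventory. The sketch below is only an illustration: the `PipeRecord` type, the oldest plausible year and the per-material diameter limits are hypothetical values chosen for the example, not rules defined in this study; each utility would set its own.

```python
from dataclasses import dataclass

# HYPOTHETICAL plausibility rules for illustration only; each utility would
# define its own limits from manufacturer catalogues and local practice.
OLDEST_PLAUSIBLE_YEAR = 1900
DIAMETER_LIMITS_MM = {  # material -> (min DN, max DN), illustrative values
    "PVC": (50, 400),
    "AC": (50, 600),
    "DI": (80, 600),
    "steel": (50, 600),
    "HDPE": (50, 500),
}


@dataclass
class PipeRecord:
    pipe_id: str
    material: str
    diameter_mm: float
    installation_year: int


def find_inconsistencies(record: PipeRecord, current_year: int) -> list[str]:
    """Return the problems detected in one pipe record (empty if none)."""
    issues = []
    if not OLDEST_PLAUSIBLE_YEAR <= record.installation_year <= current_year:
        issues.append("implausible installation year")
    limits = DIAMETER_LIMITS_MM.get(record.material)
    if limits is None:
        issues.append("unknown material")
    elif not limits[0] <= record.diameter_mm <= limits[1]:
        issues.append("diameter outside the expected range for this material")
    return issues


records = [
    PipeRecord("P1", "PVC", 110, 1985),
    PipeRecord("P2", "PVC", 600, 2031),
]
for r in records:
    print(r.pipe_id, find_inconsistencies(r, current_year=2020))
```

Flagged records would then be reviewed manually rather than silently corrected.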
The second step of the proposed methodology is the development of different algorithms to assess the physical condition of water pipes, e.g., heuristic, linear regression and support vector regression. Heuristic algorithms rely on a theoretical framework for model construction, applying intuition to problem-solving and reducing the resources needed to achieve an adequate result. In contrast, linear regression and support vector regression algorithms learn from input data to predict output values within an acceptable range. Studying these two types of approaches (heuristic and data-driven) allows us to identify their advantages and limitations and to compare the condition assessment results, supporting informed decision-making. Each algorithm is described in detail in the following subsections.
The third step of the proposed methodology aims at developing a comprehensive condition rating scale to interpret the algorithms’ results and to assess the physical condition of water distribution pipes. A condition matrix with three condition rating levels, namely good (represented by the green colour), average (represented by the yellow colour) and unsatisfactory (represented by the red colour), is developed.
Different condition matrices are developed to combine the algorithm results with the Ratio of Useful Life (RUL), the asset-level performance indicator that relates the residual useful life of a pipe to its expected useful life. The idea behind these condition matrices is to combine a well-known performance indicator with observations made during network operation (e.g., pipe bursts). The RUL scale varies between zero and one, one being the value associated with newly installed pipes and zero the value associated with pipes that have already reached their reference useful life but remain in service. The RUL scale is divided into three equal ranking levels.
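The RUL indicator and its three equal-width levels can be sketched as follows. This assumes that the residual useful life equals the reference useful life minus the pipe age, clipped at zero; the function names and the convention of assigning boundary values to the better level are illustrative assumptions.

```python
def rul(age_years: float, useful_life_years: float) -> float:
    """Ratio of useful life in [0, 1]: 1 for a new pipe, 0 once the
    reference useful life has been reached (assumes rul = eul - age)."""
    remaining = max(useful_life_years - age_years, 0.0)
    return remaining / useful_life_years


def rul_level(value: float) -> str:
    """Split the [0, 1] scale into three equal-width levels.

    Assumption: a value exactly on a boundary goes to the better level.
    """
    if value >= 2 / 3:
        return "good"
    if value >= 1 / 3:
        return "average"
    return "unsatisfactory"


# A 30-year-old pipe with a hypothetical 50-year useful life -> RUL = 0.4.
print(rul(30, 50), rul_level(rul(30, 50)))  # 0.4 average
```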
The scale adopted for the algorithm results varies with each technique. For the heuristic algorithm, the proposed scale is similar to that of RUL, with three equal ranking levels spanning the interval of variation of the pipe condition model. The proposed condition matrix for the heuristic algorithm is presented in Figure 1a. For the remaining two algorithms (i.e., linear regression and support vector regression), the scale is based on the number of bursts during the observation period. The thresholds of this scale may vary according to the acceptable condition levels set by each water utility, which in most cases depend on the annual budget available for interventions. Figure 1b illustrates a condition matrix for these two algorithms, in which zero bursts per year during the observation period (OP) are associated with pipes in good condition, one burst with pipes in average condition and two or more bursts with pipes in unsatisfactory condition.
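A minimal sketch of the burst-based scale and of a condition-matrix lookup follows. The burst thresholds mirror the Figure 1b example. The rule used to fill the matrix cells (keeping the worse of the two levels) is a hypothetical placeholder, because the exact cell assignments are given in Figure 1 rather than in the text.

```python
LEVELS = ("good", "average", "unsatisfactory")  # ordered from best to worst


def burst_level(bursts: float) -> str:
    """Map a (possibly predicted) number of bursts in the OP to a level.

    Thresholds follow the example given for Figure 1b: 0 -> good,
    1 -> average, 2 or more -> unsatisfactory. Rounding continuous
    regression outputs to the nearest integer is an assumption.
    """
    n = max(round(bursts), 0)
    if n == 0:
        return "good"
    if n == 1:
        return "average"
    return "unsatisfactory"


def matrix_cell(rul_lvl: str, algorithm_lvl: str) -> str:
    """HYPOTHETICAL matrix rule: the combined rating is the worse of the two.

    The actual cell-by-cell colouring of Figure 1 may differ.
    """
    return LEVELS[max(LEVELS.index(rul_lvl), LEVELS.index(algorithm_lvl))]


print(matrix_cell("good", burst_level(1.2)))  # average
```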
2.2. Heuristic Algorithm
The heuristic algorithm develops condition models in which each factor that influences pipe deterioration is weighted according to a survey of experts in the field. The heuristic algorithm is composed of three main steps:
Development of a survey on the influence of internal and external factors on pipe deterioration.
Processing and analysis of the survey data and calculation of weighting factors.
Development of condition models.
The first step is the development of a survey to assess the importance of different factors for the physical condition (expressed through deterioration) of water distribution pipes, considering two categories: (i) pipe characteristics and operating conditions; and (ii) external factors. The survey should be short, clear and unambiguous, and its structure should facilitate data collection. Experts should only answer questions about factors they are familiar with, to avoid collecting guesses. Additionally, the number of years of experience of each expert is collected to assess how each experience group perceives the influencing factors. In the developed survey, the influencing factors are rated on a scale from zero to five, in which zero means that the factor is irrelevant to pipe deterioration and five means that the factor is very relevant.
The second step consists of processing and analyzing the survey data and calculating the weight of each factor on the physical condition. To calculate these weights, different weighting methods are applied to capture different aspects of the collected data, namely the central tendency of the ratings (i.e., using the average or the median value) and the overall confidence of experts in their perceptions (i.e., accounting for the number of replies for each factor). The weights range from zero to one and must sum to one.
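The weighting step can be sketched as below. The mean- and median-based weights simply normalize each factor's central rating so that the weights sum to one. The "answers" variant, which multiplies the mean rating by the number of replies, is one hypothetical way of rewarding factors that more experts felt confident to rate, and is not necessarily the formula used in this study.

```python
from statistics import mean, median


def survey_weights(ratings: dict[str, list[int]], method: str = "mean") -> dict[str, float]:
    """Turn 0-5 expert ratings into factor weights that sum to one.

    ratings maps each factor to the scores of the experts who answered it.
    The 'answers' method (mean score multiplied by the number of replies)
    is a hypothetical way of rewarding factors rated by more experts.
    """
    if method == "mean":
        raw = {f: mean(s) for f, s in ratings.items()}
    elif method == "median":
        raw = {f: median(s) for f, s in ratings.items()}
    elif method == "answers":
        raw = {f: mean(s) * len(s) for f, s in ratings.items()}
    else:
        raise ValueError(f"unknown method: {method}")
    total = sum(raw.values())
    if total == 0:
        raise ValueError("all factors were rated as irrelevant")
    return {f: v / total for f, v in raw.items()}


# Hypothetical answers: 'length' was rated by a single expert.
ratings = {"age": [5, 4, 5], "material": [4, 4], "length": [2]}
for method in ("mean", "median", "answers"):
    print(method, {f: round(w, 3) for f, w in survey_weights(ratings, method).items()})
```

Comparing the three outputs shows how a factor with few replies loses weight under the answer-based variant while the ranking of well-covered factors stays stable.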
The final step is the development of condition models using the calculated weights. The construction of the models should take into account whether each factor has a positive or negative relationship with the physical condition of water distribution pipes. This means that special attention must be given to how each influencing factor varies and how it contributes to the condition grade, based on the knowledge gathered in the literature review.
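One possible form of such a condition model is a weighted sum of rescaled factor values, where the direction of each factor is flipped when larger values mean worse condition. The sketch assumes min-max rescaling over the network and a hypothetical `worsens` flag per factor; the weights, ranges and factor names in the example are invented for illustration.

```python
def condition_index(pipe: dict[str, float],
                    weights: dict[str, float],
                    ranges: dict[str, tuple[float, float]],
                    worsens: dict[str, bool]) -> float:
    """Weighted condition index in [0, 1], where 1 is the best condition.

    pipe:    numeric value of each factor for one pipe (categorical factors such
             as material would need a numeric score first)
    ranges:  (min, max) of each factor over the network, used to rescale to [0, 1]
    worsens: True when larger values of the factor mean worse condition
    """
    index = 0.0
    for factor, w in weights.items():
        lo, hi = ranges[factor]
        x = (pipe[factor] - lo) / (hi - lo) if hi > lo else 0.0
        x = min(max(x, 0.0), 1.0)  # clip values outside the network range
        index += w * ((1.0 - x) if worsens[factor] else x)
    return index


weights = {"age": 0.6, "pressure": 0.4}  # hypothetical weights
ranges = {"age": (0, 50), "pressure": (20, 80)}
worsens = {"age": True, "pressure": True}
print(condition_index({"age": 25, "pressure": 50}, weights, ranges, worsens))
```

The resulting index can then be split into the three equal-width levels described for the heuristic condition matrix.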
2.3. Linear Regression Algorithm
Linear regression assumes that pipe deterioration has a non-random relationship with the network characteristics that lead to pipe failure. Water utilities can only detect these failures during a pipe inspection or when a burst occurs; bursts are the most common type of failure record, since they require immediate action. As a result, this algorithm uses the failure history as the dependent variable and constructs models that estimate failures from different independent variables related to physical, operational and environmental factors. The steps needed to complete this algorithm are the following:
Development of a correlation matrix between network characteristics (i.e., independent variables) and failure variables (i.e., dependent variables).
Construction and assessment of linear regression models.
Validation of regression models.
The first step is the construction of a correlation matrix with the dependent and independent variables. The independent variables are the characteristics of water distribution pipes (e.g., year of installation, material, nominal diameter) and the dependent variables describe the failure history. Preferably, the collected variables should be quantitative, which may require converting categorical variables (e.g., pipe material) into numerical values. The correlation matrix allows us to identify the independent variables with the highest correlation coefficients, which best describe the relationship between the dependent and independent variables. Spearman’s rank correlation coefficient is used to construct the correlation matrix, as it is a nonparametric measure of the strength and direction of a monotonic relationship [22].
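Spearman's coefficient is the Pearson correlation computed on the ranks of the data, with tied values receiving their average rank. The dependency-free sketch below builds the pairwise matrix on a small hypothetical dataset; the variable names are illustrative, and a statistics package would normally be used in practice.

```python
from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the mean of the ranks they share."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of the ranks.

    Undefined (division by zero) if either variable is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Pairwise Spearman coefficients between all named variables."""
    return {(a, b): spearman(columns[a], columns[b]) for a in columns for b in columns}


# Hypothetical data for five pipes with bursts.
data = {
    "age_at_burst": [12, 25, 31, 40, 44],
    "static_pressure": [35, 42, 40, 55, 60],
    "bursts": [1, 1, 2, 2, 3],
}
for (a, b), rho in correlation_matrix(data).items():
    if a < b:
        print(f"{a} vs {b}: {rho:.2f}")
```

Because it works on ranks, the coefficient equals one for any strictly increasing relationship, linear or not.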
The second step consists of constructing simple and multiple linear regression models and assessing them through statistical analysis. This analysis includes the calculation of the coefficient of determination for simple regression (r-squared) and multiple regression (adjusted r-squared), as well as the assessment of the statistical significance of the linear models through null hypothesis testing. Multicollinearity between explanatory (i.e., independent) variables is checked using the variance inflation factor (VIF). Variables with a VIF greater than or equal to 10 are considered to have a high degree of multicollinearity and are, therefore, excluded from the models [23].
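The model-assessment quantities above can be sketched with NumPy least squares: R² and adjusted R² for a fitted model, and VIF computed by regressing each explanatory variable on the others. Dropping the variable with the largest VIF one at a time is a common way of applying the threshold of 10 and is an assumption here; the data are synthetic.

```python
import numpy as np


def fit_ols(X: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, float, float]:
    """Least-squares fit with an intercept.

    Returns (coefficients with the intercept first, R^2, adjusted R^2).
    """
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return coef, r2, adj_r2


def vif(X: np.ndarray) -> list[float]:
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the other columns."""
    factors = []
    for j in range(X.shape[1]):
        _, r2_j, _ = fit_ols(np.delete(X, j, axis=1), X[:, j])
        factors.append(1.0 / (1.0 - r2_j) if r2_j < 1.0 else float("inf"))
    return factors


def drop_collinear(X: np.ndarray, names: list[str], threshold: float = 10.0):
    """Repeatedly drop the variable with the largest VIF while it is >= threshold."""
    names = list(names)
    while X.shape[1] > 1:
        factors = vif(X)
        worst = int(np.argmax(factors))
        if factors[worst] < threshold:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return X, names


# Synthetic example: 'age_copy' nearly duplicates 'age'.
rng = np.random.default_rng(0)
age = rng.uniform(5, 45, 40)
pressure = rng.uniform(30, 70, 40)
age_copy = age + rng.normal(0, 0.1, 40)
X = np.column_stack([age, pressure, age_copy])
kept_X, kept = drop_collinear(X, ["age", "pressure", "age_copy"])
bursts = 0.05 * age + 0.02 * pressure + rng.normal(0, 0.2, 40)
coef, r2, adj_r2 = fit_ols(kept_X, bursts)
print(kept, round(r2, 3), round(adj_r2, 3))
```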
The third step corresponds to the validation of the linear models by evaluating the dispersion of their predictions with respect to the collected data through the root mean square error, relative error and absolute error. The collected data must be divided into two subsets to carry out the validation: the training subset, used for model construction and typically representing 80% of the whole dataset, and the validation subset, used to validate the results and representing the remaining 20%. The validation subset allows the models to be evaluated on data not used in their construction, flagging problems such as overfitting and selection bias [24].
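The 80/20 split and the error measures can be sketched as follows. The fixed random seed (for reproducibility) and the averaging of absolute and relative errors across pipes are assumptions of this sketch; relative errors are skipped where the observed value is zero to avoid division by zero.

```python
import math
import random


def train_validation_split(items: list, train_fraction: float = 0.8, seed: int = 42):
    """Shuffle reproducibly and split into training and validation subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def error_metrics(observed: list[float], predicted: list[float]) -> dict[str, float]:
    """RMSE, mean absolute error and mean relative error of the predictions.

    Relative errors are only computed where the observed value is non-zero.
    """
    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)
    relative = [abs(e) / abs(o) for e, o in zip(errors, observed) if o != 0]
    return {
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mre": sum(relative) / len(relative) if relative else float("nan"),
    }


train, validation = train_validation_split(list(range(49)))  # e.g. 49 pipe ids
print(len(train), len(validation))  # 39 10
print(error_metrics([1, 2, 4], [1, 3, 2]))
```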
2.4. Support Vector Regression Algorithm
This subsection explores support vector regression (SVR) as an alternative to linear regression for predicting pipe failures. Unlike linear regression, which assumes a linear relationship between variables [25], SVR can effectively model non-linear and complex relationships [26] between network characteristics and failure occurrences. This makes SVR a potentially more suitable approach when the deterioration process exhibits non-linear behavior, as in aging pipe networks.
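The objective of SVR is to find a function, corresponding to a hyperplane in a high-dimensional feature space, that approximates the training data within a tolerance margin ε while being as flat as possible; the training points lying on or outside this margin are called support vectors:

$$f(\mathbf{x}) = \mathbf{w}^{\top}\varphi(\mathbf{x}) + b$$

in which $\mathbf{w}$ is the weight vector, $\varphi(\mathbf{x})$ is the transformed input vector in the feature space and $b$ is the bias term.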
The process of finding the optimal hyperplane can be formulated as a convex optimization problem that balances the flatness of the function against the approximation error, measured by the ε-insensitive loss across all data points [26]. The ε-insensitive loss function ignores deviations between the observed value and the model prediction that are smaller than ε and penalizes larger deviations linearly. Solving this optimization problem with appropriate algorithms yields the optimal weight vector ($\mathbf{w}$) and bias term ($b$) that define the regression function for predicting unseen data points. In practice, the kernel function evaluates inner products in the feature space implicitly, so the prediction for a new data point is computed as a weighted sum of kernel evaluations between that point and the support vectors, plus the bias term.
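For reference, the textbook ε-SVR problem behind this description can be written as follows. The regularization constant $C$, the slack variables $\xi_i$ and $\xi_i^{*}$, the Lagrange multipliers $\alpha_i$ and $\alpha_i^{*}$ and the number of training pipes $m$ are symbols introduced here for illustration; this is the standard formulation, not necessarily the exact configuration used in this study.

$$L_{\varepsilon}\big(y, f(\mathbf{x})\big) = \max\big(0,\ \lvert y - f(\mathbf{x}) \rvert - \varepsilon\big)$$

$$\min_{\mathbf{w},\,b,\,\boldsymbol{\xi},\,\boldsymbol{\xi}^{*}} \ \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{m} \left(\xi_i + \xi_i^{*}\right)$$

$$\text{subject to}\quad
\begin{cases}
y_i - \mathbf{w}^{\top}\varphi(\mathbf{x}_i) - b \le \varepsilon + \xi_i,\\
\mathbf{w}^{\top}\varphi(\mathbf{x}_i) + b - y_i \le \varepsilon + \xi_i^{*},\\
\xi_i,\ \xi_i^{*} \ge 0, \qquad i = 1, \dots, m.
\end{cases}$$

$$f(\mathbf{x}) = \sum_{i=1}^{m} \left(\alpha_i - \alpha_i^{*}\right) K(\mathbf{x}_i, \mathbf{x}) + b, \qquad K(\mathbf{x}_i, \mathbf{x}) = \varphi(\mathbf{x}_i)^{\top}\varphi(\mathbf{x})$$

Only training points with $\alpha_i - \alpha_i^{*} \neq 0$, i.e., those lying on or outside the ε-tube, contribute to the sum; these are the support vectors.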
The SVR algorithm is applied to estimate the number of failures of each pipe using the same network characteristics as in linear regression. The classification of each pipe based on the number of failures is evaluated with a confusion matrix that compares the observed number of failures with the estimated one. Furthermore, to compare the results with those of the linear regression algorithm, the coefficient of determination is calculated.
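The evaluation step can be sketched independently of any SVR library. Continuous predictions are rounded and capped so that the classes match the burst-based scale (0, 1 and "2 or more"); this rounding, the helper names and the example values are assumptions made for illustration.

```python
from collections import Counter


def burst_class(bursts: float) -> int:
    """Rounded burst count capped at 2, so classes are 0, 1 and '2 or more'.

    Rounding continuous SVR outputs is an assumption of this sketch.
    """
    return min(max(round(bursts), 0), 2)


def confusion_matrix(observed: list[float], predicted: list[float]) -> list[list[int]]:
    """Rows: observed class; columns: predicted class."""
    counts = Counter((burst_class(o), burst_class(p)) for o, p in zip(observed, predicted))
    return [[counts[(i, j)] for j in range(3)] for i in range(3)]


def r_squared(observed: list[float], predicted: list[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


observed = [0, 1, 2, 3, 1]             # hypothetical burst counts
predicted = [0.2, 1.4, 1.6, 2.2, 0.4]  # hypothetical SVR outputs
for row in confusion_matrix(observed, predicted):
    print(row)
print(round(r_squared(observed, predicted), 3))
```

The diagonal of the matrix counts pipes whose burst class was predicted correctly, while R² uses the unrounded predictions and is therefore directly comparable with the linear regression models.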
3. Case Study
A water distribution network with a total length of approximately 85 km, serving 1787 houses with an annual water volume of 1,805,024 m³, is studied. This distribution system includes three pumping stations and four water storage tanks. The network serves a low-density residential area, and the domestic consumers mainly correspond to detached houses, which are mostly occupied in the summer season. Thus, this utility is characterized by seasonal water consumption, with high outdoor water use in the summer and water consumption four to five times lower during the winter.
The network comprises a total of 1666 distribution pipes of different materials, such as polyvinyl chloride (PVC), asbestos cement (AC), ductile iron (DI), steel and high-density polyethylene (HDPE). The predominant pipe material is PVC, corresponding to 53% of the total network, followed by AC (27%) and DI (17%) (Figure 2a). The nominal diameter of the distribution pipes varies between 50 and 600 mm; pipes with small nominal diameters (i.e., less than 250 mm) are the most common, representing more than 75% of the total distribution pipes (Figure 2b). The distribution network was initially constructed in 1972; however, its largest expansion took place between 1982 and 1992, when 50% of the pipes were installed (Figure 2c).
The studied distribution network recorded 113 bursts during the observation period between February 2014 and February 2020. Of these, 41 were attributed to third-party interventions, while the remaining 72 can be attributed to the natural process of pipe deterioration. Two of these 72 bursts were excluded from further analysis because they occurred in service connections. Therefore, a total of 70 bursts, located in 49 different pipes, are considered in the present study.
5. Discussion
Results have shown that the linear regression algorithm tends to be the most penalizing, classifying pipes in worse condition than the heuristic and SVR algorithms. The application of the three algorithms to the described case study resulted in different classifications for the same pipes. While the heuristic algorithm classified 60% of the pipes as being in average condition and 18% as being in unsatisfactory condition (considering the answer-based weights and the entire survey group), the linear regression algorithm classified approximately 60% of the pipes as being in unsatisfactory condition (considering the linear model with the average pipe age when the burst occurred and the static pressure as independent variables). Finally, the SVR algorithm classified approximately 60% of the pipes as being in average condition and almost all the remaining pipes as being in good condition.
The assessment of the physical condition of water distribution pipes combining the well-known performance indicator RUL with the results of the three developed algorithms is presented in Figure 8. The results show that RUL tends to distribute the pipes more evenly across the three classes, the heuristic algorithm classifies most pipes as being in average condition and the linear regression algorithm classifies most pipes as being in unsatisfactory condition. The SVR algorithm identifies more pipes in good condition than the other algorithms; it also classifies most pipes as being in average condition and assigns only three pipes to the unsatisfactory category.
Figure 9 depicts the condition assessment maps obtained with the three algorithms and with RUL. The pipes classified as being in unsatisfactory condition by RUL and by the linear regression algorithm are almost the same, whereas the SVR algorithm is less conservative, with a higher number of pipes in good and average condition.
Although different results were obtained, the development of a rehabilitation plan for a water distribution network should ideally integrate the results of these three algorithms to prioritize assets for intervention, i.e., the pipes classified as being in unsatisfactory or average condition by all algorithms. The cross-referencing of these three algorithms is described in Table 11.
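The cross-referencing rule can be sketched as a filter over per-pipe ratings: a pipe is selected only if no algorithm rates it as good. Ordering the selected pipes by how many algorithms rate them as unsatisfactory is an added assumption, and the pipe ids and ratings below are hypothetical.

```python
def priority_pipes(ratings: dict[str, dict[str, str]]) -> list[str]:
    """Pipes rated 'average' or 'unsatisfactory' by every algorithm.

    ratings maps a pipe id to {algorithm name: condition level}. Ordering by
    the number of 'unsatisfactory' ratings (then by id) is an assumption.
    """
    selected = [
        pipe for pipe, levels in ratings.items()
        if levels and all(level != "good" for level in levels.values())
    ]
    return sorted(
        selected,
        key=lambda pipe: (
            -sum(level == "unsatisfactory" for level in ratings[pipe].values()),
            pipe,
        ),
    )


ratings = {  # hypothetical ratings
    "P1": {"heuristic": "average", "linear": "unsatisfactory", "svr": "average"},
    "P2": {"heuristic": "good", "linear": "unsatisfactory", "svr": "average"},
    "P3": {"heuristic": "unsatisfactory", "linear": "unsatisfactory", "svr": "average"},
}
print(priority_pipes(ratings))  # ['P3', 'P1']
```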
The use of different algorithms for pipe condition assessment in water distribution systems represents an improvement over the existing method (i.e., RUL), since these algorithms allow for the inclusion of several factors that affect pipe deterioration, as well as the use of advanced statistical tools and machine learning algorithms. The studied algorithms should be incorporated into the infrastructure asset management of urban water assets, contributing to more informed and robust decision-making.
6. Conclusions
The prioritization of water distribution pipes for rehabilitation requires the assessment of the physical condition of assets. A methodology is proposed, including the following steps: data collection and validation, algorithm development and condition assessment. Three algorithms were studied and compared: the heuristic, linear regression and support vector regression algorithms. The proposed methodology allows for the assessment of the physical condition of assets without the need for extensive fieldwork or specific computational know-how, since it is based on the knowledge of simple infrastructure and operational data (e.g., pipe characteristics and operating conditions).
The application of the proposed methodology allowed for the assessment of the physical condition of a water distribution network located in Portugal. The results obtained from each algorithm were compared with a well-known performance indicator, the ratio of useful life. The results showed significant differences in the overall pipe condition classification. The linear regression algorithm allowed us to develop multiple linear regression models considering different variables, such as the average pipe age when the burst occurred, the static pressure, the pipe material, the pipe length, the installation year and the distance to the tank. These models use the pipes with bursts during the observation period as the training dataset. Consequently, the developed models implicitly assume that the structural condition of all other pipes resembles that of the pipes presumably in worse condition (i.e., pipes that burst during the observation period). Therefore, this algorithm is considered highly penalizing in the classification of network pipes, classifying almost 60% of the network as being in unsatisfactory condition.
On the other hand, the results from the heuristic and SVR algorithms were much less conservative, classifying the pipes predominantly as being in average and good condition. In the heuristic algorithm, the following factors were considered: pipe material, pipe age, pipe nominal diameter, pipe length, pipe roughness coefficient and average operating pressure, with pipe age being the most relevant factor in pipe deterioration. Different weighting methods were used to quantify the influence of each factor; however, the order of influence of the factors did not change with the weighting method. In the SVR algorithm, different variables were used, such as the average pipe age when the burst occurred, the static pressure, the pipe material, the pipe length and the nominal diameter. The coefficients of determination for the training and validation data were 0.94 and 0.64, respectively. These results indicate that the SVR algorithm performs better than linear regression.
Nevertheless, all three algorithms can be considered an improvement over the existing condition assessment method based on the performance indicator RUL. This improvement is justified by the incorporation of several factors that influence pipe deterioration besides the asset age and its useful life. As future work, the proposed methodology should be applied to a larger dataset with a longer observation period, which would yield more robust results and further test the methodology. Furthermore, the obtained results should be included in the decision-making process for pipe rehabilitation.