Case Study for Predicting Failures in Water Supply Networks Using Neural Networks

Medeiros, Viviano de Sousa; dos Santos, Moisés Dantas; Brito, Alisson Vasconcelos

doi:10.3390/w16101455

by Viviano de Sousa Medeiros ¹, Moisés Dantas dos Santos ²,* and Alisson Vasconcelos Brito ³

¹ TRIL Lab, Graduate Program in Mechanical Engineering, Center of Technology, Federal University of Paraíba, João Pessoa 58058-600, Brazil

² TRIL Lab, Scientific Computing Department, Center of Informatics, Federal University of Paraíba, João Pessoa 58058-600, Brazil

³ LASER Lab, Scientific Computing Department, Center of Informatics, Federal University of Paraíba, João Pessoa 58058-600, Brazil

* Author to whom correspondence should be addressed.

Water 2024, 16(10), 1455; https://doi.org/10.3390/w16101455

Submission received: 23 March 2024 / Revised: 1 May 2024 / Accepted: 9 May 2024 / Published: 20 May 2024

(This article belongs to the Special Issue Water Supply System Reliability, Safety and Risk Modelling & Assessment, Volume II)

Abstract

This study deals with the prediction of recurring failures in water supply networks, a complex and costly task, but essential for the effective maintenance of these vital infrastructures. Using historical failure data provided by Companhia de Água e Esgotos da Paraíba (CAGEPA), the research focuses on predicting the time until the next failure at specific points in the network. The authors divided the failures into two categories: Occurrences of New Faults (ONFs) and Recurrences of Faults (RFs). To perform the predictions, they used predictive models based on machine learning, more specifically on MLP (Multi-Layer Perceptron) neural networks. The investigation unveiled that through the analysis of historical failure data and the consideration of variables including altitude, number of failures on the same street, and days between failures, it is possible to achieve an accuracy greater than 80% in predicting failures within a 90-day interval. This demonstrates the feasibility of using fault history to predict future water supply outages with significant accuracy. These forecasts allow water utilities to plan and optimize their maintenance, minimizing inconvenience and losses. The article contributes significantly to the field of water infrastructure management by proposing the applicability of a data-driven approach in diverse urban settings and across various types of infrastructure networks, including those pertaining to energy or communication. These conclusions underscore the paramount importance of systematic data collection and analysis in both averting failures and optimizing the allocation of resources within water utilities.

Keywords:

fault prediction; water supply networks; machine learning; predictive modeling; infrastructure management

1. Introduction

Water distribution networks are essential infrastructures for the development of a city and its productive activities [1]. Failures in these networks can bring inconvenience and losses to the population and economic activities, industry, and agriculture [2,3]. In this context, predicting future failures or breaks in a water supply network allows the management of this network to carry out planned interventions, anticipating failures, which can reduce inconvenience and losses caused by water supply interruption [4,5,6,7].

According to information from the National Sanitation Information System (in Portuguese: SNIS—Sistema Nacional de Informações sobre Saneamento), Brazil has a Water Distribution Loss Index (WDLI) of 40.3%. This number indicates that more than 40% of treated water is lost in the distribution process. In this context, the Water and Sewage Company of Paraíba (in Portuguese: CAGEPA—Companhia de Água e Esgotos da Paraíba) presents a WDLI of 35.4%, a value below the national average, but expressing a concerning figure: more than one-third of the water produced by the company is lost in the distribution process [8].

Predicting future failures in water supply networks is a complex and computationally intensive endeavor, as it involves processing data with high volume and dimensionality [9]. In this context, three approaches stand out for predicting failures in this type of network: predictions based on physical models of the pipeline, predictions based on statistical models, and predictions based on machine learning models. Of these three approaches, machine learning has gained significance due to its capacity for automatically recognizing patterns and complex relationships among the variables linked to a water supply network [6,10].

Machine learning-based prediction models have been employed by water supply companies to forecast failures in their distribution system. However, this approach requires high-dimensional and highly representative data. Such data may be scarce in terms of quantity and variety (diversity of variables) because they are highly sensitive for companies, and access to them may be limited. Data on network structure, historical failure records, and spatial and meteorological data, for example, are private to water supply companies and are considered strategic for their business models, which complicates access [9,11].

According to [1,10], there are few studies involving failure prediction and optimization of corrections in water supply infrastructures. This type of operation is common in the energy sector, but few applications are known for water supply. Moreover, ref. [3] also considers statistical results from research evaluating the condition or reliability of water distribution systems to be scarce. Additionally, according to [12], research about water distribution system data using machine learning techniques is scarce or rare.

Amidst this scenario, the present research aims to predict the occurrence of failures in a water distribution network using a machine learning-based model and real network data, including the history of failures. This, combined with other variables, allows estimating how many days will elapse until a new failure occurs at a point in the system where previous failures have been recorded.

The present research work is organized as follows: first, a brief introduction to the topic is given, followed by a detailed presentation of the proposal and its theoretical foundation. Next, this work presents the methodology and the development of the research. Finally, the results achieved and their implications are discussed.

2. Related Works

Failures in water supply networks can occur due to the influence of various factors, as described in the work of [13], which provides a detailed description of the main factors influencing failure mechanisms in such systems. Along the same lines, the study presented by [14] also considers elements that play a determinant role in network breakdowns, taking into account information on physical, mechanical, environmental, and social components. Both works highlight the influence of failure history as an important factor for predicting new failures. This information forms the basis of our research, which considers failure history as our main predictive variable.

In the literature, some studies explore the prediction of failures within water supply networks employing various methodologies, yet the predominant approach involves leveraging failure history as a significant predictive variable. Some of these works employ predictive models based on machine learning to forecast the remaining useful life for a single pipeline in the network, as seen in [15,16]. Additionally, the authors of [17] use statistical relationships between failure frequencies and weather conditions to assess the effect of climate change on future breakdowns in water supply networks.

On the other hand, this research also found other works that aim to predict the probability of a failure occurrence. In this line of research, the work of [18] uses a model adapted from electrical networks to predict the risk of failure in water supply networks. Similarly, the work of [19] utilizes a predictive model to obtain a failure probability associated with each sample, i.e., each pipeline in the network. Based on the predicted information, the replacement of parts of the water supply network is planned.

The work of [9] aims to predict the risk of failures using AutoML based on the failure history. Similarly, the work of [7] seeks to predict the probability of network failure. Finally, the study by [20] aims to predict the frequency of failure (failures per kilometer of the network), but it uses limited data in its model, restricting itself to data such as material, length, diameter, and installation year.

Other works deal with predicting the failure rate in water distribution mains (networks that typically distribute water between two different points with long length and diameter). This is the case of the study presented by [21]. A similar approach is taken by [10,22], although the latter apply their work to predicting failures in long-length water mains.

There are also studies that deal with the analysis of failure risk using hydraulic models of real supply networks. For example, the work by [23,24] employs models built on EPANET to analyze failure risk. Both studies use real data from Polish cities in their analysis. However, while the first study analyzes the consequences of failures occurring in individual pipes and classifies areas according to their vulnerability to failures caused by pressure variations, considering statistical factors such as seasonality, the second study examines failure risk indicators and proposes the adoption of new failure indicators for pipelines. Both studies deal with failure risk indicators based on network structure data, which were not available for the research presented in this manuscript. However, the present research has a different objective (predicting the number of days until the next failure) and uses different data (mostly historical failure data).

Unlike what is found in the literature, our work analyzes points in an urban network where there is already a history of past failures, and our prediction consists of a regression task in which the returned value indicates the number of days until the occurrence of the next failure at a particular point.

3. Problem Definition

The present work aims to predict failures in a water supply system based on its historical records of previous failures. A failure is understood as any occurrence recorded in the water supply network related to leaks, indicating a potential risk of temporary water shortage (water outage).

Accordingly, this research analyzes points in the water network where there is already a history of failures, meaning points in the network where failures are recurrent. These points were mapped based on the historical records of registered complaints, and the data from these points were subjected to a machine learning algorithm, a model based on a neural network, aimed at predicting a numerical value indicating the number of days between the last registered failure and the occurrence of a new failure in the future.

Armed with the estimated prediction of the forthcoming failure, the water utility can plan maintenance activities so that this future failure can be repaired preemptively, anticipating the leak, avoiding water shortages, and minimizing inconvenience for the potentially affected population. Therefore, the predictive maintenance work can contribute to creating a positive image for the water utility among its customers.

The data used in this research were provided by the Water Supply Company CAGEPA (Water and Sewage Company of Paraíba). These are real data provided by the company regarding the recorded occurrences. These occurrences include information such as date, geographical coordinates, type, address, and other data.

It is essential to emphasize that this research was conducted through an institutional partnership between CAGEPA and the Federal University of Paraíba. Consequently, this study is tailored to the specific context provided by the company, which was actively involved in defining the research objectives and supplying the relevant data for analysis. The company’s primary objective is to forecast future failures, which is why it provided the data for the predictive model.

For this research, data recorded in the city of Guarabira–PB were selected. The choice of this city was motivated by a recommendation from CAGEPA, as this city has the highest availability of data, with a greater number of variables available, as well as the longest data interval, extending from November 2017 to April 2023.

For clarification purposes, it is important to emphasize that failures may recur over time, meaning a failure occurring at a point in the network may be recorded again at that same point in the future. Thus, occurrences recorded within a radius of 50 m from each other are considered a recurrence of the same failure, i.e., a repetition of a failure over time, while occurrences outside this radius are considered new independent failures. A representation of this information can be visualized in Figure 1 below:

Figure 1 shows the occurrences recorded in the city of Guarabira–PB: a map of the city with several blue dots representing the locations of the failures. Three points are highlighted, each showing a central occurrence circled by a radius. This radius indicates whether other occurrences around it are new failures or recurrences of the main failure.

In Figure 1, Point A is a failure that does not have any recurrences, while Point B has only two recurrences. Finally, Point C, unlike the previous ones, has several recurrences around the main failure. It is worth noting that the main failure is the oldest one, meaning that it was recorded before the others.
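The 50 m rule described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the haversine great-circle distance, the function names, and the sample coordinates are assumptions. Each occurrence is compared against the oldest ("main") failure of every existing group, in line with the description of Figure 1.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (assumed spherical model)
RECURRENCE_RADIUS_M = 50.0    # radius stated in the text


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def group_failures(occurrences, radius_m=RECURRENCE_RADIUS_M):
    """Group occurrences into failure points.

    `occurrences` is a list of (iso_date, lat, lon) tuples. Processing them in
    chronological order makes the first member of each group the oldest
    ("main") failure; a later occurrence within `radius_m` of a main failure
    is stored as its recurrence, otherwise it starts a new group.
    """
    groups = []
    for occ in sorted(occurrences):  # ISO date strings sort chronologically
        _, lat, lon = occ
        for group in groups:
            _, main_lat, main_lon = group[0]
            if haversine_m(lat, lon, main_lat, main_lon) <= radius_m:
                group.append(occ)
                break
        else:
            groups.append([occ])
    return groups


# Hypothetical coordinates, for illustration only.
sample = [
    ("2018-04-07", -6.85500, -35.49000),
    ("2018-08-30", -6.85520, -35.49010),  # roughly 25 m from the first point
    ("2019-01-15", -6.86500, -35.48000),  # more than 1 km away
]
groups = group_failures(sample)
```

With these sample values, the first two occurrences form one failure point with a single recurrence, and the third is a new independent failure.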

Failure History

To enhance understanding of the research problem addressed in this study, Figure 2 below presents an analysis of a failure case and the arrangement of its failure history. Additionally, the figure also presents the forecast to be estimated by the neural network developed in this study.

Analyzing Figure 2, it is possible to visualize a point in the water supply network that experienced a recurrence of eight failures. In the figure, the beginning of the monitoring occurred in November 2017 and the failure history is presented in blocks separated by the failures that occurred at this point. For this specific case, the first failure occurred 157 days after the start of monitoring, the second failure occurred 145 days after the first one, the third failure occurred only 1 day after the second one, and so on, presenting all the failures that occurred at this point in the network.

Finally, on the rightmost part of the figure, a blue block indicates the number of days between the last recorded failure and a new potential failure that may occur in the future. Therefore, this is the prediction objective of this study: to forecast the number of days between the last recorded occurrence and an estimated future failure.
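The failure history of a single point can be turned into day counts with a few lines of code. The sketch below is only an illustration under stated assumptions: the function name is hypothetical, and the exact start day (1 November 2017) and the failure dates are chosen so that the gaps reproduce the 157, 145, and 1 days quoted for Figure 2.

```python
from datetime import date


def days_between_failures(monitoring_start, failure_dates):
    """Return the gap in days before each failure at one network point.

    The first gap is measured from the start of monitoring; every later gap
    is measured from the previous failure at the same point.
    """
    ordered = sorted(failure_dates)
    previous = [monitoring_start] + ordered[:-1]
    return [(cur - prev).days for prev, cur in zip(previous, ordered)]


# Assumed dates (only the month of the monitoring start is given in the text).
start = date(2017, 11, 1)
failures = [date(2018, 4, 7), date(2018, 8, 30), date(2018, 8, 31)]
gaps = days_between_failures(start, failures)
print(gaps)  # [157, 145, 1]
```

In a training set built this way, the gap that follows a given failure plays the role of the target for that failure, while the gaps before it feed the predictor attributes.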

4. Materials and Methods

This research was developed to predict failures in a water supply network using real data from the same network applied to machine learning models. Consequently, the predictive data analysis process presented in [25] was followed. Therefore, this work is structured into six main stages with their respective characteristics and artifacts generated in each one. These stages are described below in Figure 3:

According to Figure 3, the work begins with understanding the business context in which the data are embedded. It is during this stage that the objectives of analysis and prediction are delineated. Next, the data collection and understanding work are carried out, with exploratory analyses aiding in understanding the information and phenomena present in the data. The third stage of the work consists of preparing the data to be used in predictive models. It is in this phase that the processes of data cleaning, variable selection, and data preparation with necessary transformations take place.

The fourth stage of the process involves the final selection of data attributes and the construction of predictive models. The fifth stage, in turn, consists of applying the constructed models to the data to obtain predictions. With the predictions made, it is possible to validate the accuracy of the model. Finally, the last stage consists of using the model with real data and its evolution.

Below, this manuscript describes how each stage described above was applied in the context of our work:

The initial two stages described in Figure 3 were carried out together, and both were developed in an immersion context within the company CAGEPA, the data provider. Understanding the business enabled a better grasp of the phenomena encountered in the exploratory analysis stage, where company experts were consulted to help uncover trends and patterns present in the datasets. After a preliminary exploration of the data, the data cleaning stage was conducted to remove attributes and values that had the potential to impair subsequent analyses.

The attribute selection stage was carried out in several phases. The CAGEPA databases used in this study have dozens of attributes, with the main one used in this work dealing with network failure occurrences and the services performed to carry out the operations of correction and recovery of these failures.

The primary attributes utilized in this study were related to the network failure history of the company, from which the date of occurrence registration, its geographical location, its type (only occurrences of the Leak Removal type were considered), and finally, its registered address were extracted.

In addition to attributes present in the datasets provided by the company, other attributes were incorporated during the development of the work, most of which were statistical values calculated from previously existing data, such as mean, standard deviation, and range, among others described below.

Below are described all the attributes or variables used in this research work. Table 1 and the following enumerated list present each of the variables used. The attributes were divided into two types: predictor attributes and the target attribute, according to the indication of [26].

Target attribute (variable).
1.A. Number of days until the occurrence of the next failure for a specific point in the network.
Predictor attributes (variables).
2.A. Distance to the nearest neighbor (meters).
2.B. Terrain elevation at the location where the failure was registered relative to sea level (meters).
2.C. Number of recurrences of the failure in question.
2.D. Number of days since the last failure in the network.
2.E. Number of days between failures in the network at the same point.
2.F. Number of occurrences on the same street.
2.G. Mean of the number of days between registered failures at the same point.
2.H. Variance in registered failures for the same point.
2.I. Standard deviation of the number of days between failures for the same point.
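The statistical attributes (2.G to 2.I, plus the range mentioned earlier) can be derived from the list of days between failures at one point. The following sketch is an assumption-laden illustration: the function and key names are hypothetical, and the population form of variance and standard deviation is used because the text does not say whether the population or sample form was applied.

```python
import statistics


def interval_features(gaps):
    """Summary statistics of the days between failures at one point."""
    return {
        "mean_days": statistics.fmean(gaps),
        "variance_days": statistics.pvariance(gaps),  # population variance (assumed)
        "std_days": statistics.pstdev(gaps),          # population std (assumed)
        "range_days": max(gaps) - min(gaps),
    }


# Gaps of 157, 145 and 1 days, as in the failure-history example of Figure 2.
features = interval_features([157, 145, 1])
```

For these three gaps the mean is 101 days and the range is 156 days; a large spread like this signals an irregular failure pattern at that point.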

The target attribute consists of a numeric value that expresses the number of days until the occurrence of the next failure for a specific point in the network. Furthermore, the data interval used corresponds to all occurrence records in the supply network starting from January 2018 and extending until April 2023.

Data such as terrain elevation and distances, in turn, were calculated using the Google Maps API through the Elevation API [27].

With regard to the attributes listed above, it is important to analyze the existence of correlations between the variables. The correlation values between the target attribute and each of the predictor attributes are expressed in Table 2 below, which shows the values obtained for Pearson’s Correlation [28] between the variables.

Observing this table, it is possible to perceive that the variables do not show a strong correlation with the target attribute. Of all the analyzed attributes, only the number of recurrences and the number of days between failures show some correlation (0.4); all the others present values below this measure. These values indicate that variation in the values of these variables alone does not explain the behavior of the target attribute. In other words, the number of days until the occurrence of the next failure in the network is not directly explained by the predictor variables. It is important to highlight that the colors expressed in the table indicate the intensity of the correlation: positive (lighter colors) and negative (darker colors).
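For reference, Pearson's correlation between one predictor and the target can be computed as in the sketch below. It is a plain-Python illustration rather than the tooling used in the study, and the sample values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: a predictor (days between failures) and the target
# (days until the next failure) for five samples.
days_between = [157, 145, 1, 60, 30]
days_until_next = [145, 1, 60, 30, 90]
r = pearson_r(days_between, days_until_next)
```

The coefficient lies between -1 and 1. Values near zero, like most of those reported in Table 2, only rule out a linear relationship, which is one reason a nonlinear model such as an MLP may still extract signal from the same variables.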

From the attributes listed above, predictive models based on Multi-Layer Perceptron (MLP) neural networks for the regression problem were used. In this context, this work used manually configured models, meaning that the arrangement of layers, and the number of neurons in each layer, were defined manually. Additionally, for comparison purposes, models with automatic configuration and linear regression were used, which can be used to compare the accuracy of the results.

In this stage, the process of training the predictive model occurred, meaning that the model was configured and trained using labeled data with the correct response that the model should seek to achieve by correcting its training error. In this context, training occurred with a dataset of 1727 failure samples. This dataset was divided into training and testing data, the former consisting of a total of 1175 samples (corresponding to the period from January 2018 to December 2021), while the testing set consisted of 552 samples spaced in time between January 2022 and April 2023.
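The split described above is chronological rather than random. A minimal sketch of such a split is shown below; the tuple layout, the function name, and the sample records are assumptions made for illustration.

```python
from datetime import date

TRAIN_END = date(2021, 12, 31)  # last day of the training period in the text


def temporal_split(samples, train_end=TRAIN_END):
    """Split (failure_date, features, target) samples by date, not at random."""
    train = [s for s in samples if s[0] <= train_end]
    test = [s for s in samples if s[0] > train_end]
    return train, test


# Hypothetical samples: (date of the failure, predictor values, target in days).
samples = [
    (date(2018, 4, 7), [157.0], 145),
    (date(2021, 12, 30), [40.0], 12),
    (date(2022, 1, 11), [12.0], 88),
    (date(2023, 3, 2), [88.0], 35),
]
train, test = temporal_split(samples)
```

Splitting by date keeps every test sample later than every training sample, so the evaluation mimics the real use case of forecasting failures that have not yet happened.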

The models used were obtained from the Scikit-Learn [29] and Keras [30] libraries, where the MLP Regressor and Keras DNN (Deep Neural Network) models were, respectively, utilized. Additionally, for comparison purposes, the same data were subjected to linear regression models provided by both libraries.

The predictive models used had different configurations. The MLP model built with the Keras DNN library was automatically configured, as were the linear regression models in both libraries. Conversely, a manually configured model was built using the MLP Regressor from Scikit-Learn. This model had the following hyperparameter configuration.

Four hidden layers contained, respectively, 128, 128, 648, and 550 neurons. The activation function used was ‘relu’, the rectified linear unit function. The training process occurred over 370 iterations and 100 epochs. The convergence graph of the error in the training process is represented in Figure 4.
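A configuration along these lines could be expressed with Scikit-Learn's `MLPRegressor` as sketched below. Only the layer sizes and the activation function come from the text. Mapping the "370 iterations" to `max_iter` is an assumption (for Scikit-Learn's stochastic solvers, `max_iter` counts epochs), the "100 epochs" figure is not mapped to any parameter, and the solver, learning rate, and other settings are left at library defaults because the text does not state them.

```python
from sklearn.neural_network import MLPRegressor


def build_manual_mlp(random_state=0):
    """Build an MLP regressor with the hidden-layer layout given in the text."""
    return MLPRegressor(
        hidden_layer_sizes=(128, 128, 648, 550),  # four hidden layers
        activation="relu",
        max_iter=370,               # assumed mapping of the "370 iterations"
        random_state=random_state,  # assumed, for reproducibility only
    )


model = build_manual_mlp()
# Training would then be model.fit(X_train, y_train), with X_train holding the
# predictor attributes per sample and y_train the days until the next failure.
```

The helper name `build_manual_mlp` is hypothetical. Scaling the inputs before training is common practice for MLPs, but the text does not say whether it was done, so it is omitted here.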

Some error metrics were used to evaluate the accuracy of the model. The first error metric used was the Mean Absolute Error (MAE), which indicates the difference between the expected value and the value predicted by the model in absolute terms, that is, as non-negative values. MAE is calculated using Equation (1), where \hat{y}_{i} is the value predicted by the model for the i-th sample, y_{i} is the corresponding expected (true) value, and n is the number of samples. All other error metrics used in this work were calculated considering the same meanings for these same variables.

MAE (y, \hat{y}) = \frac{1}{n} \sum_{i = 0}^{n - 1} |y_{i} - {\hat{y}}_{i}| .

(1)

In conjunction with MAE, other error metrics such as MSE (Mean Squared Error) and RMSE (Root Mean Squared Error) were also used. Both measure the error between the expected value and the predicted value, but MSE squares each difference and therefore gives more weight to large errors, while RMSE, in turn, brings the error value back to the same unit as the target variable. MSE is calculated using Equation (2), while RMSE is calculated using Equation (3).

MSE (y, \hat{y}) = \frac{1}{n} \sum_{i = 0}^{n - 1} {(y_{i} - {\hat{y}}_{i})}^{2} .

(2)

RMSE (y, \hat{y}) = \sqrt{\frac{1}{n} \sum_{i = 0}^{n - 1} {(y_{i} - {\hat{y}}_{i})}^{2}} .

(3)

A fourth error metric used was the Mean Absolute Percentage Error (MAPE). Unlike the previous ones, this metric is not altered by the global scale of the target variable. The best possible value for MAPE is 0.0. Equation (4) is used to calculate this metric, where \epsilon is an arbitrarily small positive number that prevents division by zero when y_{i} is zero. The fifth metric used was the Median Absolute Error (MedAE), which is considered robust to outliers. The error value is calculated from the median of all absolute errors between the predicted and expected values. MedAE is calculated from Equation (5), and its result consists of a non-negative value, with the best possible value being 0.0.

MAPE (y, \hat{y}) = \frac{1}{n} \sum_{i = 0}^{n - 1} \frac{|y_{i} - {\hat{y}}_{i}|}{\max (\epsilon, |y_{i}|)} .

(4)

MedAE (y, \hat{y}) = median (|y_{0} - {\hat{y}}_{0}|, \dots, |y_{n - 1} - {\hat{y}}_{n - 1}|) .

(5)

Finally, the Max. Error, or Maximum Error, was also calculated, capturing the highest error value, i.e., the worst case of error between the predicted and expected values. Max Error is calculated using Equation (6).

Max Error (y, \hat{y}) = max (| y_{i} - {\hat{y}}_{i} |)

(6)
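Equations (1) to (6) can be checked by hand with a small plain-Python sketch. It is an illustration, not the Scikit-Learn code used in the study: the function name and the sample values are hypothetical, and using machine epsilon as the floor in the MAPE denominator is an assumption.

```python
import math
import statistics
import sys

EPSILON = sys.float_info.epsilon  # assumed small positive floor for MAPE


def regression_errors(y_true, y_pred):
    """Compute the six error metrics of Equations (1)-(6)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    n = len(abs_err)
    mse = sum(e ** 2 for e in abs_err) / n
    return {
        "MAE": sum(abs_err) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAPE": sum(e / max(EPSILON, abs(t)) for e, t in zip(abs_err, y_true)) / n,
        "MedAE": statistics.median(abs_err),
        "MaxError": max(abs_err),
    }


# Hypothetical expected and predicted days until the next failure.
y_true = [30, 60, 90, 120]
y_pred = [40, 50, 90, 150]
errors = regression_errors(y_true, y_pred)
```

For these four samples the absolute errors are 10, 10, 0, and 30 days, which gives an MAE of 12.5 days, an MSE of 275, a MedAE of 10 days, and a maximum error of 30 days. The single 30-day miss dominates the MSE, while the median is unaffected by it.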

From the models used, the last stage consisted of applying new unknown data to the model and verifying the results. After this stage, the use and evolution of the predictive models used could be subsequently performed.

5. Results and Discussion

Table 3 shows the error values calculated for each of the algorithms. It lists each of the models employed, along with the corresponding error values derived from their respective predictions. It is worth mentioning that the error values used in this work were obtained using the Scikit-Learn Metrics library [31].

Observing the data in the table, it is possible to see that the Manual MLP model obtained the best values for Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Median Absolute Error (MedAE), while the best Maximum Error (MAX. ERROR) value was obtained by the Automatic MLP. However, the values are very close, indicating similar accuracy in both cases.

The error values obtained by the predictive models used in this work are expressed in the table. The error metric column presents each of the metrics used in this work. The errors are presented in days. For example, an MAE of 33.84 for the Manual MLP means that the model is able to predict, on average, a pipeline failure with a margin of error of 33.84 days.

Given the above, the models based on Manual MLP and Automatic MLP obtained the best results, with close error values in both cases. Despite this proximity, the Manual MLP model performed slightly better, and for this reason, the results obtained by this model are presented below.

To evaluate the error obtained in the predictions of our model, let us observe Figure 5, which presents the histogram of the error for the proposed model.

Figure 5 shows a histogram of the error obtained in the predictions of the Manual MLP model. The error values indicate the number of days of discrepancy between the predicted value and the expected value. In this graph, the majority of error values are below 40 days, and only five error values above 80 days were recorded.

In order to provide further clarification regarding the accuracy of the model, the obtained error values were analyzed and we found the following:

12.37% of predictions have an error of less than 5 days;
21.64% of predictions have an error of less than 10 days;
26.80% of predictions have an error of less than 15 days;
57.73% of predictions have an error of less than 30 days;
80.41% of predictions have an error of less than 45 days;
87.62% of predictions have an error of less than 60 days;
93.81% of predictions have an error of less than 90 days.

Considering the presented data, over 20% of the predictions have an error of less than 10 days, and over 80% have an error of less than 45 days. Therefore, it is possible to conclude that the predictions made provide relevant values for the decision-making process of the company, as they indicate, with a certain degree of confidence, the forecast for the day when the next break in the network will occur at a specific point.
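A breakdown like the one listed above can be produced directly from the absolute errors of the test predictions. The sketch below uses the same thresholds as the list; the function name and the error values are hypothetical and do not reproduce the reported percentages.

```python
def share_within(abs_errors, thresholds=(5, 10, 15, 30, 45, 60, 90)):
    """Percentage of predictions whose absolute error is below each threshold."""
    n = len(abs_errors)
    if n == 0:
        raise ValueError("abs_errors must not be empty")
    return {t: 100.0 * sum(e < t for e in abs_errors) / n for t in thresholds}


# Hypothetical absolute errors in days, for illustration only.
abs_errors = [2, 7, 12, 28, 33, 41, 44, 58, 75, 120]
shares = share_within(abs_errors)
```

Because each threshold counts every error below it, the percentages are cumulative and can only grow as the threshold increases.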

Based on the contributed results, CAGEPA can utilize the geolocation of the obtained predictions to carry out repairs on the network and preventive and/or predictive maintenance in advance (before the failure occurs), thus avoiding interruptions in the supply and inconvenience to the population. Additionally, they can optimize the mobilization of teams to carry out repairs in advance, including determining when each repair should take place and setting priorities for maintenance work.

Performing repairs after a failure occurrence can lead to significant disruptions and inconvenience for customers. By using predictive maintenance based on the obtained predictions, CAGEPA can proactively address potential issues before they escalate into full-blown failures, minimizing the impact on water supply and avoiding the need for emergency repairs. This proactive approach not only improves service reliability but also reduces operational costs and enhances overall customer satisfaction.

Planning and resource allocation are essential aspects of efficient operations for any utility company like CAGEPA. By proactively addressing potential failures based on predictive maintenance, the company can better manage its resources, optimize workforce scheduling, and ensure timely repairs without causing significant disruptions to water supply. This approach not only safeguards the company’s reputation but also fosters enhanced customer satisfaction by minimizing inconvenience and ensuring the reliable delivery of services. Additionally, the ability to plan ahead and mitigate potential damages can lead to long-term cost savings and operational efficiency for the company.

6. Conclusions

Given the presented results, it is possible to conclude that failures in water supply networks can be predicted in advance using data from the network itself, focusing on the historical record of past failures.

The results show that using the history of failures as input for predictive models allows CAGEPA to predict a significant percentage of failures in its network with considerable accuracy.

As future work, it is important to predict failures at network points that do not have a history of failures. For that purpose, however, the predictive model and the selection of relevant variables will likely need to differ from those used here. Additionally, this work can be expanded to other cities served by the same company or to cities served by different companies. Moreover, flow simulation within an EPANET environment is a promising approach for future research, although the network information it requires has not yet been made available by the company.

The model accuracy can be enhanced by incorporating new variables, expanding the dataset, and utilizing different predictive models. This potential arises from the prospect of leveraging a larger number of records during the model’s training and validation phases. It is pertinent to note, however, that the company has not yet supplied any additional data. Furthermore, as a contribution to science, this work can be extrapolated and applied to other domains beyond water supply networks, such as energy supply networks, oil pipelines, and communication networks.

Author Contributions

Conceptualization, V.d.S.M. and M.D.d.S.; methodology, V.d.S.M. and M.D.d.S.; software, V.d.S.M.; validation, M.D.d.S. and A.V.B.; formal analysis, V.d.S.M.; investigation, V.d.S.M.; resources, M.D.d.S.; data curation, M.D.d.S.; writing—original draft preparation, V.d.S.M.; writing—review and editing, V.d.S.M., M.D.d.S. and A.V.B.; visualization, V.d.S.M., M.D.d.S. and A.V.B.; supervision, V.d.S.M. and A.V.B.; project administration, V.d.S.M. and A.V.B.; funding acquisition, M.D.d.S. and A.V.B. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Council for Scientific and Technological Development (CNPq).

Data Availability Statement

Data are contained within the article.

Acknowledgments

First of all, I thank God, the Federal University of Paraíba, CAGEPA, my supervising professors, and everyone who collaborated in some way.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Recurring failures.

Figure 2. Failure predictions—region of Guarabira–Brazil.

Figure 3. Stages of work development.

Figure 4. Training convergence error.

Figure 5. Error histogram (in days).

Table 1. Description of the attributes/variables used by the predictive models.

Attribute/Variable	Name	Attribute Use	Data Type	Min Value	Max Value	Metric Unit
1.A	days_until_next_failure	Target	integer	1	1081	days
2.A	dist_nearest_neighbor	Predictor	float	∼0.01	0.126	meters
2.B	elevation	Predictor	float	27.63	132.5	meters
2.C	qtd_recurrences	Predictor	integer	2	95	failures
2.D	days_since_last_failure	Predictor	integer	4	1174	days
2.E	qtd_recurs_street	Predictor	integer	1	131	failures
2.F	avg_days_btw_failures	Predictor	float	1.5	785.5	days
2.G	variance	Predictor	float	4.25	5.2 × 10⁵	days
2.H	standard_deviation	Predictor	float	2.06	722	days
2.I	qtd_days_btw_failures	Predictor	integer	5.0	1668	days
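
As an illustration of how the gap-based predictors in Table 1 relate to one another, the following minimal sketch derives them from a list of failure dates for a single network point. The function name `gap_features`, the sample dates, the choice of population variance, and the reading of `qtd_recurrences` as the number of recorded failures are all assumptions made for this example. They are not taken from the authors' implementation.

```python
from datetime import date
from statistics import mean, pstdev, pvariance


def gap_features(failure_dates):
    """Summarize the gaps (in days) between consecutive failures at one point."""
    ordered = sorted(failure_dates)
    # Days elapsed between each failure and the one that follows it.
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return {
        "qtd_recurrences": len(ordered),     # assumed: number of recorded failures
        "avg_days_btw_failures": mean(gaps),
        "variance": pvariance(gaps),         # assumed: population variance
        "standard_deviation": pstdev(gaps),  # square root of the variance above
    }


# Hypothetical failure dates, not taken from the CAGEPA dataset.
dates = [date(2021, 1, 10), date(2021, 3, 1), date(2021, 3, 31), date(2021, 7, 9)]
features = gap_features(dates)
print(features)
```

Under these assumptions the standard deviation is the square root of the variance, which agrees with the minimum values listed in Table 1 (2.06 ≈ √4.25).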

Table 2. Correlation between attributes or variables.

Attribute	2.A	2.B	2.C	2.D	2.E	2.F	2.G	2.H	2.I	1.A
2.A	1.0	−0.3	−0.1	0.1	0.2	0.0	−0.1	−0.1	−0.2	−0.3
2.B	−0.3	1.0	−0.2	0.1	−0.6	0.1	0.1	0.1	0.0	−0.1
2.C	−0.1	−0.2	1.0	−0.3	0.4	−0.5	−0.2	−0.3	0.4	0.4
2.D	0.1	0.1	−0.3	1.0	−0.2	−0.1	−0.1	−0.1	−0.5	−0.2
2.E	0.2	−0.6	0.4	−0.2	1.0	−0.4	−0.2	−0.4	−0.1	0.0
2.F	0.0	0.1	−0.5	−0.1	−0.4	1.0	0.7	0.8	0.3	0.2
2.G	−0.1	0.1	−0.2	−0.1	−0.2	0.7	1.0	0.9	0.3	0.1
2.H	−0.1	0.1	−0.3	−0.1	−0.4	0.8	0.9	1.0	0.4	0.2
2.I	−0.2	0.0	0.4	−0.5	−0.1	0.3	0.3	0.4	1.0	0.4
1.A	−0.3	−0.1	0.4	−0.2	0.0	0.2	0.1	0.2	0.4	1.0
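
A symmetric matrix such as Table 2 can be produced along the following lines. This is a minimal sketch that assumes the coefficients are Pearson correlations rounded to one decimal place. The helper names `pearson` and `correlation_matrix` and the sample values are hypothetical and are not the data behind the table.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise correlations between named columns, rounded to one decimal."""
    names = list(columns)
    return {
        a: {b: round(pearson(columns[a], columns[b]), 1) for b in names}
        for a in names
    }


# Hypothetical sample values, not taken from the CAGEPA dataset.
columns = {
    "qtd_recurrences": [2, 3, 5, 8, 13],
    "avg_days_btw_failures": [400.0, 310.0, 150.0, 90.0, 30.0],
    "days_until_next_failure": [35, 60, 20, 75, 40],
}
matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, row)
```

The diagonal is 1.0 by construction and the matrix is symmetric, which matches the layout of Table 2.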

Table 3. Comparison of models in terms of accuracy.

Error (Days)	MLP Manual	MLP Auto.	Linear Regr.	KERAS DNN	KERAS LR
MAE	33.84935	37.06102	45.34312	236.10285	64.07651
MSE	1981.12961	1994.35068	3762.60283	67,591.47127	5740.66704
RMSE	44.50988	44.65815	61.34006	259.9836	75.76719
MAPE	3.63132	4.10978	7.99508	35.77144	1.93596
MedAE	28.37494	30.69204	34.8845	248.3089	74.3719
Max. Error	158.47765	150.88351	251.06084	492.20184	217.66253
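
For reference, the error measures in Table 3 can be computed from true and predicted values as in the sketch below. It uses the standard definitions of each metric. The function name `regression_errors` and the sample values are hypothetical, and MAPE is returned here as a fraction. Whether Table 3 reports MAPE as a fraction or as a percentage is not stated in this section.

```python
from math import sqrt
from statistics import median


def regression_errors(y_true, y_pred):
    """Standard regression error measures for days-until-next-failure predictions."""
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    n = len(abs_err)
    mse = sum(e * e for e in abs_err) / n
    return {
        "MAE": sum(abs_err) / n,          # mean absolute error
        "MSE": mse,                       # mean squared error
        "RMSE": sqrt(mse),                # root mean squared error
        # Mean absolute percentage error as a fraction; assumes no true value is 0.
        "MAPE": sum(e / abs(t) for e, t in zip(abs_err, y_true)) / n,
        "MedAE": median(abs_err),         # median absolute error
        "MaxError": max(abs_err),         # largest single absolute error
    }


# Hypothetical true and predicted values in days, not taken from the study.
y_true = [30, 90, 45, 120]
y_pred = [40, 70, 45, 150]
errors = regression_errors(y_true, y_pred)
print(errors)
```

Because RMSE is the square root of MSE, the two rows of Table 3 can be cross-checked: for the MLP Manual column, √1981.12961 ≈ 44.50988.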

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
