1. Introduction
Globally, in recent years, water consumption has increased by up to 600%, reflecting an increasingly intensive use of this vital resource. Consequently, nearly 10% of people live in countries with high or critical levels of water stress [1]. Moreover, projections indicate that by 2050, more than 40% of the population could face water scarcity issues [2]. In summary, society is facing a severe water crisis driven by multiple factors.
A significant factor contributing to the water crisis is water losses in distribution systems; the annual volume of these losses could supply approximately 200 million people and represents an estimated cost of USD 141 billion [3]. It is also known that roughly one-third of these losses occur in developing countries, where nearly 45 million m³ of water are wasted daily [3]. The main causes of leaks include pipeline aging, pressure fluctuations, external vibrations, and installation errors [4]; these factors may arise in both urban and industrial environments, affecting storage tanks, pressurized networks, and wall-mounted distribution systems [5]. Consequently, various methods and approaches have been proposed for leak detection.
Detection methods have evolved from traditional approaches based on visual inspection to hybrid methods that combine data acquisition with artificial intelligence (AI) tools. According to the recent literature, research has addressed the integration of dynamic models with machine learning (ML) [6], as well as the combined use of machine learning with Fourier [7] and wavelet transforms [8].
Additionally, some studies reviewing the state of the art have focused on technologies based on micro-electro-mechanical systems (MEMS), under the premise that they are less prone to false alarms and more effective in PVC pipelines [9]. Pressure wave propagation and transient-based methods have demonstrated strong potential for leak detection in water pipelines, as evidenced by extensive experimental studies reported in the literature [10]. In contrast, Bui et al. [11] reported supervised artificial intelligence algorithms used up to 2023 for detecting and locating leaks, considering climatological, geological, and social factors that influence such failures in distribution networks.
In addition, a specialized review published in 2022 [12] provides an integrated analysis of the methodologies developed within the BattLeDIM competition, a benchmark event organized in 2020 to compare leak detection and isolation strategies in a controlled, SCADA-based virtual distribution network. The review synthesized the performance of approaches ranging from time series analysis and statistical modeling to machine learning, mathematical programming, and metaheuristic techniques, highlighting their strengths, limitations, and economic implications. This work underscores the relevance of benchmark datasets and comparative studies for advancing leak detection research.
Finally, Wu et al. [13] focused their review on unsupervised learning models used for water leak detection in distribution networks and on the main limitations of dataset generation. However, despite the reported progress, the literature lacks a comprehensive review that consolidates methodologies aimed at detecting leaks in water pipelines using machine and deep learning techniques. In particular, there is no analysis that systematically organizes the types of sensor data used along with their preprocessing, while also classifying the algorithms applied, the datasets used, and the types of output.
Therefore, this review article aims to critically analyze leak detection methodologies in potable water systems, with special emphasis on AI-based methods. To this end, it covers machine learning, deep learning, and hybrid techniques based on publications from the 2018–2025 period. The purpose of this review is to identify the main input variables, the most commonly used machine learning algorithms, the characteristics of the datasets, and the output types used in recent studies on leak detection in water distribution networks. Accordingly, this review seeks to address the following research questions:
What types of sensor and input data are most frequently used in leak detection systems?
Which machine learning algorithms have been most widely applied in recent studies for detecting and locating leaks in water distribution networks?
Which hybrid approaches and combined methodologies have shown the best performance in real and simulated scenarios?
How do dataset characteristics and output types influence the performance, applicability, and generalization of AI models for leak detection?
Although leak detection in water distribution networks has been extensively investigated for more than a decade, the rapid growth of machine learning applications has led to a methodologically diverse body of research. Consequently, rather than attempting to provide an exhaustive inventory of all existing studies, this review adopts a structured and critical scope, focusing on peer-reviewed research published between 2018 and 2025 that explicitly applies data-driven artificial intelligence techniques to water distribution networks.
This period reflects the consolidation phase of ML- and DL-based approaches in this domain, during which significant methodological transitions have taken place, including the shift from classical machine learning toward deep and hybrid architectures. Within this scope, the objective of the review is not to aggregate all publications related to the topic but to systematically examine representative and methodologically comparable studies, enabling a meaningful synthesis of input variables, datasets, algorithms, and output strategies.
Therefore, this review emphasizes analytical depth and cross-sectional comparison over numerical completeness, providing a critical framework to identify dominant trends, methodological gaps, and emerging research directions in AI-based leak detection for water distribution networks. While existing reviews mainly concentrate on algorithmic developments and performance metrics, this review adopts a complementary perspective by emphasizing dataset properties, sensor deployment, and reporting practices, which critically affect model transferability and real-world implementation.
2. Materials and Methods
This section describes the search procedure used to identify publications related to the application of artificial intelligence methods for the detection and classification of water leaks in distribution networks. The process was conducted in three main stages:
Identification of keywords describing the techniques used in the studies;
Delimitation of the application area of these techniques;
Definition of the specific purpose of their use in leak detection.
The methodology of this systematic review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines. The search was carried out in the Scopus database, considering its broad coverage of engineering and computer science research. To ensure the relevance and recency of the results, the period of 2018–2025 (up to August 2025) was selected. This range was chosen for two main reasons: (1) to include recent research and (2) to consider the significant growth in the use of artificial intelligence in recent years.
The following keywords were used in the search in order to incorporate the three stages indicated above: (machine AND learning) AND (water OR pipeline) AND leak AND detection AND (classification OR localization).
The selected time range is justified by two main considerations: ensuring the relevance and timeliness of the reviewed studies and accounting for the rapid growth observed in recent years in the application of artificial intelligence techniques to water distribution systems.
The search query used was as follows:
TITLE-ABS-KEY ((machine AND learning) AND (water OR pipeline)
AND leak AND detection
AND (classification OR localization))
AND PUBYEAR > 2017 AND PUBYEAR < 2026
AND (LIMIT-TO (PUBSTAGE, "final"))
AND (LIMIT-TO (DOCTYPE, "ar"))
AND (LIMIT-TO (SUBJAREA, "ENGI"))
AND (LIMIT-TO (LANGUAGE, "English"))
Two reviewers independently screened titles and abstracts, followed by a full-text assessment. Disagreements were resolved through discussion and consensus. No automation tools were used during the selection process.
Data extraction was performed independently by two reviewers using a standardized extraction form, and a third reviewer verified the extracted data.
The outcomes extracted included the type of leak detected, machine learning model used, evaluation metrics (accuracy, F1 score, precision, and recall), dataset characteristics, and sensor type. Additional variables extracted included the publication year, country, pipeline characteristics, type of study (simulation, experimental, or field), sensor specifications, computational tools, and implementation details.
2.1. Inclusion Criteria
In this section, the inclusion criteria established by the authors to determine the eligibility of the articles considered in the review are presented:
Research articles or review papers;
Documents published in English and peer-reviewed;
Studies applying artificial intelligence techniques for leak detection or classification;
Publications in the engineering field;
Works published within the period of 2018–2025.
2.2. Exclusion Criteria
Likewise, the exclusion criteria defined to discard those studies that did not meet the previously established methodological, thematic, or quality requirements are detailed below:
Documents in languages other than English;
Preliminary or non-finalized publications;
Studies outside the scope of AI applied to leak detection in distribution networks.
2.3. Synthesis and Assessment Methods
A formal risk-of-bias tool was not applied due to the heterogeneity of the included studies in terms of experimental design, dataset characteristics, sensor configurations, and reporting structures. Instead, potential sources of bias were examined narratively by assessing the clarity of the methodological description, the transparency of the dataset reporting, and the validation procedures used in each study. Two reviewers (M.Z.-U. and J.M.A.-A.) independently assessed these aspects and resolved discrepancies through discussion. No automated tools were used.
Effect size measures were not calculated, as a meta-analysis was not performed. Instead, the review synthesized the performance metrics reported by each study, including the accuracy, precision, recall, and F1 score.
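As a reference for how these four metrics relate to one another, the following minimal sketch computes them from binary labels. The function name and the example labels are hypothetical and do not come from any reviewed study; the label 1 is assumed to denote a leak.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels (1 = leak)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # leaks correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal correctly passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed leaks
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for eight monitoring windows (1 = leak, 0 = no leak).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

Because accuracy alone can look favorable when leak events are rare, precision and recall are usually read together with it.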
The studies were narratively grouped according to four main dimensions: (1) the type of leak investigated, (2) the machine learning model used, (3) the sensor technology, and (4) the dataset characteristics. No data transformations or imputations were performed; all values were extracted as reported in the primary studies. Missing information was recorded as “not reported”.
The results were tabulated using structured summary tables that describe the study characteristics, model types, datasets, sensors, and performance metrics.
Given the methodological diversity of the included studies, a qualitative synthesis approach was used. No subgroup analyses, meta-regression, or statistical modeling were performed to explore heterogeneity. In addition, no sensitivity analyses were performed.
A formal assessment of reporting bias was not conducted due to the absence of a quantitative synthesis.
This systematic review was preregistered in the Open Science Framework (OSF) (registration link: https://osf.io/wfdtp, accessed on 21 February 2026). The registration is currently under embargo, and the DOI will be assigned automatically upon embargo release. No formal protocol document was prepared; methodological decisions are fully reported in the present manuscript.
Finally, no formal certainty assessment framework was applied because this review is qualitative and synthesizes engineering-focused studies.
Figure 1 shows the PRISMA flow diagram summarizing the process of searching, selecting, and excluding articles. This review was conducted in accordance with the PRISMA 2020 guidelines, as reflected in Figure 1.
3. Results
This section reviews studies published in the last seven years that addressed leak detection in water distribution networks through machine learning (ML) and deep learning (DL) models.
3.1. Study Selection and Evidence Summary
The Scopus search identified 175 records. After removing 5 non-English publications and restricting the time window to 2018–2025, 161 studies remained for full-text assessment. Applying document type and subject area filters reduced the set to 64 records. Thirteen studies were excluded due to lack of full-text access (n = 4), insufficient methodological detail in extended conference papers (n = 4), or incompatibility with water distribution contexts (gas pipeline studies; n = 5). A total of 53 studies met all inclusion criteria. The complete selection process is illustrated in the PRISMA flow diagram (Figure 1).
Across the included studies, leak scenarios ranged from small orifice failures to longitudinal cracks and joint-related defects. The reported sensing modalities included pressure, flow, vibration, and acoustic measurements obtained through laboratory experiments, numerical simulations, hybrid configurations, and, in fewer cases, field implementations. Analytical approaches spanned traditional ML classifiers (SVMs, RFs, k-NN, and gradient boosting), deep learning architectures (CNNs, LSTMs, autoencoders), and hybrid schemes that combined data-driven and feature-based methods.
A formal risk-of-bias tool was not applied due to the heterogeneity of experimental designs. Instead, a qualitative appraisal was conducted. Common limitations included incomplete reporting of the sensor placement, dataset structure, leak simulation procedures, or validation strategies. Two reviewers evaluated methodological quality independently, resolving discrepancies through discussion.
Performance was reported primarily using the accuracy, precision, recall, and F1 score. Due to substantial variability in sampling frequencies, sensor set-ups, and leak conditions, the results were synthesized narratively rather than compared quantitatively. Overall, vibration and acoustic signals were more effective for detecting small leaks, whereas pressure signals performed better for medium and large failures. SVMs frequently yielded stable performance for classification, while the DL and hybrid models offered advantages in multiclass detection and localization.
No statistical synthesis, meta-analysis, or sensitivity analysis was feasible. Potential reporting biases were noted, as several studies provided only the best-performing results without uncertainty estimates. Certainty of evidence was assessed narratively, reflecting consistency in the observed trends but limited generalizability due to the predominance of simulated and laboratory datasets.
3.2. Study Characteristics
The 53 included studies exhibited substantial methodological diversity in leak types, sensing technologies, data acquisition conditions, and analytical models. The leak scenarios ranged from simple orifice leaks to complex structural failures. The sensing modalities included accelerometers, hydrophones, pressure transducers, and flow meters, capturing signals that exploit different physical mechanisms. Datasets varied in duration, sampling frequency, and acquisition environment, covering simulated, laboratory, hybrid, and field configurations.
Figure 2 shows the evolution in the number of publications identified from 2018 through August 2025. It can be observed that between 2018 and 2020, studies were scarce, reflecting a still-emerging field; starting in 2021, steady growth was observed, reaching a peak in 2024.
Regarding analytical techniques, both ML and DL models were widely applied. Traditional ML algorithms (SVMs, RFs, and k-NN) remained competitive, especially for binary detection and medium-sized datasets. DL architectures (CNNs, LSTM, and autoencoders) dominated tasks requiring extraction of complex spatiotemporal patterns. Hybrid methods that combine neural feature extraction with classical classifiers have emerged as promising alternatives. Performance metrics typically included the accuracy, precision, recall, and F1 score, though reporting completeness varied across the studies.
Figure 3 presents the bibliometric network from the authors’ keywords, which guided the classification of subsequent results and highlights the thematic structure of current research in AI-based leak detection. Different colors represent thematic clusters identified through keyword co-occurrence analysis.
3.3. Input Variables
In the analyzed studies, researchers have considered different input variables to feed artificial intelligence models, with the most common ones being pressure, flow rate, and vibrations, although variables related to temperature and flow type have also been used. The preprocessing of these data, as well as the location of the sensors used for data acquisition, are summarized in Table 1. The main objective of integrating these variables is to improve accuracy and reduce leak detection times in water distribution networks. It is worth noting that in most articles, the information came from volumetric flow transducer sensors installed in test benches or hydraulic laboratories, reflecting the experimental tendency of this research field.
3.3.1. Pressure
According to [14], pressure is identified as the most suitable variable for the study of anomalies in pipelines, since changes in this measurement often reflect malfunctioning or improper use of the water distribution network. In this regard, pressure was the most commonly used variable as the sole input in nine of the reviewed articles.
In [15], a methodology is proposed for leak detection and localization, as well as for estimating their magnitude within a distribution system. The approach is based on analyzing pressure variations caused by fixed and variable area discharges, supported by a prediction algorithm linked to a hydraulic interface with geographic information from the system. Although the spatial localization of leaks was not always accurate, the authors emphasized that this method helps reduce the search area and identify critical points of potential leakage.
In [5], an experimental study is presented, using a test bench where pressure is the main variable. The test bench replicates typical configurations of distribution networks, including “U”-shaped sections and angular bends. The results highlight that this is an underexplored area with significant potential for future research. Additionally, it is concluded that the most relevant locations for installing sensors are those far from clamps and close to angular or U-shaped bends, where sensitivity to leaks is greater.
In [16], a method for detecting and locating leaks is proposed by placing pressure sensors downstream of the leak and upstream of a coupling. Based on further studies, the authors determined that the sampling frequency of their pressure sensors must be 50 Hz. At this frequency, it is possible to capture negative pressure wave signals, i.e., depressurizing pressure waves propagating inside the pipeline.
In [17], a three-stage methodology was developed: model calibration, identification, and localization. The results show that long-term pressure data can be effectively decomposed using Seasonal and Trend Decomposition Using Loess (STL). Based on this decomposition, a clustering algorithm can be applied, facilitating the distinction of different leak scenarios in the network.
Complementarily, ref. [18] highlights the usefulness of transient pressure data for leak detection. The authors first designed a transient simulation model and then selected the most relevant features through sensitivity analysis. Finally, the approach was validated in an experimental tank–pipe–valve system. Using this method, higher detection accuracy was achieved, although high levels of model-related uncertainty were acknowledged.
The work in [19] proposes an unsupervised learning approach in which normal water pressure data are compressed and decompressed to detect anomalies. While the method performs adequately using normal operational records, results are limited when input data correspond to leakage conditions, revealing the need to improve algorithm robustness.
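The compress-and-decompress idea can be illustrated with a minimal sketch based on reconstruction error. PCA is used here as a simple linear stand-in for the compression step; it is not the architecture reported in [19], and the synthetic sensor data, the 99th-percentile threshold, and all function names are assumptions made for illustration.

```python
import numpy as np


def fit_pca(normal_data, n_components):
    """Learn a linear compress/decompress map from leak-free records."""
    mean = normal_data.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(normal_data - mean, full_matrices=False)
    return mean, vt[:n_components]


def reconstruction_error(data, mean, components):
    """Compress to the latent space, decompress, and return per-sample error."""
    latent = (data - mean) @ components.T
    restored = latent @ components + mean
    return np.linalg.norm(data - restored, axis=1)


rng = np.random.default_rng(0)
# Synthetic stand-in for normal operation: 6 pressure sensors driven by
# 2 shared demand factors plus small measurement noise.
mixing = rng.normal(size=(2, 6))
normal = rng.normal(size=(500, 2)) @ mixing + 0.05 * rng.normal(size=(500, 6))
mean, components = fit_pca(normal, n_components=2)

# Threshold: 99th percentile of the error on normal data (an assumption).
threshold = np.percentile(reconstruction_error(normal, mean, components), 99)

# Hypothetical leak: a pressure drop at one sensor that breaks the
# correlations learned from normal operation.
leak = rng.normal(size=(1, 2)) @ mixing
leak[0, 3] -= 3.0
is_anomaly = reconstruction_error(leak, mean, components)[0] > threshold
```

A record is flagged when it cannot be reconstructed as well as normal records, which is why such models depend heavily on how representative the normal training data are.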
In [20], the correlation between the pressure measurements, leak initiation time, and leak area was investigated. To this end, a massive dataset was generated using the Water Network Tool for Resilience (WNTR) simulator and complemented by pressure sensors in the network. This study enabled evaluation of sensor placement quality within the distribution network and reinforced the importance of experimental design in instrument positioning.
In turn, ref. [21] proposes combining fixed and mobile pressure sensors, which proved to be more effective in operational scenarios by improving leak localization prediction. However, the authors identified frequent connectivity losses in mobile sensors and the model’s inability to handle simultaneous leaks as key limitations.
3.3.2. Flow Rate and Flow Type
In [22], the use of pressure sensors and flow meters in transmission lines was compared for leak detection. The authors concluded that both types of sensors are useful for monitoring inflows and outflows; however, the main challenge arises when leaks correspond to small cracks. In such cases, the flow meter showed more reliable performance for data collection than pressure sensors.
Complementarily, ref. [23] indicates that flow sensors allow accurate leak detection without the need for costly upgrades to instrumentation. The results confirm that by optimizing the measurement area, it is possible to increase precision in anomaly identification.
3.3.3. Other Variables
In [24], the use of semi-permanent commercial vibro-acoustic sensors for fault detection was analyzed. The results show that it is possible to obtain reliable data to evaluate different pipe conditions, materials, and surrounding soil characteristics. The study compared four different sensor models, concluding that detection accuracy is not significantly affected by the type of device used.
Similarly, ref. [25] highlights that vibro-acoustic sensors are particularly effective for detecting leaks in metallic pipelines with diameters smaller than 375 mm. The authors noted that in plastic pipes, the signal attenuated, making data acquisition more difficult.
In [26], an approach was proposed to extend the range of sensors by working with low frequencies in vibration spectra. This reduces system interference, improving leak prediction and helping minimize operational costs.
Finally, ref. [27] proposes a methodology based on boiler process monitoring using thermal sensors. The approach consists of independently processing the data from each sensor, which allows for a more targeted identification of potential leaks.
Table 1 summarizes the reviewed articles and includes additional information, such as the type of flow analyzed and whether the data were preprocessed before being added into the machine learning models. The table also shows that pressure remains the most commonly used variable, although the flow rate, acoustic vibrations, and temperature have proven to be valuable complements in specific scenarios. The diversity of approaches reflects both the complexity of the phenomenon and the need to integrate multiple sources of information to improve the accuracy and robustness of detection systems.
Table 1.
Input variables used in the state of the art.
| Work | Input Variable | Preprocessing | Sensor Location |
|---|---|---|---|
| [14] | Pressure | Feature extraction | Near water reservoirs |
| [15] | Pressure | Interpolation | Before and after the leak |
| [5] | Pressure | Fourier transform | At elbows, clamps, right angles, and U bends |
| [16] | Pressure | Feature extraction | After the valve and before the leak |
| [24] | Vibration | Spectral analysis, FFT | Valves and hydrants |
| [22] | Flow rate | Laguerre filter | Before and after bends |
| [25] | Vibration | Spectral analysis, FFT | Before and after bends |
| [17] | Pressure | Residual analysis | Positioned using genetic algorithms for optimization |
| [18] | Pressure | Butterworth filter | Before and after the leak |
| [19] | Pressure | Dimensionality reduction (PCA) | Positioned using genetic algorithms and PSO for optimization |
| [26] | Vibration | Normalization, FFT | Before and after bends |
| [20] | Pressure | — | Positioned using genetic algorithms and PSO for optimization |
| [21] | Pressure, flow rate, level | — | Pressure sensors in reservoirs and plants; flow sensors at network boundaries; distributed sensors |
| [27] | Temperature | Normalization, windowing | Valves and boiler connections |
| [23] | Flow rate | Kalman filter | Before and after bends |
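Two of the preprocessing steps listed in Table 1, normalization and windowing, can be sketched as follows. This is a generic illustration, assuming z-score normalization and fixed-length overlapping windows; the reviewed studies do not necessarily use these exact variants, and the function names and the temperature trace are hypothetical.

```python
import numpy as np


def normalize(signal):
    """Z-score normalization; a constant signal is mapped to zeros."""
    signal = np.asarray(signal, dtype=float)
    std = signal.std()
    if std == 0:
        return np.zeros_like(signal)
    return (signal - signal.mean()) / std


def sliding_windows(signal, window, step):
    """Split a 1-D signal into overlapping windows of fixed length."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < window:
        raise ValueError("signal is shorter than the window length")
    starts = range(0, len(signal) - window + 1, step)
    return np.stack([signal[s:s + window] for s in starts])


# Hypothetical temperature trace of 20 samples.
trace = np.linspace(60.0, 62.0, 20)
windows = sliding_windows(normalize(trace), window=8, step=4)
```

Each row of `windows` can then be passed to a model as one training example; the window length and step control the trade-off between detection latency and the number of examples.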
3.4. Types of AI Models
The reviewed studies employed a wide variety of artificial intelligence algorithms to address leak detection and localization in water distribution networks. These approaches can be grouped into three main categories: machine learning (ML) models, deep learning (DL) models, and hybrid models, which combine techniques from both areas or integrate optimization methods. This classification allows for analysis of the strengths and limitations of each algorithm type.
3.4.1. Machine Learning (ML)
Support Vector Machine (SVM)
In [28], an SVM model was implemented, chosen for its ability to efficiently classify features obtained after data preprocessing. The authors highlighted that this algorithm shows low vulnerability to noise, making it a suitable option for early detection of pipeline conditions.
In [4], an SVM was employed to classify input instances from a spherical tank, leveraging its capacity to process nonlinear data. The results showed that this algorithm is appropriate for identifying leaks in this type of structure.
Mysorewala et al. [5] compared three machine learning models for detecting leaks in wall-mounted water distribution systems. The SVM achieved the best performance, and it was observed that increasing the number of sensors produced a more complete training dataset, improving detection and leak size classification accuracy.
For real-time leak detection, Ayati et al. [18] proposed a hybrid framework that integrates transient hydraulics with machine learning, using an SVM as the main algorithm. The model was optimized through sensitivity analysis and evaluated in a tank–pipe–valve experimental system, contrasting classification and regression approaches. The study highlights that the classification-based model offered higher precision, stability, and reliability even under uncertainty.
In [29], an SVM was proposed for leak detection in oil distribution networks. The algorithm was selected because it performs well in large-scale spaces, can handle large data volumes, and is suitable for nonlinear inputs.
Additionally, ref. [30] introduced an integrated system for managing and analyzing acoustic leak data. The framework included variational mode decomposition (VMD), noise reduction using wavelets, feature extraction, and an SVM as the core classification model. The authors note that the SVM not only achieved high accuracy in classifying acoustic signals but also demonstrated scalability and potential for real-time monitoring in water distribution networks.
Random Forest (RF)
In [15], a supervised random forest model was applied to a water distribution network for leak detection. The RF algorithm was chosen for its efficiency in handling large datasets. It was applied to predict the leak rate and locate the leak within a 100 m radius, showing superior performance compared with a single decision tree.
In [19], leak localization was addressed as a classification problem within the network. Although several ML algorithms were considered, the random forest (RF) was selected for its efficiency and minimal hyperparameter tuning requirements.
In [20], the RF algorithm was tested to identify the leak node in a simple distribution network model, showing high effectiveness, especially in datasets with many features, where it helps reduce the overfitting risk. The algorithm was implemented in scikit-learn and trained with multiple decision trees using bagging, emphasizing that the number of trees is the key hyperparameter to improve model accuracy.
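A minimal sketch of leak-node classification with a random forest in scikit-learn is given below. The data are synthetic and only mimic the idea of node-specific pressure signatures; they are not WNTR output, and the network size, noise level, and number of trees are assumptions rather than the settings used in [20].

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_nodes, n_sensors, n_samples = 4, 6, 800

# Synthetic stand-in for simulated data: each candidate leak node leaves a
# distinct pressure-drop signature across the sensors, plus measurement noise.
signatures = rng.uniform(0.5, 3.0, size=(n_nodes, n_sensors))
leak_node = rng.integers(0, n_nodes, size=n_samples)
noise = 0.2 * rng.normal(size=(n_samples, n_sensors))
pressures = 50.0 - signatures[leak_node] + noise

X_train, X_test, y_train, y_test = train_test_split(
    pressures, leak_node, test_size=0.25, random_state=0, stratify=leak_node
)

# n_estimators sets the number of bagged decision trees in the forest.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
```

In practice, `n_estimators` would be tuned on a validation split, since adding trees increases training cost while accuracy gains flatten out.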
Clustering-Based Approaches
The study in [19] proposes a semi-supervised framework (CtL-SSL) that integrates PCA, autoencoders, and a modified k-means clustering to detect and localize leaks with highly limited labeled data. The approach leverages large unlabeled datasets and network topology but remains dependent on hydraulic simulations and constrained by the limited spatial representativeness of traditional clustering.
Gradient Boosted Trees (GBTs)
In [31], a gradient boosted trees (GBTs) classifier was tested, which combines several simple decision trees sequentially to form a more robust model. Unlike other tree-based methods, this algorithm corrects the errors of each iteration through a gradient descent strategy, reducing loss and improving classification performance. Its flexibility in handling different loss functions and strong performance with complex data make it a solid alternative for leak detection.
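The sequential error-correction mechanism can be sketched from scratch in a few lines. The example below is a simplified illustration, assuming a squared loss, a single input feature, and one-split trees (stumps); it is a regression toy, not the classifier evaluated in [31], and all names and data are hypothetical.

```python
import numpy as np


def fit_stump(x, residual):
    """Find the single-threshold split that best fits the residual."""
    best = None
    for threshold in np.unique(x)[:-1]:  # exclude max so both sides are non-empty
        left, right = residual[x <= threshold], residual[x > threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)


def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Sequentially add stumps fitted to the negative gradient of the loss."""
    base = y.mean()
    prediction = np.full_like(y, base, dtype=float)
    stumps, losses = [], []
    for _ in range(n_rounds):
        residual = y - prediction  # negative gradient of 0.5 * (y - f)^2
        stump = fit_stump(x, residual)
        prediction += learning_rate * predict_stump(stump, x)
        stumps.append(stump)
        losses.append(float(np.mean((y - prediction) ** 2)))
    return base, stumps, losses


rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=200)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=200)
base, stumps, losses = gradient_boost(x, y)
```

Each round fits a new stump to what the current ensemble still gets wrong, so the training loss recorded in `losses` does not increase from one round to the next.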
K-Nearest Neighbors (KNN)
In [32], leak detection and leak size prediction in a pipeline network were addressed using several classifiers, mainly SVMs and KNN. The input data were processed using statistical, wavelet-based, and correlated features. Although classifier accuracy varied, the overall results were satisfactory in terms of precision, training time, and prediction speed.
Gaussian Extreme Learning Machine (GELM)
The research in [6] presented an improved extreme learning machine (ELM) model with Gaussian mixture, called a GELM, for leak detection in distribution networks. Unlike the traditional ELM, this approach assigns input weights based on the statistical characteristics of the data, helping to avoid misclassification. In the tests conducted, the GELM achieved remarkably high accuracy, even reaching 100% in some cases, although the authors warned that such performance may vary in larger urban networks.
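For context, the traditional ELM that the GELM modifies can be sketched as follows: the hidden-layer weights are drawn at random and never trained, and only the output weights are solved in closed form. This sketch shows the baseline only, not the Gaussian-mixture weight assignment of [6]; the data, hidden-layer size, and function names are assumptions.

```python
import numpy as np


def train_elm(X, y, n_hidden, rng):
    """Traditional ELM: random fixed hidden layer, least-squares output weights."""
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (not trained)
    b = rng.normal(size=n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    beta = np.linalg.pinv(H) @ y                 # closed-form output weights
    return W, b, beta


def predict_elm(model, X):
    """Threshold the linear output at 0.5 to obtain a binary label."""
    W, b, beta = model
    return (np.tanh(X @ W + b) @ beta > 0.5).astype(int)


rng = np.random.default_rng(0)
# Synthetic two-feature data: class 1 ("leak") lies outside a circle.
X = rng.uniform(-2.0, 2.0, size=(400, 2))
y = (np.linalg.norm(X, axis=1) > 1.5).astype(float)
model = train_elm(X, y, n_hidden=50, rng=rng)
train_accuracy = float(np.mean(predict_elm(model, X) == y))
```

Because the input weights are random, results of a plain ELM vary between runs, which is the sensitivity that data-driven weight assignment aims to reduce.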
3.4.2. Deep Learning (DL)
Convolutional Neural Network (CNN)
Based on [
16], a leak detection and localization method was proposed that integrates transfer learning and particle swarm optimization (PSO) for weight optimization. This strategy allows automatic feature extraction and reduces the need for large datasets. The model was trained with laboratory data and evaluated using simulated information, showing promising performance.
According to [
33], a different approach was presented: a blind convolution algorithm called Complex-Efficient Fast Independent Component Analysis (C-EFastICA), designed to improve leak localization in branched pipelines where noise complicates analysis. This method proved faster and more accurate than traditional FastICA, achieving about 88% accuracy in leak localization even under branch noise interference. The authors noted, however, that when the number of branches exceeds that of sensors, the model’s effectiveness decreases, suggesting future research directions.
In [
25], two CNN-based approaches were evaluated for field leak detection: a Fully Connected Neural Network (FCNN), which uses the fast Fourier transform (FFT) of filtered acoustic signals, and a Time-Frequency Convolutional Neural Network (TFCNN), which processes spectrograms at different resolutions to better capture time–frequency variations. The results were highly promising, suggesting that CNN models can be integrated into water company leak monitoring programs.
A CNN model used to detect and classify water leaks in a pipe was presented in [
34]. The authors noted that CNNs can detect leaks in real time using pretrained parameters. They also mentioned that to improve computational efficiency, the magnitude spectrum was transformed into a two-dimensional matrix before CNN input.
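The spectrum-to-matrix step described above can be sketched as follows. This is a minimal illustration, not the preprocessing used in [34]: the naive DFT, the row-major reshaping, the 4 × 4 matrix size, and the synthetic 32-sample tone are all assumptions introduced for the example.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Magnitude of a naive O(n^2) DFT for the first n // 2 frequency bins."""
    n = len(signal)
    magnitudes = []
    for k in range(n // 2):
        acc = 0j
        for t, x in enumerate(signal):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        magnitudes.append(abs(acc))
    return magnitudes


def to_matrix(values, rows, cols):
    """Arrange a flat spectrum row by row into a rows x cols matrix."""
    if len(values) != rows * cols:
        raise ValueError("spectrum length must equal rows * cols")
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]


# Hypothetical input: a pure tone completing 4 cycles in 32 samples.
signal = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
spectrum = magnitude_spectrum(signal)   # 16 frequency bins
image = to_matrix(spectrum, 4, 4)       # 2D input for a CNN-style model
```

With this layout, neighboring frequency bins sit next to each other within a row, so a convolutional kernel sees local spectral structure; other arrangements are equally possible.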
In [
35], a CNN model was compared to a multilayer perceptron (MLP) for leak localization and size prediction in different simulated scenarios. While the MLP treated inputs as independent vectors, the CNN learned the spatial relationships among the pressure sensors, recognizing leak patterns more accurately. This advantage was especially clear for random leaks, where CNN performance was notably superior.
Finally, the work in [
36] mentioned that CNNs offer several advantages, such as their ability to automatically extract and learn features from datasets, enabling detection of subtle patterns in input data. Moreover, CNNs can handle large datasets, making them suitable for scalable real-time monitoring and visualization applications.
Artificial Neural Network (ANN)
In [37], the authors noted that since the problem is a binary classification, an ANN is the most appropriate choice because such models are trained with labeled data. This is advantageous as the output can take only two categories: leak or no leak. The final decision is obtained through a voting strategy; if, in a set of n tests, more than 50% of the results indicate a leak, it is concluded that a leak is likely occurring.
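The voting rule described above can be written compactly. The sketch below assumes each test result is already a Boolean leak flag; the function name `leak_by_vote` is hypothetical and is not taken from [37].

```python
def leak_by_vote(test_results):
    """Return True when more than 50% of the individual tests indicate a leak."""
    if not test_results:
        raise ValueError("at least one test result is required")
    leak_votes = sum(1 for result in test_results if result)
    # Integer comparison avoids floating-point issues at exactly 50%.
    return 2 * leak_votes > len(test_results)


# Hypothetical outcomes of n = 5 classifier runs on the same window.
decision = leak_by_vote([True, True, False, True, False])
```

Note that an exact tie (50%) does not count as a leak under this strict majority rule.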
In [
38], the authors emphasized that when working with an ANN algorithm, it is essential to consistently adjust the number of hidden layers, neurons, and activation functions. This calibration avoids overfitting while improving model accuracy.
Deep Neural Network (DNN)
In [8], a DNN was selected for its high feature extraction capability, and an accompanying analysis was conducted to identify the most discriminative features.
3.4.3. Hybrid Models
In [
22], an SVM model was applied for leak identification, followed by a multivariable fuzzy backstepping method to determine the leak size under abnormal conditions and its position based on parameter separation.
In [
39], the authors used a Combined Dual Prediction-based Data Fusion (CDPDF) model to reduce data transmission and extend the lifetime of wireless sensor networks (WSNs), as their methodology relies on wireless sensors. The term “combined” refers to the use of both local and neighboring node measurements for prediction, while “dual prediction” means that prediction occurs synchronously at both the primary and secondary nodes.
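To illustrate how a dual prediction scheme can reduce transmissions, the following sketch runs the same predictor at a sensor node and at a sink, and transmits a reading only when the prediction error exceeds a tolerance. The last-value predictor, the tolerance, and the pressure readings are assumptions made for illustration; they are not the CDPDF model of [39], which also uses neighboring node measurements.

```python
def simulate_dual_prediction(readings, tolerance):
    """Count transmissions when node and sink share a last-value predictor."""
    shared_estimate = None    # identical predictor state kept at node and sink
    transmitted = 0
    reconstructed = []        # series as seen by the sink
    for value in readings:
        needs_update = (
            shared_estimate is None
            or abs(value - shared_estimate) > tolerance
        )
        if needs_update:
            # Prediction failed: send the reading and resynchronize both sides.
            transmitted += 1
            shared_estimate = value
        reconstructed.append(shared_estimate)
    return transmitted, reconstructed


# Hypothetical pressure readings (bar) with one abrupt drop.
readings = [2.00, 2.01, 2.02, 1.80, 1.79, 1.81]
sent, sink_view = simulate_dual_prediction(readings, tolerance=0.05)
```

In this example only the first reading and the abrupt drop are transmitted, while the sink still tracks the signal within the chosen tolerance.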
Hyperclustering
As introduced in [
40], hyperclustering integrates CNN and LSTM deep features with shallow environmental attributes through hypergraph learning to capture high-order relationships in multimodal tunneling leakage data. Although effective for complex environments, it requires discretization of continuous data and extensive sensing infrastructure, limiting its applicability to real-world water networks.
The hybrid MLDLF scheme presented in [
17] combines STL decomposition, k-means clustering, and hydraulic model-based diagnosis to detect multiple leaks. STL isolates seasonal patterns, while clustering reduces the search space; however, the method relies on a calibrated hydraulic model and may confuse normal variability with small leaks, especially in large networks.
Convolutional Long Short-Term Memory (ConvLSTM)
In [41], the authors employed ConvLSTM, which combines a 3D CNN with LSTM, a variant of the recurrent neural network (RNN), to manage spatiotemporal data. It differs from a standard LSTM by using convolutional operations instead of fully connected layers. This hybrid approach enables accurate leak detection in pipelines by learning and reconstructing their normal state; any deviation from this reconstruction is labeled as a potential leak.
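The reconstruction-based decision can be sketched independently of the network itself. The example below assumes that a trained model has already produced a reconstruction, and it flags a leak when the mean squared error exceeds a threshold fitted on leak-free data. The mean-plus-three-standard-deviations rule and all numeric values are illustrative assumptions, not the criterion reported in [41].

```python
import statistics


def reconstruction_error(observed, reconstructed):
    """Mean squared error between a measured window and its reconstruction."""
    if len(observed) != len(reconstructed):
        raise ValueError("windows must have the same length")
    return sum((o - r) ** 2 for o, r in zip(observed, reconstructed)) / len(observed)


def fit_threshold(normal_errors, k=3.0):
    """Threshold = mean + k * standard deviation of errors on normal data."""
    return statistics.mean(normal_errors) + k * statistics.pstdev(normal_errors)


def flag_leak(observed, reconstructed, threshold):
    """A window is a potential leak when its error exceeds the threshold."""
    return reconstruction_error(observed, reconstructed) > threshold


# Hypothetical reconstruction errors collected under leak-free operation.
threshold = fit_threshold([0.010, 0.020, 0.015, 0.012])
normal_case = flag_leak([2.0, 2.0], [2.0, 2.0], threshold)
leak_case = flag_leak([2.0, 1.0], [2.0, 2.0], threshold)
```

The threshold multiplier `k` trades false alarms against missed small leaks and would need tuning on the target network.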
Handcrafted Features CNN (HF-CNN)
Wu et al. [42] reported that the HF-CNN algorithm is highly useful in cases with limited data availability. Despite the lack of extensive preprocessing, it achieved a high leak detection rate with low computational cost. However, they noted that complete understanding of multidisciplinary intersections remains a challenge.
Least Squares Support Vector Machine (LSSVM)
As presented in [
43], a hybrid approach combining deep and classical learning techniques was proposed for leak detection. The method converts acoustic signals into images using a continuous wavelet transform (CWT) and enhances them with filters to generate clearer scalograms. On these representations, a deep belief network (DBN) optimized by a genetic algorithm extracts the most relevant features, and finally, an LSSVM classifies leak and non-leak conditions. This integrated scheme achieved high accuracy and proved reliable for real-time pipeline monitoring.
In [
23], a domain-based training framework was designed where a variational autoencoder (VAE) projects time series into a regulated latent space, ensuring separation between the leak and non-leak classes. On this reduced space, a binary SVM classifier defined an optimal hyperplane for distinguishing the two groups. The VAE–SVM combination improved the detection capability and ensured greater model robustness against signal variability.
As shown in Figure 4, the SVM is the most frequently used algorithm in the reviewed studies, in both machine learning (ML) and recent hybrid approaches. In deep learning (DL), the CNN stands out as the most widely adopted model, showing a trend similar to the SVM as it also appears in emerging hybrid architectures. On the other hand, the RF and ANN models show moderate usage, while other algorithms such as the KNN, GELM, GBT, DNN, and various hybrid combinations (e.g., SVM + Backstepping or HF + CNN) appear less frequently.
Overall, this distribution reflects a preference for classical ML models, particularly SVMs, although the adoption of deep and hybrid architectures has gradually increased as research advanced toward more robust and adaptive models.
Table 2 shows the models used for water leak detection, along with their algorithms, the main features reported by the authors, and the most commonly used evaluation metrics according to the model type.
3.5. Datasets
The performance of machine learning models depends directly on the quality and representativeness of the data used for training and validation. Therefore, this section describes the databases reported in the literature, detailing the type of test bench employed (simulation, experimentation, or implementation), the software used in their construction, and the main technical characteristics defining each implementation. This analysis allows understanding how data generation conditions influence the obtained results.
3.5.1. Public Databases
Several databases derived from experimental implementations were identified. In [
34], a real network instrumented with approximately 11,000 pressure and flow sensors distributed across neighborhoods in Gwangju (South Korea) is reported. The dataset contained 78,204 samples classified into three categories: normal conditions, anomalous sounds, and environmental noise. The recorded signals covered a spectral range from 0 to 5120 Hz, enabling the capture of both low-frequency components associated with pressure and high-frequency components related to leak-induced vibrations.
Similarly, in [
44], EPANET was used together with the Python-based WNTR tool to simulate a network modeled after C-TOWN [
12]. Leaks were introduced to evaluate system resilience and visualize hydraulic effects, recording pressure and flow data at 45 demand nodes. The dataset was generated using demand multipliers with hourly variations, allowing the representation of different operational scenarios.
In [
45], a simulation was developed using data obtained from accelerometers, pressure sensors, and hydrophones. LABVIEW NXG 5.1 software was used for data acquisition from the accelerometer and dynamic pressure sensor, while Audacity 3.0.5 was used for hydrophone data. The database consisted of 280 measurements classified into healthy and faulty states, with each lasting 30 s. The sampling rate was 51.2 kS/s/ch for the pressure sensors and accelerometers and 8 kHz for the hydrophone.
In [
46], the information from the same C-TOWN model was expanded, consisting of 432 pipes, 388 nodes, 11 pumps, control valves, and seven storage tanks, all managed through an SCADA system and PLC controllers. Unlike the previous study, this work included disruptive events such as conventional interruptions, cyberattacks, and physical attacks, generating three datasets: 388 leak events, 128 cyberattack records, and 72 physical attack events.
3.5.2. Private Databases
In several studies, hydraulic simulations were used to generate datasets. In [
47], a simulated test bench was developed using EPANET software, based on the water distribution network of the University of Lille. This network represents a small city with 150 buildings and approximately 25,000 users and includes 15 km of coarse-mesh pipes divided into five zones. The first zone, the largest and most complex one, included 62 leak scenarios, while the remaining four zones covered a total of 164 scenarios.
Similarly, in [
48], a simulated test bench was built using EPANET 2.2, based on the Hanoi water distribution network. To represent more realistic aging pipe conditions, a reduced roughness coefficient of 130 was considered. The simulated network consisted of 34 pipes with variable diameters (20–60 inches) and 31 demand nodes monitored by virtual pressure and flow sensors.
Other studies employed multipurpose simulation tools to analyze gas network behavior. In [
49], OLGA software was used to model an 80 km-long gas pipeline, defining the thermodynamic properties of the fluid and initial conditions, with data recorded every 10 s. The system included virtual flow, pressure, and temperature sensors, simulating both normal conditions and leak scenarios, and white noise was added to the signals to approximate real conditions.
Complementarily, in [
36], a database was generated using OLGA as well, configuring 1940 leak scenarios by combining 10 orifice sizes (1–10 cm) and 194 locations distributed along 188 km. Each simulation lasted one hour with sampling every 10 s, generating 360 data points per case. The resulting dataset was divided into training, validation, and testing subsets to evaluate machine learning model performance in leak size and location estimation.
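The scenario count and the number of samples per case follow directly from the stated design. The sketch below reproduces that arithmetic; the 1 cm step between orifice sizes and the use of integer location indices are assumptions, since the exact values and positions are not given here.

```python
from itertools import product

# 10 orifice sizes from 1 to 10 cm (1 cm steps assumed).
orifice_sizes_cm = range(1, 11)
# 194 leak locations; indices stand in for the unreported positions.
location_ids = range(194)

# Every combination of size and location is one simulated leak scenario.
scenarios = list(product(orifice_sizes_cm, location_ids))

# One hour of simulation sampled every 10 s.
samples_per_scenario = (60 * 60) // 10

total_samples = len(scenarios) * samples_per_scenario
```

This gives 1940 scenarios of 360 samples each, which matches the figures reported for the dataset.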
By contrast, in [50], a physical prototype of a gas pipeline was developed, featuring a three-layer spiral structure separated by 320 mm. The system, instrumented with a Micro-II Express module and Gigabit Ethernet interface, included a pipe with an internal diameter of 25 mm and an external diameter of 32.9 mm, operating at an initial pressure of 0.6 MPa. Leaks were generated by controlled valve openings that produced orifices of different sizes.
Finally, in [
51], a simulated database was created using HUGIN Expert (v8.9), which models networks through probabilistic representations. The model included 10 nodes where hydraulic variables—pressure, flow, velocity, elevation, head loss, and demand—were recorded at five-minute intervals over 31 days. Leaks were introduced at two random nodes with magnitudes between 0 and 5% of the average flow, generating a synthetic database representing both normal and anomalous system conditions.
As observed above, the reviewed databases show a wide variety of approaches for data generation and acquisition, ranging from controlled hydraulic simulations to real field implementations.
Table 3 summarizes the type of experimentation used for data acquisition, the software employed, whether the dataset was proprietary or external, and whether it is publicly or privately accessible. It is worth noting that most datasets classified as private are available upon request from the authors, who typically provide access upon justified communication.
The table also distinguishes each dataset's origin, technological framework, network scale, sensor configuration, and sampling frequency whenever such information is reported. This comparison reveals a strong predominance of simulation-based datasets, often generated using EPANET or related tools, particularly for large-scale networks and benchmarking purposes. Laboratory prototypes are less frequent and typically focus on high-frequency sensing modalities, while real-world implementations remain comparatively limited.
One notable observation is that quantitative information regarding the network scale, sensor density, and sampling frequency was inconsistently reported across the studies. Even when advanced machine learning models are proposed, key characteristics of the underlying testbeds—such as the number of monitored nodes, the spatial distribution of sensors, or the temporal resolution of measurements—are often omitted. This lack of standardized reporting complicates cross-study comparison and limits the assessment of model scalability and practical deployment in operational water distribution networks.
3.6. Type of Output
Machine learning models applied to leak detection in water distribution networks differ not only in their structure and input data but also in the types of outputs they generate. These outputs determine the purpose and scope of the model, whether binary detection of the presence or absence of leaks, multiclass classification according to the type or severity of the event, or continuous estimation of parameters such as the leak size or location. This section analyzes the different output strategies reported in the literature, as well as the metrics and approaches used to evaluate model performance in each case, in order to identify predominant trends and areas of opportunity in the development of intelligent monitoring systems.
3.6.1. Binary Output
In [
52], the model output was binary, since the goal was to determine the presence or absence of a leak based on acoustic signals measured by microphones. The system does not use an optimization algorithm but applies several classifiers (RF, XGBoost, KNN, SVM, and LDA) to validate the performance of the proposed method. The main reported metric is accuracy, with values of up to 99–100%, confirming the effectiveness of the approach in discriminating between leak and ambient noise signals.
As presented in [
53], the proposed model, the TFCNN, combines the short-time Fourier transform (STFT) with a parallel convolutional neural network architecture to analyze time–frequency information. For comparison, other classifiers (DT, SVM, MLP, RF, and XGBoost) were trained, and their hyperparameters were tuned using the grid search method to ensure fair performance evaluation. The model achieved an average accuracy of 98–99%, demonstrating its ability to identify leaks with high reliability even under low-SNR conditions.
In [
54], a methodological framework called maximal discernibility and minimal redundancy–improved sequential floating forward selection (MDMR–ISFFS) was proposed to optimize leak detection through the selection of relevant acoustic features in real water distribution networks. The system uses a binary output and evaluates the performance of five classifiers (DT, RF, XGBoost, SVM, and MLP) applied to selected features in both the time and frequency domains. The most relevant features identified were the mean frequency, peak frequency, temporal mean, and zero-crossing rate, achieving accuracies between 94% and 98%, with the RF and XGBoost models performing best.
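The four features named above can be computed from a single acoustic window. The sketch below is not the MDMR–ISFFS pipeline of [54]; the definition of mean frequency as an amplitude-weighted average of bin frequencies, the naive DFT, and the synthetic 8 Hz tone sampled at 64 Hz are assumptions chosen to keep the example self-contained.

```python
import cmath
import math


def temporal_mean(x):
    """Average amplitude of the window in the time domain."""
    return sum(x) / len(x)


def zero_crossing_rate(x):
    """Fraction of consecutive sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(x) - 1)


def one_sided_spectrum(x, fs):
    """Bin frequencies (Hz) and magnitudes from a naive O(n^2) DFT."""
    n = len(x)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        acc = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))
        freqs.append(k * fs / n)
        mags.append(abs(acc))
    return freqs, mags


def peak_frequency(freqs, mags):
    """Frequency of the bin with the largest magnitude."""
    return freqs[mags.index(max(mags))]


def mean_frequency(freqs, mags):
    """Amplitude-weighted average frequency (assumed definition)."""
    return sum(f * m for f, m in zip(freqs, mags)) / sum(mags)


# Hypothetical window: 8 Hz tone, 64 samples at 64 Hz, with a phase offset.
fs = 64
window = [math.sin(2 * math.pi * 8 * t / fs + 0.5) for t in range(64)]
freqs, mags = one_sided_spectrum(window, fs)
features = {
    "temporal_mean": temporal_mean(window),
    "zero_crossing_rate": zero_crossing_rate(window),
    "peak_frequency": peak_frequency(freqs, mags),
    "mean_frequency": mean_frequency(freqs, mags),
}
```

A feature vector of this kind could then be passed to any of the classifiers listed above (DT, RF, XGBoost, SVM, or MLP).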
3.6.2. Multiclass Output by Severity
As presented in [
55], signals from three AE sensors and 25 extracted features were combined. The model outputs include a binary classification to distinguish between normal and leak conditions and a multiclass classification associated with the leak orifice size (0.5 mm, 0.7 mm, and 1 mm) under different pressures (13 and 18 bar). Several supervised classifiers (neural network, DT, RF, and KNN) were evaluated, achieving an overall accuracy of 99% with neural network and KNN models, demonstrating the system’s effectiveness for real-time leak detection and classification in both liquid and gaseous media.
In [
56], a deep neural network model named the Multimodel Time–Frequency Convolutional Neural Network (M-TFCNN) was developed for multiclass classification of leaks in active urban water distribution networks, based on vibroacoustic signals recorded by field-distributed sensors. The output was categorized into leaks at hydrants, meters, service lines, fire valves, private properties, and main pipes. Tests were conducted using data from two acoustic sensors (HWM and Von Roll), achieving accuracies of 98% and 95%, respectively. This approach represents a substantial improvement over conventional systems by enabling identification of the specific type of leak rather than merely its existence.
3.6.3. Multiclass Output by Location
It should be noted that this study was conducted within an international benchmarking competition organized by the C-TOWN network, in which all participating methods were evaluated under the same network topology, dataset, sensor configuration, and evaluation protocol. While the reported performance was high, with an average accuracy of 95% for detection and approximately 83% for localization, these results reflect algorithmic performance under controlled benchmark conditions rather than heterogeneous operational scenarios.
In [
57], a model for fault detection and localization in water distribution networks was developed. The approach employed three classifiers (SVM, KNN, and ANN) to identify and locate leaks in the hydraulic network of the Choba campus of the University of Port Harcourt. The system produces two outputs: a binary classification indicating the presence or absence of leaks and a multiclass classification determining the event’s location within two defined hydraulic zones. Among the classifiers, the SVM model showed the best performance, achieving 79% accuracy, followed by the KNN with 70% and the ANN with 61% for detection and localization.
As shown in [
7], a machine learning-assisted leak detection and localization system integrating distributed temperature sensing (DTS) and distributed acoustic sensing (DAS) based on fiber optics was proposed. The system generates two outputs: a binary classification distinguishing between normal and leak conditions and a spatial segmentation precisely determining the leak location along the pipeline. Experimental tests detected small leaks ranging from 0.04 to 0.30 L/s even under noisy ambient conditions, with average localization errors below 0.2 m.
In [
58], a DL model called Custom One-Dimensional Time-Series DenseNet was proposed to perform simultaneous leak detection and localization in water pipelines using a single acousto-optic sensor. The model outputs include a binary classification for detection and a multiclass classification for localization according to the sensor distance. The results showed an average accuracy of 99.08% in detection and localization.
According to [
59], a comparative framework between machine learning and deep learning models was developed. The authors implemented five ML models (KNN, DT, RF, CatBoost, and XGBoost) and four deep learning models (RNN, CNN, VGG16, and RCNN), using features extracted through Mel-frequency cepstral coefficients (MFCCs). The model outputs were defined as a multiclass classification identifying leak distance. Among the results, XGBoost was the most accurate classical algorithm (98.75%), while the RCNN with MFCC achieved 99.57% training accuracy and 95.98% test accuracy.
In [
60], a model integrating a CNN with a few-shot learning (FSL) scheme was presented, allowing classification of events even with small datasets. The model provides two main outputs: binary detection (leak or no leak) and multiclass leak localization, achieving 97.1% accuracy in detection and between 95.5% and 97.4% in localization, depending on the sensor combination used.
Table 4 summarizes the types of outputs reported by the reviewed articles, along with their main characteristics and performance metrics.
To enable a comparative analysis across the different stages of AI model development, Table 5 provides a cross-sectional summary of the representative studies, integrating the input variables, dataset characteristics, model types, and output strategies. This synthesis highlights not only common methodological patterns but also key limitations related to data availability, sensor deployment, and model generalization.
The comparison highlights that simulation-based studies frequently reported high localization accuracy, while experimental and field applications prioritized robustness and feasibility over precision. Additionally, localization-oriented outputs were strongly associated with increased sensor density, which poses scalability challenges in large water distribution networks.
4. Discussion
4.1. Interpretation, Limitations, and Implications
The analysis of the 53 included studies revealed a clear methodological evolution in AI-based leak detection within water distribution networks. Over the last seven years, research has transitioned from traditional machine learning algorithms such as SVMs and RFs toward deep and hybrid architectures, particularly CNNs, autoencoders, and combinations such as CNN–SVM or PCA and autoencoder designs. These hybrid approaches have shown improved robustness, especially in complex leak scenarios and under noisy measurement conditions.
Pressure remains the most commonly used input variable due to its suitability for rapid anomaly detection. However, studies incorporating additional variables such as flow, vibration, or temperature consistently report improved detection sensitivity and enhanced performance in multiclass or localization tasks. This trend reflects the growing adoption of multisensor fusion and the search for richer data representations capable of capturing subtle hydraulic anomalies.
Regarding data sources, public datasets such as C-TOWN and Gwangju contribute to greater reproducibility, while simulated data, generated through EPANET, OLGA, or HUGIN Expert, remain dominant due to their flexibility and low implementation cost. Although simulated datasets enable systematic experimentation, they only approximate real hydraulic behavior, and therefore their results should be interpreted cautiously when considering operational deployment.
Despite the promising advances identified, several limitations persist in the available evidence. Many studies lack detailed descriptions of sensor placement, data acquisition protocols, leak generation methods, and validation procedures, limiting reproducibility and complicating comparative analysis. The predominance of simulated or laboratory-based datasets also restricts generalizability to real-world systems. Moreover, reporting practices across the studies are heterogeneous, with many works presenting only the best-performing metrics and omitting uncertainty estimates, unsuccessful trials, or robustness analyses.
This review is similarly constrained by methodological decisions, including reliance on a single database (Scopus) and the exclusion of non-English publications, which may have led to omissions. Heterogeneity across studies precluded statistical pooling and required a narrative synthesis. Nevertheless, convergence across independent findings provides moderate confidence in several key patterns: the suitability of SVM for small and noisy datasets, the superiority of CNN-based architectures for multiclass and localization tasks, and the practical advantages of multisensor configurations.
These trends have important implications for both research and practice. Standardizing dataset descriptions, sensor configurations, and preprocessing pipelines would facilitate fair comparison across models and expedite operational adoption. Increasing the availability of real-world datasets, along with systematic field-scale validation, remains critical for advancing deployable AI-based leak detection systems. Future research directions include incremental learning, hybrid digital twins, and multisensor data fusion, areas that could significantly narrow the gap between controlled experiments and full-scale water distribution networks.
4.2. Theoretical Ideal Model
Rather than proposing a universal or prescriptive architecture, the conceptual model presented in this section synthesizes recurrent design patterns observed across the reviewed studies. These patterns emerge from the comparative analysis of input variables, dataset characteristics, model families, and output strategies and reflect common methodological choices adopted under different experimental constraints.
Accordingly, the proposed framework should be interpreted as a reference guideline highlighting dominant practices and trade-offs, rather than as an optimal solution applicable to all water distribution networks.
Based on the analysis of the reviewed studies, a conceptual reference model is proposed for the detection and localization of leaks in water distribution networks. This model is grounded in a hybrid, multivariable, and adaptive approach, capable of operating in real time and integrating seamlessly with existing hydraulic infrastructures.
Figure 5 illustrates the methodological process synthesizing the general structure of this conceptual framework.
4.2.1. Data Acquisition
The model employs a distributed sensor network combining measurements of pressure, flow, vibration, and acoustic signals. Sensor nodes should be strategically placed at pipe bends, junctions, and areas with a known history of leaks, specifically before and after critical fittings. Each sensor performs local preprocessing that includes digital filtering, normalization, and data compression, aiming to reduce both energy consumption and the volume of transmitted data.
4.2.2. Preprocessing and Data Fusion
Data from the various sensors are subjected to preprocessing techniques such as PCA, wavelet transforms, or FFT to extract relevant features and reduce dimensionality. An incremental learning scheme is also proposed so that new data can be incorporated without requiring complete model retraining.
4.2.3. Machine Learning Analysis
The model integrates a hybrid approach that combines the generalization capabilities of neural networks with the interpretability of statistical models. The system core combines deep learning models (CNNs, LSTM, and autoencoders) for automatic extraction of spatiotemporal patterns with classical machine learning models (SVMs and RFs) for final classification and validation. Furthermore, a transfer learning module is incorporated to adapt the model to new areas or network configurations without requiring large datasets.
4.2.4. Diagnosis and Feedback
The system is designed to produce three output levels:
Binary: leak or no-leak detection;
Multiclass: classification by severity or event type;
Spatial: localization with sub-meter precision through cross-correlation of signals.
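The spatial output can be illustrated with the classical two-sensor cross-correlation relation: if a leak signal reaches sensor 1 a time τ later than sensor 2, and the sensors are a distance L apart along the pipe, the leak lies at d₁ = (L + cτ)/2 from sensor 1, where c is the wave propagation speed. The sketch below is a minimal example under stated assumptions; the pulse shapes, the 1000 Hz sampling rate, the 1000 m/s wave speed, and the 10 m spacing are hypothetical values, not parameters from the reviewed studies.

```python
def best_lag(s1, s2, max_lag):
    """Lag (in samples) by which s1 is delayed relative to s2."""
    def correlation(lag):
        return sum(
            s1[i] * s2[i - lag]
            for i in range(len(s1))
            if 0 <= i - lag < len(s2)
        )
    return max(range(-max_lag, max_lag + 1), key=correlation)


def leak_distance_from_sensor1(lag_samples, fs_hz, wave_speed_m_s, spacing_m):
    """Apply d1 = (L + c * tau) / 2 with tau = lag / fs."""
    delay_s = lag_samples / fs_hz
    return (spacing_m + wave_speed_m_s * delay_s) / 2


def pulse_at(index, length=32):
    """Hypothetical leak transient: a short triangular pulse."""
    signal = [0.0] * length
    signal[index - 1:index + 2] = [0.5, 1.0, 0.5]
    return signal


sensor2 = pulse_at(10)   # leak noise arrives here first
sensor1 = pulse_at(14)   # same pulse, 4 samples later

lag = best_lag(sensor1, sensor2, max_lag=8)
distance_m = leak_distance_from_sensor1(
    lag, fs_hz=1000, wave_speed_m_s=1000.0, spacing_m=10.0
)
```

With these values, one sample of lag corresponds to c/(2·fs) = 0.5 m, so the achievable resolution depends directly on the sampling rate and on how accurately the wave speed is known.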
Information is visualized in a SCADA environment or IoT platform, allowing automatic feedback to valves, pumps, or control systems. This enables prioritization of critical zones according to the event magnitude.
Finally, the model incorporates continuous improvement, in which real operational data are fed back into the training system, enhancing robustness against noise, demand variability, and climatic fluctuations. Its implementation represents a pathway toward intelligent and resilient water networks, oriented toward more efficient, sustainable, and autonomous water management.
4.3. Contribution
This ideal model does not seek to replace existing developments but rather to integrate their strengths into a coherent and adaptable framework, enabling progress toward autonomous, self-adjusting detection systems compatible with water digitalization schemes. The future validation of this theoretical model could rely on hybrid databases (simulated and real) and integration with SCADA platforms and digital twins.
Table 6 presents an overview of the scope and contributions of the most relevant reviews on leak detection in water distribution networks using artificial intelligence. This comparison shows that the present work extends and complements previous research approaches, providing an updated synthesis oriented toward the practical integration of artificial intelligence into real hydraulic systems.
Despite the high performance levels reported by many machine learning-based approaches, the comparative analysis revealed a persistent gap between experimental conditions and real-world deployment in water distribution networks. Simulation-based studies dominate the literature and provide valuable benchmarking environments; however, they often assume idealized sensing configurations and omit practical constraints related to sensor density, communication reliability, and maintenance costs. Conversely, laboratory prototypes and real-world implementations typically operate under limited instrumentation and lower sampling frequencies, which directly affect model robustness and generalization capabilities.
Overall, this analysis indicates that reported accuracy metrics should be interpreted in the context of the underlying data acquisition conditions rather than as standalone indicators of practical applicability. The synthesis presented in this review highlights the need for more standardized reporting of dataset characteristics, sensing configurations, and temporal resolution in order to enable fair comparison across studies and support informed decision making for operational leak detection systems.
5. Conclusions
Input variables constitute a fundamental component in leak detection using artificial intelligence, as they determine the sensitivity and reliability of the models. Among them, pressure stands out as the most widely used one due to its direct relationship with hydraulic phenomena and ease of acquisition, followed by the flow rate and vibration signals. The reviewed studies reveal a growing trend toward the use of combined sensors and multivariate datasets aimed at improving accuracy, reducing uncertainty, and advancing real-time diagnostic systems applicable to real-world water and gas distribution networks.
The findings also demonstrate a progressive evolution in AI models applied to leak detection and localization, moving from traditional machine learning techniques to deep and hybrid architectures designed to enhance autonomy, accuracy, and generalization capacity. Classical models such as SVMs and RFs remain relevant due to their stability and low data requirements, whereas convolutional neural networks (CNNs) have shown notable potential for capturing complex spatiotemporal patterns in hydraulic and acoustic signals. Hybrid approaches are emerging as a consolidated trend, combining the power of deep learning with the interpretability and computational efficiency of classical learning, paving the way for intelligent, adaptive, real-time monitoring systems.
The datasets represent another essential pillar in the development and validation of AI models for leak detection. Public datasets such as Gwangju and C-TOWN enable comparability across studies and foster more generalizable models, while private or simulated datasets allow controlled experimental conditions and algorithm refinement. Together, both sources are complementary; simulations provide flexibility for training, and real-world implementations offer practical validation. Progress in this research area will depend on the availability of open, hybrid, and standardized datasets that integrate hydraulic, acoustic, thermal, and structural information, enabling the development of reliable and replicable intelligent diagnostic systems.
Finally, output types determine the functional scope of AI in leak detection. Binary models serve as effective tools for early detection and continuous monitoring, whereas multiclass schemes—based on severity or location—extend the applicability of AI to operational management and maintenance prioritization. The progression across these categories reflects a technological evolution toward full automation of hydraulic diagnostics, where systems not only detect the presence of leaks but also estimate their magnitude and location with high precision.
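The distinction between output types can be illustrated with a minimal Python sketch. It assumes a leak is described by an estimated flow rate and a node identifier; the function names, the severity thresholds (in L/s), and the zone layout are hypothetical and are not taken from any of the reviewed studies.

```python
# Illustrative label definitions for the three output types discussed above.
# All thresholds and names are assumptions made for this sketch.

def binary_label(leak_flow_lps: float) -> int:
    """Binary output: 1 if a leak is present (positive leak flow), else 0."""
    return 1 if leak_flow_lps > 0.0 else 0


def severity_label(leak_flow_lps: float,
                   small_max: float = 0.5,
                   medium_max: float = 2.0) -> str:
    """Multiclass output by severity, using hypothetical flow thresholds."""
    if leak_flow_lps <= 0.0:
        return "no_leak"
    if leak_flow_lps <= small_max:
        return "small"
    if leak_flow_lps <= medium_max:
        return "medium"
    return "large"


def location_label(leak_node: str, zones: dict) -> str:
    """Multiclass output by location: the zone whose node set contains the leak."""
    for zone, nodes in zones.items():
        if leak_node in nodes:
            return zone
    return "unknown"


# Hypothetical network partition used only for this example.
zones = {"zone_A": {"J10", "J12"}, "zone_B": {"J20", "J21"}}

labels = (binary_label(1.2), severity_label(1.2), location_label("J12", zones))
print(labels)
```

Under these assumptions, the same leak event yields a detection flag, a severity class, and a location class, which mirrors how a multiclass scheme extends a binary detector rather than replacing it.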
This work focused exclusively on water distribution networks; studies on gas pipelines were discussed only as future research perspectives due to their methodological relevance. Overall, these trends point to the convergence of integrated hybrid models capable of combining detection, classification, and localization in real time, contributing to the development of smarter, more resilient, and sustainable water infrastructures.
6. Future Perspective
Future research should focus on the creation of standardized, multimodal datasets that integrate hydraulic, acoustic, thermal, and structural information to enable more robust and generalizable AI models. Another promising direction is the incorporation of topology-aware learning frameworks such as graph neural networks (GNNs). Although GNNs were not included in the present review due to scope constraints and the limited availability of real-world water data, they offer strong potential for improving spatial reasoning and enhancing leak localization once sensor-rich datasets become more accessible. In addition, the development of hybrid and adaptive architectures capable of operating effectively with limited labeled data will be essential, combining the interpretability of classical ML with the representational power of deep learning. Finally, advances in multi-sensor fusion and real-time analytics are expected to support field-ready diagnostic systems, contributing to more resilient, efficient, and intelligent water distribution networks.
7. Other Information
This review is registered in the Open Science Framework (OSF). The registration number and DOI will be added as soon as they are assigned. No prior protocol was prepared for this review; all methodological decisions followed PRISMA 2020 guidelines and are fully documented in the Methods section. No amendments were required because no preregistered protocol existed. All methodological decisions were defined before data extraction and are reported transparently.
Author Contributions
Conceptualization, M.Z.-U.; methodology, M.Z.-U. and J.M.Á.-A.; software, M.Z.-U. and R.R.-G.; validation, J.M.Á.-A., G.I.P.-S., M.A. and R.R.-G.; formal analysis, G.I.P.-S., J.M.Á.-A. and M.A.; investigation, M.Z.-U., J.M.Á.-A., M.A., R.R.-G. and V.P.-M.; resources, G.I.P.-S., J.M.Á.-A. and V.P.-M.; data curation, M.Z.-U., J.M.Á.-A. and R.R.-G.; writing original, M.Z.-U.; draft preparation, M.Z.-U., R.R.-G. and J.M.Á.-A.; review and editing, M.Z.-U., J.M.Á.-A., M.A., R.R.-G., G.I.P.-S. and V.P.-M. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Data Availability Statement
No new data were created or analyzed in this study. Data sharing is not applicable to this article.
Acknowledgments
We thank CONAHCYT for its support through the Postgraduate Scholarship Program in Mexico, and FONFIVE-2024 for its support through the project “Tras la huella hídrica: Integración de técnicas de Inteligencia Artificial para la detección temprana y ubicación de fugas en redes de distribución de agua” of the Universidad Autónoma de Querétaro (2024–2026). During the preparation of this manuscript, the authors used ChatGPT v4.1 (OpenAI, San Francisco, CA, USA) exclusively for preliminary language and grammar checks. No generative AI tools were used for data analysis, study design, interpretation of results, or the creation of scientific content. After using this tool, all authors reviewed and edited the content as necessary and take full responsibility for the final version of the manuscript.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| AI | Artificial intelligence |
| ML | Machine learning |
| SQL | Structured Query Language |
| MEMS | Micro-electro-mechanical system |
| PVC | Polyvinyl chloride |
| STL | Seasonal and Trend Decomposition Using Loess |
| DL | Deep learning |
| WNTR | Water Network Tool for Resilience |
| PCA | Principal component analysis |
| SVM | Support vector machine |
| VMD | Variational mode decomposition |
| RF | Random forest |
| GBT | Gradient boosted tree |
| KNN | K-nearest neighbors |
| GELM | Gaussian extreme learning machine |
| ELM | Extreme learning machine |
| CNN | Convolutional neural network |
| PSO | Particle swarm optimization |
| C-EFastICA | Complex-Efficient Fast Independent Component Analysis |
| FCNN | Fully Connected Neural Network |
| FFT | Fast Fourier transform |
| TFCNN | Time–Frequency Convolutional Neural Network |
| MLP | Multilayer perceptron |
| ANN | Artificial neural network |
| DNN | Deep neural network |
| CDPDF | Combined Dual Prediction-based Data Fusion |
| ConvLSTM | Convolutional long short-term memory |
| HF-CNN | Handcrafted features–convolutional neural network |
| LSSVM | Least squares support vector machine |
| CWT | Continuous wavelet transform |
| DBN | Deep belief network |
| VAE | Variational autoencoder |
| M-TFCNN | Multimodel Time–Frequency Convolutional Neural Network |
| DTS | Distributed temperature sensing |
| DAS | Distributed acoustic sensing |
| CBT | Categorical boosting |
| XGBoost | Extreme gradient boosting |
| VGG | Visual geometry group |
| MFCC | Mel-frequency cepstral coefficient |
| RCNN | Region-based convolutional neural network |
| FSL | Few-shot learning |