Towards a Realistic Data-Driven Leak Localization in Water Distribution Networks

Ajoodani, Arvin; Nazif, Sara; Ramazi, Pouria

doi:10.3390/w17131988


by Arvin Ajoodani ¹, Sara Nazif ¹ and Pouria Ramazi ²,*

¹ School of Civil Engineering, College of Engineering, University of Tehran, Tehran 1417614411, Iran

² Department of Mathematics and Statistics, Brock University, St. Catharines, ON L2S 3A1, Canada

* Author to whom correspondence should be addressed.

Water 2025, 17(13), 1988; https://doi.org/10.3390/w17131988

Submission received: 30 April 2025 / Revised: 28 June 2025 / Accepted: 29 June 2025 / Published: 2 July 2025

(This article belongs to the Special Issue Sustainable Management of Water Distribution Systems)


Abstract

Current data-driven methods for leak localization (LL) in water distribution networks (WDNs) rely on two unrealistic assumptions: they frame LL as a node-classification task, requiring leak examples for every node—which rarely exists in practice—and they validate models using random data splits, ignoring the temporal structure inherent in hydraulic time-series data. To address these limitations, we propose a temporal, regression-based alternative that directly predicts the leak coordinates, training exclusively on past observations and evaluating performance strictly on future data. By comparing five machine-learning techniques—k-nearest neighbors, linear regression, decision trees, support vector machines, and multilayer perceptrons—in both classification and regression modes, and using both random and temporal splits, we show that conventional evaluation methods can misleadingly inflate model accuracy by up to four-fold. Our results highlight the importance and suitability of a temporally consistent, regression-based approach for realistic and reliable leak localization in WDNs.

Keywords: leak localization; water distribution network; machine learning; classifier; regressor

1. Introduction

Water distribution networks (WDNs) are critical infrastructure systems that ensure reliable access to potable water. However, water loss due to undetected leaks continues to pose significant challenges globally, leading to economic loss, infrastructure damage, and resource waste [1]. Leak localization (LL), that is, identifying the location of leaks within a WDN, is therefore an essential task for improving operational efficiency and sustainability. Current LL methods mostly rely on field inspection and acoustic devices, which are slow and inefficient [2].

To overcome the disadvantages of physical methods, model-based, data-driven, and hybrid methods are used to localize leaks using hydraulic and topological data of WDNs. Model-based methods require a well-calibrated hydraulic model of the WDN and address the LL problem using mechanistic approaches, such as inverse problem-solving methods [3], sensitivity analysis [4], and fuzzy logic techniques [5]. In contrast, data-driven methods do not rely on a hydraulic model of the network; instead, they only require hydraulic data, either real or simulated. Some studies combine data-driven and model-based methods in so-called hybrid approaches, in which certain phases of the method depend on a hydraulic model while other steps can be executed without it [6]. Data-driven models have attracted more attention in the last few years as they require a less profound understanding of the complex nonlinear behavior of WDNs [6].

Most data-driven models used for LL of WDNs are supervised machine learning (ML) models [7], such as random forests (RFs) and Bayesian classifiers. Inputs of these models are hydraulic features of the network, the most common one of which is residual pressure, which represents the amount of pressure drop in nodes due to the leakage. The output of these models is the location of the leakage, represented by the label of the leakage node [1] or nodes close to the leakage point [8]. The dataset required for tuning the models can come from field inspections, which are both time-consuming and costly, or, more commonly, from hydraulic simulations in software such as EPANET 2 [9]. In theory, one could simply generate synthetic leak events at every node (and even inject random noise) to guarantee full nodal coverage. However, achieving representative synthetic data relies on a fully calibrated hydraulic model (accurate pipe roughness, demand patterns, boundary conditions, etc.), which itself demands high-quality field measurements and past leak records. Moreover, adding uncorrelated random noise does not reproduce the complex, correlated uncertainties in real sensor readings, demand-driven pressure variations, valve operations, and other network dynamics. As a result, purely noise-augmented simulations may still fail to capture the nuances of real-world leakage signatures.

The collected dataset is then divided into calibration and testing datasets for developing ML models. Calibration, also called training in ML, is the process of changing the learnable parameters to reduce the differences between the actual values of the target variable and the model predictions [10]. Testing is the process of evaluating the model’s performance over a different dataset from the training [11]. The generation of the training and testing datasets, although synthetic and simulation-based, should closely mimic real-world constraints.

High performances, up to 100%, were reported for data-driven models developed for the LL of WDNs. However, there are concerns regarding the development of these models that make the reported performances questionable. First of all, often a classifier was used for LL, that is, a model that classifies the leakage node as one of the predefined nodes (labels). This, however, requires the model to be trained with a training dataset that includes, for every node, a scenario where that is the leakage node. In a real-world application where the dataset consists of leakage records, this means that all network nodes should have leaked at least once in the past. Since regressor models do not require all possible outputs among their training samples, training a regressor to predict the location (coordinates) of the leaking node can be a more realistic approach.

A second issue is the random partitioning of the dataset into training and testing datasets. In practice, the model has access only to past data of the network, not to future data [12,13]. In a random partitioning, however, some instances in the testing dataset precede those in the training dataset, which resembles the unrealistic situation where the model can access future data when it is calibrated. The realistic approach is a temporal partitioning, where a time point is taken as a reference (e.g., the current time), and the data instances before and after that time point comprise the training and testing datasets, respectively (Figure 1). If the reference time point is placed at the end of one of the leakage scenarios, no leakage scenario spans both sides of the reference point. Temporal partitioning is then the same as nodal partitioning, where a specific percentage of the network nodes leak in both the training and testing phases and the rest leak only in the testing phase. The network has relatively similar hydraulic conditions during the leakage of the same node at different time steps. It is realistic to have data instances of some nodes at different time steps in both training and testing datasets, because nodes that leaked before could leak in the future, but a random partitioning often puts data instances of all nodes in both datasets. This similarity boosts the performance, which is misleading and not necessarily replicable in real conditions. It is also important to recognize that including a high percentage of nodes in the training phase is not realistic, because it is rare for all nodes to have leakage records and for their data to be readily available. Therefore, only a limited number of nodes should be used in the training phase.
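The temporal partitioning described above can be illustrated with a minimal sketch. The record layout (time step, leaking node label), the function name `temporal_split`, and the toy scenario sequence are all hypothetical and only serve to show the idea of splitting at a reference time placed at the end of a leakage scenario.

```python
from typing import List, Tuple

Record = Tuple[int, str]  # (time step in hours, leaking node label); hypothetical layout


def temporal_split(records: List[Record], reference_time: int):
    """Records before the reference time form the training set, the rest the testing set."""
    train = [r for r in records if r[0] < reference_time]
    test = [r for r in records if r[0] >= reference_time]
    return train, test


# Four consecutive 24-h leakage scenarios (illustrative): A, B, then A again, then C.
records = (
    [(t, "A") for t in range(0, 24)]
    + [(t, "B") for t in range(24, 48)]
    + [(t, "A") for t in range(48, 72)]
    + [(t, "C") for t in range(72, 96)]
)

# The reference point sits at the end of the second scenario, so no scenario is cut in two.
train, test = temporal_split(records, reference_time=48)
train_nodes = {node for _, node in train}
test_nodes = {node for _, node in test}
```

In this toy case node A appears on both sides of the split (a node that leaked before leaks again), node B appears only in the past, and node C leaks for the first time in the future, which is the situation a classifier trained on past labels cannot handle.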

The most common data-driven approach found in previous studies on LL of WDNs involves training a classifier to predict the location of a leak, whether it be at a node, pipe, or area, using hydraulic data from the network. These studies vary in three key aspects: the type of hydraulic data utilized as input, which can include pressure, residual pressure, or flow; the classifiers employed for training; and the specific case studies examined. For example, one of the recent works that employed this general approach [14] trained three classifiers, support vector machine (SVM), k-nearest neighbor (KNN), and artificial neural network (ANN), to predict the label of the leaking node in a campus-scale WDN, using residual pressures as the input. The SVM classifier achieved the highest overall accuracy at 79%. This was followed by KNN, which achieved an accuracy of 70%, and ANN, which achieved 61%. The other works, which used almost the same method, are summarized in Table 1.

There were also some studies that used different approaches from those listed in Table 1. Ref. [23] trained a neural network regressor to predict the coordinates of the leaking node, using pressure at some nodes, on a WDN in Portugal. The model was able to predict the coordinates of the leaking node with a coefficient of determination (R²) of 0.98. In [24], an image processing technique was employed to locate the leaking node in the benchmark Hanoi network. Each node in the water network was subjected to different demand pattern scenarios with leaks, and residual pressure was recorded at 12 observation nodes. For each leak scenario, an RGB image was generated in which the value of each pixel represented the residual pressure at that location. The residual pressure at pixels corresponding to observation nodes was directly recorded, while the pressure at other pixels was estimated using spatial Kriging interpolation based on the network’s topological information and the observed pressures. A convolutional neural network (CNN) model was trained to identify the pixel corresponding to the leak location. The trained model was able to identify the leaking node with an accuracy of 94%.

The primary and common research gap among the discussed works is the reliance on a full-label dataset for training a classifier and the use of random partitioning, which results in an unrealistic train/test split.

To the best of the authors’ knowledge, no existing data-driven model has specifically addressed LL in WDNs using training data from only a limited number of nodes. In practical scenarios, generating a comprehensive dataset covering leakages at every network node is often unrealistic due to limitations in historical records and the cost and practicality of inducing artificial leaks. Most real-world leakage data come from historical records or controlled tests (e.g., opening fire hydrants) at selected locations, making it improbable to have data for every node. Therefore, it is critical that simulated datasets used for training closely mimic these practical constraints by not assuming leakage data from all nodes.

This paper addresses two primary objectives: First, it proposes a more realistic and practical alternative approach for leak localization. Second, it demonstrates that unrealistic modeling practices, such as training classifiers on datasets that include leakage scenarios from every node or using random partitioning for training and testing datasets, can result in overly optimistic and misleading performance metrics that are unlikely to be reproducible under real-world conditions.

To achieve these objectives, multiple ML models, namely KNN, linear regression (LR) with logistic regression (LoR) as its classification counterpart, decision tree (DT), SVM, and multilayer perceptron (MLP), are trained as both classifiers and regressors using random and temporal (nodal) partitioning methods on two synthetic benchmark networks. The classifiers predict labels corresponding to the leaking nodes, while the regressors predict the coordinates of the leakage locations.

The contribution of this work is two-fold: first, explicitly illustrating how common unrealistic practices, such as training classifiers with complete nodal leakage data and random dataset partitioning, lead to inflated, unrealistic performance metrics; second, proposing and validating a practical alternative using ML regressors trained via temporal (nodal) partitioning to estimate leakage coordinates, effectively addressing the challenge of incomplete node leakage data.

The main audience for this study includes researchers focused on data-driven LL in WDNs. However, the findings concerning dataset partitioning approaches (random vs. temporal) may also be beneficial for researchers developing data-driven models involving time-dependent datasets in other domains.

2. Methodology

Figure 2 illustrates the detailed methodology proposed in this research. In steps 1 through 8, the required samples were generated, with each sample corresponding to a specific leakage scenario at a particular node (n) within a given time interval, set equal to the length of the demand pattern. The number of time steps depends on the demand pattern—for instance, a 24-h demand pattern contains 24 time steps. To capture fluctuations in water demand during leakage scenarios, the base demands of nodes were multiplied by a random coefficient (d), creating variations among samples for identical nodes leaking at the same time step. To maintain realistic leakage scenarios, the leakage flow was constrained not to exceed 30% of the total network flow at any given time step. If this threshold was surpassed, the coefficient d was adjusted, and the process was repeated until the condition was satisfied.
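The adjust-and-repeat loop for the demand coefficient d can be sketched as follows. This is only an illustration under stated assumptions: the hydraulic simulation is replaced by a hypothetical stand-in (`toy_leak_fraction`), "adjusting" d is interpreted as drawing a new value, and the sampling range of 0 to 2 is the one given in Section 2.2. None of the names below come from the original study.

```python
import random
from typing import Callable, Optional


def draw_demand_coefficient(
    leak_fraction: Callable[[float], float],
    max_fraction: float = 0.30,
    rng: Optional[random.Random] = None,
    max_tries: int = 1000,
) -> float:
    """Redraw d until the leak's share of total network flow is within the limit."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        d = rng.uniform(0.0, 2.0)  # assumed sampling range (see Section 2.2)
        if leak_fraction(d) <= max_fraction:
            return d
    raise RuntimeError("no admissible demand coefficient found")


def toy_leak_fraction(d: float) -> float:
    """Hypothetical stand-in for a hydraulic simulation: fixed leak, demand scaled by d."""
    leak_flow = 5.0
    total_demand = 10.0 * d
    return leak_flow / (leak_flow + total_demand)


d = draw_demand_coefficient(toy_leak_fraction, rng=random.Random(0))
```

In a real workflow the callback would run the hydraulic model for the chosen node and time step and return the ratio of leakage flow to total network flow.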

In steps 9 and 10, the generated dataset was divided into training and testing datasets using two different methods: random partitioning and nodal (temporal) partitioning. The following step involved using a grid search approach to evaluate the model’s training effectiveness. In step 12, we checked whether the model’s performance had improved by more than 1% compared to the previous iteration, which had a smaller number of samples. If the improvement exceeded 1%, the number of samples was increased; otherwise, the process of expanding the dataset was stopped.
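The stopping rule in step 12 can be sketched as a simple loop. The sketch assumes a single higher-is-better score and reads "improved by more than 1%" as a relative gain over the previous iteration; both are assumptions, and the saturating `toy_accuracy` curve is purely illustrative, not a result from the study.

```python
from typing import Callable, List, Tuple


def grow_until_plateau(
    score_for_size: Callable[[int], float],
    step: int,
    min_gain_pct: float = 1.0,
    max_rounds: int = 100,
) -> List[Tuple[int, float]]:
    """Increase the sample count by `step` until the relative gain is at most `min_gain_pct`."""
    history: List[Tuple[int, float]] = []
    previous = None
    size = step
    for _ in range(max_rounds):
        score = score_for_size(size)
        history.append((size, score))
        if previous is not None and previous > 0:
            gain_pct = 100.0 * (score - previous) / previous
            if gain_pct <= min_gain_pct:
                break  # performance is almost constant: stop expanding the dataset
        previous = score
        size += step
    return history


def toy_accuracy(n_samples: int) -> float:
    """Illustrative saturating learning curve (not measured data)."""
    return 90.0 * (1.0 - 0.5 ** (n_samples / 144))


history = grow_until_plateau(toy_accuracy, step=144)
```

For lower-is-better metrics the sign of the gain would need to be flipped, and with several metrics the loop would stop only when none of them improves by more than the threshold.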

2.1. Benchmark Networks

Two benchmark WDNs are considered to test the proposed methodology in this study. The details and characteristics of these WDNs are explained in this section.

(a): Hanoi network

The Hanoi WDN, a common benchmark in LL, was used as the first case study (Figure 3). This network was introduced first in [25] and includes 31 internal nodes, one reservoir, and 34 pipes.

The pipes’ diameters were considered based on the values suggested by [26] (Table A1 in Appendix A). The suggested demand pattern for each node of the network by [19] was used as the network’s base demand patterns (Table A2 and Figure A1 in Appendix A).

(b): Anytown WDN

The Anytown WDN [27], which includes 22 nodes, was used as the second case study (Figure 4). The network’s 24-h flow pattern is given in Table A3 of Appendix A.

2.2. Data Simulation in EPANET

EPANET 2.2 [9] was used to simulate networks in various leakage conditions. Each sample was made by leaking just one node at each timestep. Using engineering judgment, it was assumed that each leakage continued on average for 24 h, so each node was leaked in 24 different timesteps. For simulating changes in the nodal demands over time, creating samples that are more distinct from one another, and increasing the number of samples for better model training, additional demand patterns were made by multiplying the 24-h demand pattern coefficients by a random number between 0 and 2.

In EPANET, the leakage is simulated by assuming an emitter installed at the leaking node. The following experimental formula is used for measuring the output flow of an emitter [20]:

Q = C P^{0.5}    (1)

where Q, C, and P are the leaking flow (m³/s), the emitter coefficient, and the pressure head (m) at the leaking node, respectively. The emitter coefficient represents the size and form of the emitter’s nozzle.

To simulate leaking in a node, its emitter coefficient value should be determined. Theoretically, the emitter coefficient can have any positive value, but for simplification, it was assumed that all nodes experienced leakage due to physical damage with the same shape and size, so all leakages were set to have C = 1. The size of the leaking flows was up to 3% of the network’s total flow.
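As a small worked example of Equation (1), the sketch below evaluates the emitter flow for a given pressure head. The function name and the handling of non-positive heads (returning zero flow) are assumptions made for this illustration; the units follow those stated in the text.

```python
import math


def emitter_flow(c: float, pressure_head: float) -> float:
    """Leak flow from Equation (1): Q = C * P**0.5.

    Returning zero for a non-positive head is an assumption of this sketch.
    """
    if pressure_head <= 0:
        return 0.0
    return c * math.sqrt(pressure_head)


# With C = 1 (the value used for all nodes) and a head of 25 m, Q = 1 * 25**0.5 = 5.
q = emitter_flow(1.0, 25.0)
```

Because Q grows with the square root of the head, quadrupling the pressure head only doubles the leak flow for a fixed emitter coefficient.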

2.3. Feature and Target Engineering

The input features for all models were the residual pressures of all nodes. In real networks, only a subset of nodes has pressure sensors. Since determining the optimal sensor placements is beyond the scope of this work, the pressures of all nodes were used as the input. The output was the coordinates of the leaking node for regressor ML models and the leaking node label for classifier ML models.
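A minimal sketch of how one sample could be assembled is given below. It assumes that the residual pressure at a node is the leak-free pressure minus the pressure under leakage (i.e., the pressure drop), and it uses hypothetical node names, coordinates, and pressure values.

```python
from typing import Dict, List, Tuple


def residual_pressures(leak_free: List[float], with_leak: List[float]) -> List[float]:
    """Pressure drop at each node: leak-free pressure minus pressure under leakage."""
    if len(leak_free) != len(with_leak):
        raise ValueError("both pressure vectors must cover the same nodes")
    return [p0 - p for p0, p in zip(leak_free, with_leak)]


# Hypothetical three-node example (coordinates in metres, pressures in metres of head).
node_xy: Dict[str, Tuple[float, float]] = {
    "n1": (0.0, 0.0),
    "n2": (500.0, 0.0),
    "n3": (500.0, 400.0),
}
leak_free = [30.0, 28.0, 25.5]
with_leak = [29.5, 26.0, 25.0]

features = residual_pressures(leak_free, with_leak)  # model input
leaking_node = "n2"
class_target = leaking_node               # target for a classifier
regression_target = node_xy[leaking_node]  # (x, y) target for a regressor
```

The same feature vector feeds both model types; only the target changes between a node label and a coordinate pair.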

2.4. Dataset Partitioning Strategy

Two types of datasets were generated. For the nodal (temporal) partitioning, 20% of the network nodes were chosen randomly to leak in various demand conditions to generate the training dataset. Then, all nodes leaked the same way to generate the testing dataset. For the random partitioning, all nodes leaked, and then samples were divided randomly to generate training and testing datasets. As the common practice for random partitioning, 80% of the dataset was taken as the training dataset, and the remaining 20% was used for testing.
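The two partitioning strategies can be contrasted with a short sketch. Samples are represented only by (node, hour) pairs, the function names are hypothetical, and the node numbering 1 to 31 stands in for the Hanoi network; in the actual procedure each sample also carries its own random demand draw, so training and testing samples of the same node are not identical.

```python
import random
from typing import List, Optional, Tuple

Sample = Tuple[int, int]  # (leaking node, hour); hypothetical layout


def nodal_partition(nodes: List[int], hours: int = 24, train_fraction: float = 0.2,
                    rng: Optional[random.Random] = None):
    """Training samples come from a random subset of nodes; testing covers all nodes."""
    rng = rng or random.Random()
    n_train = int(len(nodes) * train_fraction)
    train_nodes = set(rng.sample(nodes, n_train))
    train = [(n, h) for n in nodes if n in train_nodes for h in range(hours)]
    test = [(n, h) for n in nodes for h in range(hours)]  # generated separately in practice
    return train, test


def random_partition(nodes: List[int], hours: int = 24, train_fraction: float = 0.8,
                     rng: Optional[random.Random] = None):
    """All nodes leak; the pooled samples are shuffled and cut 80/20."""
    rng = rng or random.Random()
    samples: List[Sample] = [(n, h) for n in nodes for h in range(hours)]
    rng.shuffle(samples)
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


hanoi_nodes = list(range(1, 32))
nodal_train, nodal_test = nodal_partition(hanoi_nodes, rng=random.Random(1))
rand_train, rand_test = random_partition(hanoi_nodes, rng=random.Random(1))
```

With 31 nodes and 24 hourly steps, the nodal training set holds 6 nodes (144 samples), whereas the random split places 595 of the 744 samples in training and, in practice, covers every node.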

2.5. ML Model Selection

Scikit-learn, an ML library in Python 3.9, was used to train KNN, LR, logistic regression (LoR), DT, SVM, and MLP models. The grid-search algorithm was used for the hyperparameter optimization of these ML models. In this approach, the model is trained with various combinations of hyperparameters, and the combination that has the best performance over the training dataset is used as the model configuration. The main hyperparameters of each ML model were chosen according to [29] (Table A4 in Appendix A). Since the highest accuracy among all models was approximately 90%, no evaluation for overfitting was performed [30].
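A grid search of this kind could look like the following sketch with scikit-learn's `GridSearchCV`. The toy data, the KNN regressor, and the particular grid are assumptions for illustration only (the hyperparameters actually searched are those in Table A4). Note that `GridSearchCV` ranks combinations by a cross-validated score computed within the training data, which may differ from how the original study scored each combination.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Toy training data (illustrative): two residual-pressure features -> (x, y) coordinates.
X = [[0.1 * i, 0.05 * (i % 7)] for i in range(40)]
y = [[10.0 * a + b, 5.0 * a - b] for a, b in X]

# Hypothetical grid; every combination is fitted and scored.
param_grid = {"n_neighbors": [1, 3, 5], "weights": ["uniform", "distance"]}

search = GridSearchCV(
    KNeighborsRegressor(),
    param_grid,
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

best = search.best_params_      # best hyperparameter combination found
model = search.best_estimator_  # model refitted on all training data with `best`
```

The refitted `best_estimator_` would then be evaluated on the separate testing dataset produced by the chosen partitioning.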

2.6. Training

The number of training samples used for the Hanoi WDN started from 144 (20% of 31 nodes, which equals 6 nodes, each leaking for 24 h) and increased by a constant step of 144 (the data for each extra day of leakage). This process was repeated until none of the metrics used for performance evaluation of the model improved by more than 1%, that is, until the performance became almost constant. The same approach was used for the Anytown network, in which the training dataset size started from 96 samples (4 nodes, each leaking for 24 h) and increased by a constant step of 96. For the nodal partitioning, a total of 12 and 9 different training sample sizes were utilized for the Hanoi and Anytown networks, respectively. The maximum number of training samples reached 8928 for the Hanoi network and 4752 for the Anytown network in order to determine the optimal size of the training dataset. For random partitioning, the number of training samples used for the Hanoi WDN started from 595 (80% of the total samples, which were made by leaking 31 nodes in 24 h) and increased by a constant step of 595. For the Anytown network, we followed the same procedure, starting with 423 training samples and increasing the dataset in fixed increments of 423. For the random partitioning, a total of 20 and 15 different training sample sizes were utilized for the Hanoi and Anytown networks, respectively. The maximum number of training samples reached 11,900 for the Hanoi network and 6345 for the Anytown network in order to determine the optimal size of the training dataset. In random partitioning, data from every node appears in both training and testing sets; by contrast, nodal (temporal) partitioning ensures that only a subset of nodes contributes samples to both datasets.

Five-fold cross-validation, a common method [31], was used to ensure that the performance of the models is not biased by a specific dataset partitioning. In the case of random partitioning, the dataset was divided into 5 parts. One part was selected for testing, while the other four parts were used for training the model. This process was repeated five times so that each part served as the testing dataset once. For nodal partitioning, the nodes of each network were divided into 5 clusters. In the Hanoi network, four clusters contained 6 nodes and one cluster contained 7 nodes, while in the Anytown network, three clusters had 4 nodes and two clusters had 5 nodes. The clustering was performed randomly, as there is no specific pattern for the nodes based on historical records of leakage. One cluster was used to generate the training dataset, and this process was repeated until all clusters had been used once to create the training dataset. Finally, the performance of the model was evaluated under each partitioning condition.
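The random clustering of nodes for the nodal cross-validation can be sketched as below. The function name and node numbering are hypothetical; the sketch simply shuffles the nodes and deals them into five groups, which reproduces the cluster sizes reported for the two networks.

```python
import random
from typing import List, Optional


def random_node_clusters(nodes: List[int], k: int = 5,
                         rng: Optional[random.Random] = None) -> List[List[int]]:
    """Shuffle the nodes and deal them round-robin into k clusters."""
    rng = rng or random.Random()
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


hanoi_clusters = random_node_clusters(list(range(1, 32)), rng=random.Random(0))
anytown_clusters = random_node_clusters(list(range(1, 23)), rng=random.Random(0))

# Each cluster in turn supplies the training nodes; testing always covers all nodes.
hanoi_folds = [(cluster, list(range(1, 32))) for cluster in hanoi_clusters]
```

Unlike standard k-fold cross-validation, the held-in part here is the small one: a single cluster (about 20% of the nodes) is used for training in each of the five conditions.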

2.7. Evaluation Metrics

The classifiers’ output, the leaking node label, could be used directly to evaluate the model’s performance. However, the regressors’ output needed pre-processing before the evaluation. The predicted coordinates rarely coincide with any node of the network, so the nearest node to the predicted coordinates was considered the model-predicted leaking node. Three indices, accuracy, average topological distance (ATD), and average ranking (AR), were used. Accuracy is defined as:

\mathrm{Accuracy} = \frac{c}{s} \times 100    (2)

where s is the number of samples in the testing dataset and c is the number of samples that were correctly predicted in the testing dataset. Accuracy is the most commonly used metric for evaluating models developed for multi-label classification tasks [32]. It ranges from 0% to 100%, with higher values indicating better model performance.
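The pre-processing of regressor outputs and the accuracy of Equation (2) can be sketched together. The node names, coordinates, and predictions are hypothetical, and Euclidean distance is assumed for finding the nearest node.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def nearest_node(pred_xy: Point, node_xy: Dict[str, Point]) -> str:
    """Snap predicted coordinates to the closest network node (Euclidean distance assumed)."""
    return min(node_xy, key=lambda n: math.dist(pred_xy, node_xy[n]))


def accuracy(true_nodes: List[str], predicted_nodes: List[str]) -> float:
    """Equation (2): percentage of test samples whose predicted node is the leaking node."""
    c = sum(t == p for t, p in zip(true_nodes, predicted_nodes))
    return 100.0 * c / len(true_nodes)


# Hypothetical network and regressor outputs.
node_xy = {"n1": (0.0, 0.0), "n2": (100.0, 0.0), "n3": (0.0, 100.0)}
predicted_xy = [(80.0, 10.0), (5.0, 5.0), (10.0, 90.0), (70.0, 60.0)]
true_nodes = ["n2", "n1", "n3", "n1"]

predicted_nodes = [nearest_node(p, node_xy) for p in predicted_xy]
acc = accuracy(true_nodes, predicted_nodes)
```

Here three of the four predictions snap to the true leaking node, while the last one snaps to n2 instead of n1.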

However, since overall accuracy provides only a general assessment, we also defined a more detailed performance metric—Accuracy_i—to capture how well the model performs for each individual label.

\mathrm{Accuracy}_{i} = \frac{c_{i}}{s}    (3)

where c_i represents the number of samples in which the leaking node is among the i-nearest nodes to the predicted node. While accuracy simply measures how many samples were localized correctly versus how many were not, this new index classifies the testing samples into n categories, where n denotes the number of nodes in the network. As i increases, Accuracy_i is expected to increase as well, because locating the leaking node among a larger number of nearest nodes is an easier task. Unlike Accuracy, this new metric does not respond uniformly to all the cases where the model fails to identify the leakage node. Instead, it highlights instances where the predicted node is nearer to the leakage node.
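A minimal sketch of Equation (3) follows. It assumes that the predicted node itself counts as its own nearest node (so that i = 1 corresponds to an exact hit) and that a table of node-to-node distances is available; the four-node line network and the predictions are hypothetical.

```python
from typing import Dict, List


def accuracy_i(true_nodes: List[str], predicted_nodes: List[str],
               dist: Dict[str, Dict[str, float]], i: int) -> float:
    """Equation (3): share of samples whose leaking node is among the i nodes
    nearest to the predicted node (the predicted node is counted as nearest to itself)."""
    hits = 0
    for t, p in zip(true_nodes, predicted_nodes):
        ranked = sorted(dist[p], key=dist[p].get)  # nodes ordered by distance from p
        if t in ranked[:i]:
            hits += 1
    return hits / len(true_nodes)


# Hypothetical nodes on a line, spaced unevenly so that no two distances tie.
pos = {"a": 0.0, "b": 1.0, "c": 3.0, "d": 7.0}
dist = {u: {v: abs(pos[u] - pos[v]) for v in pos} for u in pos}

true_nodes = ["a", "b", "d", "c"]
predicted_nodes = ["a", "c", "a", "c"]

scores = [accuracy_i(true_nodes, predicted_nodes, dist, i) for i in range(1, 5)]
```

The resulting sequence is non-decreasing in i and reaches 1 once i equals the number of nodes, which is why differences between models fade for large i.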

ATD is formulated as follows [16]:

\mathrm{ATD} = \frac{\sum_{j = 1}^{s} d_{j}}{s}    (4)

where d_j is defined as the shortest path on the pipelines between the real leakage node and the leakage node predicted by the model for the j-th sample in the testing dataset, and s denotes the number of samples in the testing dataset. The ATD is a real number that ranges from 0 to the longest path between two network nodes. A lower value indicates better model performance.
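Equation (4) can be illustrated with a small sketch that computes shortest pipe-path lengths with Dijkstra's algorithm. The four-node graph, the pipe lengths, and the predictions are hypothetical; the graph is stored as an undirected adjacency dictionary.

```python
import heapq
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]  # node -> {neighbour: pipe length in metres}


def shortest_path_length(graph: Graph, source: str, target: str) -> float:
    """Dijkstra's algorithm on pipe lengths; returns the length of the shortest path."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    raise ValueError("target is not reachable from source")


def atd(graph: Graph, true_nodes: List[str], predicted_nodes: List[str]) -> float:
    """Equation (4): mean shortest-path distance between true and predicted leak nodes."""
    total = sum(shortest_path_length(graph, t, p) for t, p in zip(true_nodes, predicted_nodes))
    return total / len(true_nodes)


# Hypothetical network: a-b 100 m, b-c 200 m, a-c 500 m, c-d 50 m.
graph: Graph = {
    "a": {"b": 100.0, "c": 500.0},
    "b": {"a": 100.0, "c": 200.0},
    "c": {"a": 500.0, "b": 200.0, "d": 50.0},
    "d": {"c": 50.0},
}
value = atd(graph, ["a", "c", "d"], ["a", "a", "b"])
```

The three per-sample distances are 0 m, 300 m (via b rather than the direct 500 m pipe), and 250 m, giving an ATD of about 183 m.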

The AR indicator is defined to examine how close the real leaking is to the model prediction compared to other nodes of the network, and it is formulated as follows:

\mathrm{AR} = \frac{\sum_{j = 1}^{s} r_{j}}{s}    (5)

To calculate the Average Ranking (AR), all network nodes are first sorted based on their distance from the node predicted by the model. For each test sample, the rank r_j represents the position of the true leaking node within this sorted list, i.e., how close the actual leak is to the model’s prediction. The AR metric is then computed as the average of these ranks across all s samples in the testing dataset. The AR value ranges from 1 (best possible performance, where the true leak is always the closest node to the prediction) to N, the total number of nodes (worst case, where the true leak is always the farthest). In this study, the maximum value of this index is 31 for the Hanoi network and 22 for the Anytown network. A lower AR indicates better localization accuracy. Figure 5 provides a visual example to illustrate this concept.
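The ranking procedure behind Equation (5) can be sketched as follows. The distance table is assumed to be precomputed (the text does not fix whether it is Euclidean or along the pipes), and the four-node line network and predictions are hypothetical.

```python
from typing import Dict, List


def average_ranking(true_nodes: List[str], predicted_nodes: List[str],
                    dist: Dict[str, Dict[str, float]]) -> float:
    """Equation (5): mean rank of the true leaking node when all nodes are sorted
    by distance from the predicted node (rank 1 = the predicted node itself)."""
    ranks = []
    for t, p in zip(true_nodes, predicted_nodes):
        ranked = sorted(dist[p], key=dist[p].get)
        ranks.append(ranked.index(t) + 1)
    return sum(ranks) / len(ranks)


# Hypothetical nodes on a line, spaced unevenly so that no two distances tie.
pos = {"a": 0.0, "b": 1.0, "c": 3.0, "d": 7.0}
dist = {u: {v: abs(pos[u] - pos[v]) for v in pos} for u in pos}

true_nodes = ["a", "b", "d", "c"]
predicted_nodes = ["a", "c", "a", "c"]

ar = average_ranking(true_nodes, predicted_nodes, dist)
```

The four samples have ranks 1, 2, 4, and 1, so the AR is 2.0 on a scale from 1 to 4 for this toy network.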

3. Results

All models were trained and evaluated under five different data partitioning conditions, and the average performance according to the ATD and AR metrics on the test datasets is illustrated in Figure 6. The results of the grid search for all models, under the condition with the best performance (among the five conditions of the 5-fold cross-validation method), are shown in Table A5 in Appendix A.

3.1. Comparing Various Models Based on ATD and AR Metrics

For both the Hanoi and Anytown water-distribution networks (WDNs), the highest accuracy was obtained when the classifier was trained with random partitioning. In descending order of performance, the next-best approaches were (i) regressors with random partitioning, (ii) regressors with nodal partitioning, and (iii) classifiers with nodal partitioning.

Random partitioning allows the training set to contain leakage samples from every node, which—because leaks simulated at the same node are far more alike than leaks at different nodes—gives the model information it would never see in practice. This “peeking” explains the consistently superior scores of models trained in this way. Within that group, classifiers outperformed regressors because classification maps the input features to a small, discrete set of outputs, enabling the model to learn the feature-to-label relationships more readily.

By contrast, nodal (temporal) partitioning with classifiers performed worst. Only 20% of the nodes (those present in the training subset) were represented during learning, so the model could localize leaks on those nodes but was effectively blind to the remaining 80%. Regressors trained with the same nodal split fared somewhat better: although they also saw only a subset of nodes, they predict continuous coordinates rather than discrete labels, so the pressure-to-location relationship learned from the known nodes could be partially transferred to unseen nodes.

Despite these differences, all classifiers trained with random partitioning still located the true leak very close to their predictions. According to the AR index, every classifier except the decision tree (DT) identified the actual leakage node within its four nearest-neighbor predictions in both WDNs. Using the ATD index, their predicted leak positions were, on average, within roughly 1 km in Hanoi and 4 km in Anytown—impressive given that the total pipeline lengths are 40 km and 90 km, respectively.

3.2. Comparing Results of Models on Train and Test Datasets

The numerical results for all models on both the training and test datasets are presented in Table 2 and Table 3. For all models except the classifiers that used nodal partitioning, AR and ATD were better (lower) on the training dataset than on the test dataset. Moreover, overfitting did not occur for these models, as the performance on the training dataset is not significantly better than the performance on the test datasets [30].

Classifiers trained with nodal partitioning tell a different story. They performed markedly better on the training set than on the test set, producing a large apparent generalization gap. This gap should not be read as conventional overfitting. Under nodal partitioning, the model is exposed to leakage labels for only 20% of the nodes and is entirely ignorant of the remaining 80%. In routine practice such a setup would be deemed infeasible, but we included it to complete the factorial comparison of (i) classifier vs. regressor and (ii) random vs. nodal splits. Given that most classes are absent during training, a steep drop in test-set performance is inevitable and simply reflects the information withheld from the model rather than a failure of the learning algorithm itself.

3.3. Comparing Various KNN Models Based on the Accuracy_i Index

All algorithms behaved similarly according to the Accuracy_i index. For instance, the performance of the KNN models is shown in Figure 7. As the value of i increases, the differences between the KNN models become less discernible. The probability of the real leakage node being among the i-nearest nodes to the predicted leaking node increases with i, regardless of the model development approach.

The findings from models employing random partitioning clearly highlight the limitations of this widely used practice. Random partitioning, by design, places very similar samples, representing leakages from the same node under various demand conditions, into both the training and testing datasets. Consequently, models can artificially achieve high accuracy since they have already encountered nearly identical scenarios during training. This explains the consistently high performance reported in earlier studies, regardless of the choice of machine learning algorithm or network complexity. However, this apparent success is misleading, as real-world conditions rarely guarantee prior exposure to all nodes or scenarios during model training, severely limiting the practical applicability of such classifiers.

3.4. Assessing Error Distribution of the Models

Figure 8 plots the probability distributions of normalized ATD for the DT models trained on the Hanoi and Anytown networks; analogous curves for the remaining models are provided in Figure A2, Figure A3, Figure A4 and Figure A5 of Appendix A.

For classifiers with random partitioning, the distributions are narrow and tightly clustered around the center, indicating highly consistent performance across all samples. This behavior is expected because the test instances strongly resemble the training set, enabling the models to pinpoint the corresponding leakage nodes with ease.

When classifiers are trained with nodal (temporal) partitioning, the distributions become concentrated toward the left, with a pronounced peak at lower ATD values. This peak reflects superior performance on leakages originating from nodes that were present during training, which are inherently easier to localize.

The regressors exhibit similar qualitative patterns. Under random partitioning, their distributions also center on a mean but show a wider spread than the classifiers, a consequence of the more demanding task: regressors must predict two continuous outputs (x- and y-coordinates) rather than a single categorical label, and their output space is effectively unbounded.

With nodal (temporal) partitioning, the regressor curves again display a left-side peak corresponding to training-set nodes. Compared with the classifiers, two distinctions emerge: (i) the classifier peak lies closer to zero, consistent with the relative simplicity of the classification task, and (ii) the regressor distributions assign lower probability to very large errors (values approaching 1), indicating that even with limited nodal data, the regression models still capture underlying spatial patterns.

3.5. Analysis of the Spatial Performance of the Models

In models that used random partitioning, every node had corresponding samples in the training dataset, resulting in similar accuracy across all nodes. In contrast, with nodal partitioning, only some nodes were represented in the training dataset, leading to varying performance across nodes. To illustrate this, Figure 9 shows the ATD of the MLP regressor at each node of the Hanoi network for one of the five folds of the 5-fold cross-validation.

Although the overall ATD value for the network was 2473 m, the ATD for individual nodes varied, ranging from 1863 m to 2907 m. Nodes that were generally farther from those included in the training dataset had higher ATD values, indicating that the model localized leaks at these nodes less accurately.

4. Discussion and Conclusions

In this paper, we addressed the unrealistic assumptions underlying many previous data-driven approaches for LL in WDNs. Traditionally, these approaches assumed that leakage data were available for all nodes, used random train-test partitioning, and mostly relied on classifiers to identify the leaking node from hydraulic data. To introduce greater realism and applicability, we proposed and validated an alternative, nodal (temporal) partitioning. Unlike random partitioning, nodal partitioning uses leakage data from only a subset of nodes during training, while testing incorporates data from all nodes. Additionally, we emphasized regression models that directly predict the leak coordinates, an output that is inherently more applicable in practice.
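The difference between the two partitioning schemes can be illustrated with a minimal sketch. The toy record below is hypothetical (four nodes, hourly samples, four-hour leaks, a cut at hour 8), and the names `temporal_split`, `random_split`, and `leak_nodes` are illustrative rather than taken from the study's code.

```python
import random

def temporal_split(samples, cut_time):
    """Train on samples observed before cut_time, test on everything after."""
    train = [s for s in samples if s["t"] < cut_time]
    test = [s for s in samples if s["t"] >= cut_time]
    return train, test

def random_split(samples, test_fraction, seed=0):
    """Shuffle samples regardless of time, then cut off a test fraction."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def leak_nodes(samples):
    """Set of nodes that leak in at least one sample."""
    return {s["leak_node"] for s in samples}

# Hypothetical 24-h record: one leak at a time, each lasting four hourly steps.
leak_sequence = [1, 3, 4, 1, 2, 3]
samples = [{"t": 4 * k + h, "leak_node": node}
           for k, node in enumerate(leak_sequence) for h in range(4)]

train_t, test_t = temporal_split(samples, cut_time=8)
train_r, test_r = random_split(samples, test_fraction=2 / 3)

# Nodes that are tested but were never seen leaking during training.
unseen_temporal = leak_nodes(test_t) - leak_nodes(train_t)
unseen_random = leak_nodes(test_r) - leak_nodes(train_r)

print("temporal: train nodes", sorted(leak_nodes(train_t)),
      "unseen in test", sorted(unseen_temporal))
print("random:   train nodes", sorted(leak_nodes(train_r)),
      "unseen in test", sorted(unseen_random))
```

In the temporal split of this toy record, nodes 2 and 4 never appear in training even though they are tested, which is the situation that nodal partitioning is meant to reproduce.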

The comparative analysis demonstrated that random partitioning significantly inflated apparent model performance, by up to a factor of four according to the accuracy measures, compared with the more realistic nodal partitioning. This clearly indicates that random partitioning yields overly optimistic and unrealistic evaluations, reinforcing our argument for adopting nodal partitioning and regression methods to obtain more dependable results that are applicable in the real world.

Nevertheless, the present study has limitations, including the assumption of a single leakage at each timestep, the use of a constant emitter coefficient, and reliance on pressure data from all nodes as inputs. Future research should focus on overcoming these limitations by: (1) evaluating model performance under simultaneous (overlapping) leak scenarios, (2) optimizing sensor placements to enhance performance and applicability, and (3) repeating the model development and comparison procedure suggested in this work on large-scale real-life networks to gain more confidence in the validity of the reported results. While the evaluation methodologies of some previous studies may be questioned, their parameter-tuning techniques and methodological insights remain valuable and should be revisited within this more realistic framework. Finally, practitioners and decision-makers should ensure that the LL models they adopt are evaluated under realistic assumptions. They must scrutinize the dataset generation process, verify that their models can generalize to unseen nodes, and avoid overreliance on inflated metrics derived from flawed validation strategies.

Author Contributions

Conceptualization, S.N. and P.R.; Methodology, A.A.; Software, A.A.; Validation, S.N. and P.R.; Investigation, A.A.; Data curation, A.A.; Writing—original draft, A.A., S.N. and P.R.; Supervision, S.N. and P.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data available on request due to privacy restrictions.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. The diameter values of the Hanoi WDN’s pipes.

Label	Diameter (mm)
1	1016
2	1016
3	1016
4	1016
5	1016
6	1016
7	1016
8	1016
9	1016
10	762
11	609.6
12	609.6
13	508
14	406.4
15	304.8
16	304.8
17	406.4
18	508
19	508
20	1016
21	508
22	304.8
23	1016
24	762
25	762
26	508
27	304.8
28	304.8
29	406.4
30	406.4
31	304.8
32	304.8
33	406.4
34	1016

Figure A1. Base demand multipliers for different demand patterns [19].

Table A2. Nodes’ base demands and their demand patterns for the Hanoi WDN.

Label	Base Demand (CMH)	Demand Pattern
1 (Reservoir)	N.A.	N.A.
2	890	1
3	850	1
4	130	1
5	725	1
6	1005	1
7	1350	1
8	550	1
9	525	1
10	525	1
11	500	4
12	560	4
13	940	4
14	615	1
15	280	1
16	310	5
17	865	5
18	1345	5
19	60	5
20	1275	2
21	930	4
22	485	4
23	1045	6
24	820	6
25	170	6
26	900	2
27	370	2
28	290	3
29	360	3
30	360	3
31	105	3
32	805	3

Table A3. 24-h demand patterns of Anytown nodes.

Hour	Demand Multipliers
0	1
1	1
2	1
3	0.9
4	0.9
5	0.9
6	0.7
7	0.7
8	0.7
9	0.6
10	0.6
11	0.6
12	1.2
13	1.2
14	1.2
15	1.3
16	1.3
17	1.3
18	1.2
19	1.2
20	1.2
21	1.1
22	1.1
23	1.1

Table A4. The list of all models’ hyperparameters used for the grid-search. Names and values are based on the Scikit-learn version 1.4.0.

Model	Hyperparameters and Their Options
KNN	n_neighbors: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
DT	criterion: {squared_error, friedman_mse, absolute_error, poisson} (for regressor)
	criterion: {gini, entropy, log_loss} (for classifier)
	max_depth: {None, 1, 2, 5, 10, 20}
	min_samples_split: {2, 5, 10, 20}
	min_samples_leaf: {1, 2, 5, 10, 20}
	max_features: {1, 2, 5, 10, 20, 50, 100}
LR	None
LoR (as an equivalent classifier for LR)	penalty: {None, l1, l2, elasticnet}
	C: {1, 2, 5, 10, 20, 50, 100}
	solver: {liblinear, newton-cg, lbfgs, sag, saga}
MLP	hidden_layer_sizes: {(10), (20), (30), (40), (50), (60), (70), (80), (90), (100)}
	activation: {relu, tanh, logistic}
	solver: {lbfgs, sgd, adam}
	learning_rate: {constant, adaptive}
	learning_rate_init: {0.01, 0.01, 0.001, 0.0001}
SVM	C: {10,000, 20,000, 50,000, 100,000}
	kernel: {linear, rbf, poly}
	degree: {2, 3} (for kernel: poly)
	epsilon: {0.01, 0.02, 0.05, 0.1} (for regression)
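To give a sense of the size of such a search, the sketch below enumerates the decision-tree regressor options listed in Table A4 using only the Python standard library. It is an illustration, not the authors' code: the helper `iter_grid` is a hypothetical name, the parameter names follow scikit-learn's spelling (for example `min_samples_leaf`), and scoring each candidate with 5-fold cross-validation is an assumption made here for the fit count.

```python
from itertools import product

# Decision-tree regressor options as listed in Table A4.
dt_regressor_grid = {
    "criterion": ["squared_error", "friedman_mse", "absolute_error", "poisson"],
    "max_depth": [None, 1, 2, 5, 10, 20],
    "min_samples_split": [2, 5, 10, 20],
    "min_samples_leaf": [1, 2, 5, 10, 20],
    "max_features": [1, 2, 5, 10, 20, 50, 100],
}

def iter_grid(grid):
    """Yield every combination of the listed options as a parameter dict."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))

candidates = list(iter_grid(dt_regressor_grid))
n_candidates = len(candidates)   # 4 * 6 * 4 * 5 * 7 combinations
n_fits = n_candidates * 5        # assuming 5-fold cross-validation per candidate

print(n_candidates, "candidate settings,", n_fits, "model fits")
print("first candidate:", candidates[0])
```

Each yielded dictionary could be passed as keyword arguments to a model constructor; an exhaustive search over this grid alone requires several thousand candidate settings.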

Table A5. The best hyperparameters for all models. The optimal configuration for each model is presented for the best dataset size.

Model	Results on Hanoi WDN	Results on Anytown WDN
KNN regressor with temporal partitioning	n_neighbors: 1	n_neighbors: 1
KNN regressor with random partitioning	n_neighbors: 2	n_neighbors: 2
KNN classifier with temporal partitioning	n_neighbors: 1	n_neighbors: 1
KNN classifier with random partitioning	n_neighbors: 2	n_neighbors: 3
DT regressor with temporal partitioning	criterion: absolute_error	criterion: poisson
	max_depth: 10	max_depth: 20
	max_features: 20	max_features: 10
	min_samples_leaf: 1	min_samples_leaf: 2
	min_samples_split: 2	min_samples_split: 5
DT regressor with random partitioning	criterion: friedman_mse	criterion: absolute_error
	max_depth: None	max_depth: 20
	max_features: 50	max_features: 10
	min_samples_leaf: 2	min_samples_leaf: 1
	min_samples_split: 5	min_samples_split: 2
DT classifier with temporal partitioning	criterion: gini	criterion: log_loss
	max_depth: 5	max_depth: 10
	max_features: 100	max_features: 20
	min_samples_leaf: 2	min_samples_leaf: 1
	min_samples_split: 2	min_samples_split: 2
DT classifier with random partitioning	criterion: gini	criterion: gini
	max_depth: None	max_depth: None
	max_features: 50	max_features: 20
	min_samples_leaf: 1	min_samples_leaf: 1
	min_samples_split: 2	min_samples_split: 2
LR regressor with temporal partitioning	None	None
LR regressor with random partitioning	None	None
LoR classifier with temporal partitioning	C: 20	C: 100
	penalty: l1	penalty: l1
	solver: liblinear	solver: liblinear
LoR classifier with random partitioning	C: 50	C: 100
	penalty: l1	penalty: l1
	solver: liblinear	solver: liblinear
MLP regressor with temporal partitioning	activation: logistic	activation: logistic
	hidden_layer_sizes: (70)	hidden_layer_sizes: (90)
	learning_rate: adaptive	learning_rate: adaptive
	learning_rate_init: 0.001	learning_rate_init: 0.01
	solver: lbfgs	solver: lbfgs
MLP regressor with random partitioning	activation: relu	activation: relu
	hidden_layer_sizes: (30)	hidden_layer_sizes: (30)
	learning_rate: constant	learning_rate: constant
	learning_rate_init: 0.0001	learning_rate_init: 0.01
	solver: lbfgs	solver: lbfgs
MLP classifier with temporal partitioning	activation: relu	activation: logistic
	hidden_layer_sizes: (10)	hidden_layer_sizes: (50)
	learning_rate: constant	learning_rate: constant
	learning_rate_init: 0.01	learning_rate_init: 0.01
	solver: adam	solver: lbfgs
MLP classifier with random partitioning	activation: logistic	activation: relu
	hidden_layer_sizes: (80)	hidden_layer_sizes: (30)
	learning_rate: constant	learning_rate: constant
	learning_rate_init: 0.0001	learning_rate_init: 0.01
	solver: lbfgs	solver: lbfgs
SVM regressor with temporal partitioning	C: 20,000	C: 100,000
	epsilon: 10	epsilon: 100
	kernel: rbf	kernel: rbf
	degree: N.A.	degree: N.A.
SVM regressor with random partitioning	C: 100,000	C: 50,000
	epsilon: 100	epsilon: 100
	kernel: rbf	kernel: rbf
	degree: N.A.	degree: N.A.
SVM classifier with temporal partitioning	C: 10,000	C: 100,000
	kernel: poly	kernel: poly
	degree: 2	degree: 2
SVM classifier with random partitioning	C: 10,000	C: 100,000
	kernel: poly	kernel: poly
	degree: 2	degree: 2

Figure A2. The distribution of normalized ATD for various KNN models trained on Hanoi and Anytown networks.

Figure A3. The distribution of normalized ATD for various LR models trained on Hanoi and Anytown networks.

Figure A4. The distribution of normalized ATD for various MLP models trained on Hanoi and Anytown networks.

Figure A5. The distribution of normalized ATD for various SVM models trained on Hanoi and Anytown networks.

References

Sun, C.; Parellada, B.; Puig, V.; Cembrano, G. Leak localization in water distribution networks using pressure and data-driven classifier approach. Water 2020, 12, 54.
Fares, A.; Tijani, I.A.; Rui, Z.; Zayed, T. Leak detection in real water distribution networks based on acoustic emission and machine learning. Environ. Technol. 2023, 44, 3850–3866.
Daniel, I.; Pesantez, J.; Letzgus, S.; Khaksar Fasaee, M.A.; Alghamdi, F.; Berglund, E.; Mahinthakumar, G.; Cominola, A. A Sequential Pressure-Based Algorithm for Data-Driven Leakage Identification and Model-Based Localization in Water Distribution Networks. J. Water Resour. Plan. Manag. 2022, 148, 04022025.
Steffelbauer, D.B.; Deuerlein, J.; Gilbert, D.; Abraham, E.; Piller, O. Pressure-Leak Duality for Leak Detection and Localization in Water Distribution Systems. J. Water Resour. Plan. Manag. 2022, 148, 04021106.
Sanz, G.; Perez, R.; Escobet, A. Leakage localization in water networks using fuzzy logic. In Proceedings of the 2012 20th Mediterranean Conference on Control & Automation (MED), Barcelona, Spain, 3–6 July 2012; pp. 646–651.
Romero-Ben, L.; Alves, D.; Blesa, J.; Cembrano, G.; Puig, V.; Duviella, E. Leak detection and localization in water distribution networks: Review and perspective. Annu. Rev. Control 2023, 55, 392–419.
Burkart, N.; Huber, M.F. A Survey on the Explainability of Supervised Machine Learning. J. Artif. Intell. Res. 2021, 70, 245–317.
Soldevila, A.; Boracchi, G.; Roveri, M.; Tornil-Sin, S.; Puig, V. Leak detection and localization in water distribution networks by combining expert knowledge and data-driven models. Neural Comput. Appl. 2022, 34, 4759–4779.
Rossman, L.A. EPANET 2 USERS MANUAL. 2000. Available online: https://www.microimages.com/documentation/tutorials/epanet2usermanual.pdf (accessed on 29 April 2025).
Pernot, P. Calibration in Machine Learning Uncertainty Quantification: Beyond consistency to target adaptivity. APL Mach. Learn. 2023, 1, 046121.
Braiek, H.B.; Khomh, F. On testing machine learning programs. J. Syst. Softw. 2020, 164, 110542.
Ramazi, P.; Haratian, A.; Meghdadi, M.; Mari Oriyad, A.; Lewis, M.A.; Maleki, Z.; Vega, R.; Wang, H.; Wishart, D.S.; Greiner, R. Accurate long-range forecasting of COVID-19 mortality in the USA. Sci. Rep. 2021, 11, 13822.
Ramazi, P.; Kunegel-Lion, M.; Greiner, R.; Lewis, M.A. Predicting insect outbreaks using machine learning: A mountain pine beetle case study. Ecol. Evol. 2021, 11, 13014–13028.
Sousa, C.; Calheiros, C.; Maria, A.; Geraldes, A.; Onukwube, C.U.; Aikhuele, D.O.; Sorooshian, S. Development of a Fault Detection and Localization Model for a Water Distribution Network. Appl. Sci. 2024, 14, 1620.
Mazaev, G.; Weyns, M.; Vancoillie, F.; Vaes, G.; Ongenae, F.; Van Hoecke, S. Probabilistic leak localization in water distribution networks using a hybrid data-driven and model-based approach. Water Supply 2023, 23, 162–178.
Tyagi, V.; Pandey, P.; Jain, S.; Ramachandran, P. A Two-Stage Model for Data-Driven Leakage Detection and Localization in Water Distribution Networks. Water 2023, 15, 2710.
Mazaev, G.; Weyns, M.; Moens, P.; Haest, P.J.; Vancoillie, F.; Vaes, G.; Debaenst, J.; Waroux, A.; Marlein, K.; Ongenae, F.; et al. A microservice architecture for leak localization in water distribution networks using hybrid AI. J. Hydroinformatics 2023, 25, 851–866.
Li, J.; Zheng, W.; Lu, C. An Accurate Leakage Localization Method for Water Supply Network Based on Deep Learning Network. Water Resour. Manag. 2022, 36, 2309–2325.
Lučin, I.; Lučin, B.; Čarija, Z.; Sikirica, A. Data-driven leak localization in urban water distribution networks using big data for random forest classifier. Mathematics 2021, 9, 672.
Mashhadi, N.; Shahrour, I.; Attoue, N.; El Khattabi, J.; Aljer, A. Use of machine learning for leak detection and localization in water distribution systems. Smart Cities 2021, 4, 1293–1315.
Soldevila, A.; Blesa, J.; Fernandez-Canti, R.M.; Tornil-Sin, S.; Puig, V. Data-driven approach for leak localization in water distribution networks using pressure sensors and spatial interpolation. Water 2019, 11, 1500.
Zhou, X.; Tang, Z.; Xu, W.; Meng, F.; Chu, X.; Xin, K.; Fu, G. Deep learning identifies accurate burst locations in water distribution networks. Water Res. 2019, 166, 115058.
Capelo, M.; Brentan, B.; Monteiro, L.; Covas, D. Near–real time burst location and sizing in water distribution systems using artificial neural networks. Water 2021, 13, 1841.
Javadiha, M.; Blesa, J.; Soldevila, A.; Puig, V. Leak localization in water distribution networks using deep learning. In Proceedings of the 2019 6th International Conference on Control, Decision and Information Technologies (CoDIT), Paris, France, 23–26 April 2019.
Fujiwara, O.; Khang, D.B. A two-phase decomposition method for optimal design of looped water distribution networks. Water Resour. Res. 1990, 26, 539–549.
Geem, Z.W. Optimal cost design of water distribution networks using harmony search. Eng. Optim. 2006, 38, 259–277.
Walski, T.M.; Brill, E.D.; Gessler, J.; Goulter, I.C.; Jeppson, R.M.; Lansey, K.; Lee, H.; Liebman, J.C.; Mays, L.; Morgan, D.R.; et al. Battle of the Network Models: Epilogue. J. Water Resour. Plan. Manag. 1987, 113, 191–203.
Xu, J.; Wang, H.; Rao, J.; Wang, J. Zone scheduling optimization of pumps in water distribution networks with deep reinforcement learning and knowledge-assisted learning. Soft Comput. 2021, 25, 14757–14767.
Yang, L.; Shami, A. On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing 2020, 415, 295–316.
Halabaku, E.; Bytyçi, E. Overfitting in Machine Learning: A Comparative Analysis of Decision Trees and Random Forests. Intell. Autom. Soft Comput. 2024, 39, 987–1006.
Wong, T.T.; Yeh, P.Y. Reliable Accuracy Estimates from k-Fold Cross Validation. IEEE Trans. Knowl. Data Eng. 2020, 32, 1586–1594.
Pereira, R.B.; Plastino, A.; Zadrozny, B.; Merschmann, L.H.C. Correlation analysis of performance measures for multi-label classification. Inf. Process. Manag. 2018, 54, 359–369.

Figure 1. (a) Temporal and (b) random partitioning of a dataset into training and testing datasets. The demand pattern is a 24-h interval of a WDN with four nodes. The nodes are numbered from 1 to 4, with node 2 in yellow representing the observing node. The duration of all leakages is four hours, and data are collected in one-hour time steps. In the temporal partitioning, the 8th hour is the reference cutting point.

Figure 2. The detailed methodology diagram.

Figure 3. The Hanoi WDN [19].

Figure 4. The Anytown WDN [28].

Figure 5. Concept of (a) accuracy_i and (b) ATD and AR on an example leakage sample. The metrics accuracy₁ and accuracy₂ are equal to 0 because the model’s prediction (node 20) is not among the two nearest nodes to the leaking node (node 23). However, accuracy₃ is equal to 1 because node 20 is within the three nearest nodes to the leaking node. It is clear that accuracy₄, accuracy₅, and the following metrics are all equal to 1. Nodes are ranked according to their distance from the leaking node, with node 20 receiving a rank of 3, which corresponds to the concept of AR. The shortest path between the leaking node and the model’s prediction is 2650 m in length, illustrating the concept of ATD.
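The three metrics described in this caption can be sketched on a small graph. The four-node network and pipe lengths below are hypothetical, and the function names are illustrative. Two assumptions are made here: nodes are ranked by shortest-path length from the leaking node, and the leaking node itself counts as rank 1, a convention the caption does not settle. ATD and AR are taken to be the means of the per-sample topological distance and rank.

```python
import heapq

def shortest_path_lengths(graph, source):
    """Dijkstra: shortest-path length (m) from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, length in graph[u].items():
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def evaluate(graph, true_node, predicted_node, i):
    """Per-sample topological distance, rank, and accuracy_i of one prediction."""
    dist = shortest_path_lengths(graph, true_node)
    # Assumption: the leaking node itself is rank 1; ties are broken by label.
    ranked = sorted(dist, key=lambda n: (dist[n], n))
    rank = ranked.index(predicted_node) + 1
    return {"topological_distance": dist[predicted_node],
            "rank": rank,
            "accuracy_i": int(rank <= i)}

def summarize(graph, samples, i):
    """Average the per-sample values into ATD, AR, and mean accuracy_i."""
    results = [evaluate(graph, true, pred, i) for true, pred in samples]
    n = len(results)
    return {"ATD": sum(r["topological_distance"] for r in results) / n,
            "AR": sum(r["rank"] for r in results) / n,
            "accuracy_i": sum(r["accuracy_i"] for r in results) / n}

# Hypothetical undirected network: node -> {neighbor: pipe length in m}.
graph = {
    1: {2: 500.0, 3: 900.0},
    2: {1: 500.0, 4: 700.0},
    3: {1: 900.0, 4: 400.0},
    4: {2: 700.0, 3: 400.0},
}

single = evaluate(graph, true_node=1, predicted_node=3, i=3)
samples = [(1, 3), (1, 1), (4, 2)]   # (leaking node, predicted node)
summary = summarize(graph, samples, i=3)
print(single)
print(summary)
```

For a regressor, the predicted coordinates would first have to be mapped to a node (for example, the nearest one) before these functions apply; that step is outside this sketch.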

Figure 6. The performance of all models trained on both case studies according to ATD and AR metrics. Blue columns represent regressors with nodal partitioning, while red columns represent regressors with random partitioning. Green columns represent classifiers with nodal partitioning, and purple columns represent classifiers with random partitioning.

Figure 7. The evaluation of different KNN models on the Hanoi network according to accuracy_i. A model is expected to perform better as i (the number of nearest neighbors in which the presence of the leaking node is assessed) increases, because the task becomes easier. However, only the models with nodal partitioning improved as i increased. Models with random partitioning showed almost no sensitivity to this parameter and performed well for all values of i. This unexpected behavior shows that the promising performance of the models with random partitioning does not stem from their understanding of the problem but from having highly similar samples in both the training and testing datasets.

Figure 8. The distribution of normalized ATD for various DT models trained on Hanoi and Anytown networks.

Figure 9. The performance of the MLP regressor at each node in the Hanoi network is presented. Nodes whose corresponding samples appeared in the training phase are shown, and all nodes are color-coded based on their ATD values.

Table 1. Summary of works with a similar approach to [13].

Work	The Input	The Output	Trained Classifiers	Case Studies	Results
[15]	pressure at nodes	label of the leaking node	logistic regression (LoR)	a district-metered area in Belgium	Average topological distance (ATD) = 0.18 to 4.96 km
[16]	flow of some pipes and pressure at some nodes	label of the leaking node	LoR	Hanoi	Accuracy = 91%
				Net3	Accuracy = 79%
				C-Town	Accuracy = 30%
[17]	pressure at nodes	label of the leaking node	elastic-net LoR	a district-metered area in Belgium	ATD = 0.17 to 1.2 km
[18]	pressure at some nodes	label of the leaking pipe	ResNet	Anytown	Accuracy = 94%
[18]	pressure at some nodes	label of the leaking pipe	ResNet	Net3	Accuracy = 91%
[8]	flow of pipes	label of the leaking area	KNN	Barcelona WDN	Accuracy = 80%
[19]	residual pressures at some nodes	label of the leaking node	Random forest (RF)	Hanoi	Accuracy = 100%
[20]	flow of some pipes and residual pressure at some nodes	label of the leaking node	LoR	Lille University network	Accuracy = 100%
[21]	residual pressure at nodes	label of the leaking node	KNN	Hanoi	ATD = 2.3 nodes
[22]	pressure at some nodes	label of the leaking pipe	ANN	Anytown	Accuracy = 100%

Table 2. The performance of all models trained on the Hanoi and Anytown networks based on the ATD (m) metric. Bold numbers indicate lower values (better performance) in each row.

Model Name	Model Type	Partitioning Type	Hanoi Train	Hanoi Test	Anytown Train	Anytown Test
KNN	Regressor	Nodal	1895	1951	8306	8553
	Regressor	Random	1154	1180	5338	5461
	Classifier	Nodal	810	2239	3587	9919
	Classifier	Random	272	277	2043	2087
DT	Regressor	Nodal	2266	2367	10,693	11,168
	Regressor	Random	1872	1970	6201	6526
	Classifier	Nodal	1011	2905	4259	12,241
	Classifier	Random	1086	1121	4121	4255
LR	Regressor	Nodal	2597	2649	11,425	11,654
	Regressor	Random	2173	2212	9178	9341
	Classifier	Nodal	2012	4481	7423	16,528
	Classifier	Random	86	89	1958	2005
MLP	Regressor	Nodal	1988	2029	10,092	10,297
	Regressor	Random	1929	1939	9445	9493
	Classifier	Nodal	1170	2473	5173	10,930
	Classifier	Random	138	139	1353	1355
SVM	Regressor	Nodal	2778	2846	9326	9555
	Regressor	Random	2245	2311	7300	7517
	Classifier	Nodal	923	2874	3908	12,174
	Classifier	Random	167	171	1009	1033

Table 3. The performance of all models trained on the Hanoi and Anytown networks based on the AR metric. Bold numbers indicate lower values (better performance) in each row.

Model Name	Model Type	Partitioning Type	Hanoi Train	Hanoi Test	Anytown Train	Anytown Test
KNN	Regressor	Nodal	12.33	12.70	8.02	8.26
	Regressor	Random	4.03	4.12	3.47	3.55
	Classifier	Nodal	4.70	12.99	3.34	9.23
	Classifier	Random	3.35	3.42	3.06	3.13
DT	Regressor	Nodal	8.48	8.86	9.16	9.57
	Regressor	Random	7.71	8.12	6.74	7.09
	Classifier	Nodal	4.29	12.33	3.48	10.01
	Classifier	Random	7.17	7.40	5.35	5.53
LR	Regressor	Nodal	11.49	11.73	8.81	8.99
	Regressor	Random	2.88	2.93	5.07	5.16
	Classifier	Nodal	5.31	11.82	4.26	9.49
	Classifier	Random	1.13	1.15	4.53	4.64
MLP	Regressor	Nodal	10.88	11.10	8.05	8.22
	Regressor	Random	3.53	3.55	5.90	5.93
	Classifier	Nodal	5.67	11.99	4.13	8.73
	Classifier	Random	2.46	2.47	2.96	2.97
SVM	Regressor	Nodal	12.03	12.33	9.03	9.25
	Regressor	Random	5.71	5.88	4.70	4.84
	Classifier	Nodal	4.43	13.81	3.15	9.83
	Classifier	Random	1.68	1.72	2.94	3.01

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
