1. Introduction
Pipeline leakage breaks have become one of the most frequent failures in water distribution networks (WDNs) due to material wear [1], installation defects, corrosion, and vibration [2]. Leakage breaks not only cause economic loss but may also endanger public safety through water pollution and ground collapse [3]. Hence, the timely diagnosis of pipeline leakage breaks is a major challenge for water supply companies.
The Supervisory Control and Data Acquisition (SCADA) system has been widely used to monitor the hydraulic state of WDNs. However, the sampling frequency of the SCADA system is usually lower than one sample per minute. For example, a fully automated data-driven methodology still relies on relatively low-frequency data collection, which limits its responsiveness to rapid events [4]. Similarly, a review emphasized that many existing leak break detection systems, including those integrated with SCADA, are constrained by low sampling rates, which impairs timely leakage break detection [5]. On the other hand, the negative pressure wave generated by a leakage break event typically lasts for only a few seconds. A transient-pressure-wave-based detection technique coupled with wavelet analysis has been proposed, demonstrating the importance of capturing rapid pressure changes for accurate leakage break localization [6]. Furthermore, Tian et al. developed a leak break detection method for low-pressure gas pipelines based on negative pressure waves and artificial intelligence, highlighting that such short-duration signals are crucial for the effective identification of leakage break events [7]. The value of high-frequency data has also been reported for wind turbine condition monitoring: compared with the conventionally used 10 min averaged SCADA data, high-frequency data improved prognostic predictions, provided more insight into the dynamics of component condition, and aided the earlier detection of faults [8].
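The mismatch between a transient of a few seconds and a sampling rate of about one reading per minute can be illustrated with a minimal sketch. The pressure values, the rectangular 2 s dip, and the 100 Hz logger rate below are assumptions chosen only for illustration and are not taken from the experiments in this paper.

```python
def pressure_at(t_ms, base=40.0, dip=3.0, t_event_ms=125_000, duration_ms=2_000):
    """Synthetic pressure head (m) with a brief rectangular dip; all values are made up."""
    if t_event_ms <= t_ms < t_event_ms + duration_ms:
        return base - dip
    return base


def sample(interval_ms, total_ms=300_000):
    """Read the synthetic trace every `interval_ms` milliseconds for five minutes."""
    return [pressure_at(t) for t in range(0, total_ms, interval_ms)]


slow = sample(interval_ms=60_000)  # one reading per minute, SCADA-like
fast = sample(interval_ms=10)      # hypothetical 100 Hz high-frequency logger

# The slow trace never lands inside the 2 s dip, so the event leaves no trace in it.
print("minimum seen at 1 sample/min:", min(slow))
print("minimum seen at 100 Hz:", min(fast))
```

In this toy trace, the one-per-minute readings fall at 0, 60, 120, 180, and 240 s and therefore miss a dip that starts at 125 s, whereas the high-frequency trace records it.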
When a leakage break occurs in a WDN, the negative pressure wave generated at the leakage break point travels through the pipeline to the monitoring point, where it is recorded in the high-frequency pressure data as a combination of an instantaneous pressure drop and a pressure increase. Therefore, the negative pressure wave of a leakage break can be regarded as a singular point in a continuous signal [6]. Signal-processing methods such as the Fast Fourier Transform (FFT) [9,10] have difficulty processing non-stationary signals, and the Short-Time Fourier Transform (STFT) offers limited leakage break diagnosis performance because of its fixed window [11]. To address the complex non-stationary parts of signals, the wavelet transform (WT) was introduced [12]. While the FFT and STFT remain the primary tools for analyzing stationary signals, the WT is more suitable for the multiresolution analysis of non-stationary signals. The continuous wavelet transform (CWT) and discrete wavelet transform (DWT) can be used to extract different leakage break features, and appropriate thresholds are set for judging singular points [13].
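As a rough illustration of how a wavelet coefficient exposes a singular point, the sketch below computes undecimated Haar wavelet coefficients at a single scale on a synthetic trace. The Haar wavelet, the scale, the synthetic signal, and the threshold of -1.0 are assumptions made for this example; the cited studies may use other wavelets and settings.

```python
import math


def haar_response(x, half_width):
    """Undecimated Haar wavelet coefficients at one scale.

    Each coefficient compares the sum of the next `half_width` samples with the
    sum of the previous `half_width` samples, so a sudden drop gives a large
    negative value while slow or small oscillations largely cancel.
    """
    norm = math.sqrt(2 * half_width)
    coeffs = [0.0] * len(x)
    for i in range(half_width, len(x) - half_width + 1):
        after = sum(x[i:i + half_width])
        before = sum(x[i - half_width:i])
        coeffs[i] = (after - before) / norm
    return coeffs


# Synthetic trace (made-up values): 40 m head with a small oscillation
# and a 1 m drop starting at sample 300.
signal = [40.0 + 0.05 * math.sin(0.7 * i) - (1.0 if i >= 300 else 0.0)
          for i in range(600)]

coeffs = haar_response(signal, half_width=8)
peak = min(range(len(coeffs)), key=lambda i: coeffs[i])

threshold = -1.0  # illustrative only; in practice this must be tuned per network
flagged = coeffs[peak] < threshold
print("most negative coefficient at sample", peak, "flagged:", flagged)
```

The need to pick `threshold` by hand is exactly the kind of empirical choice that depends on the analyst and the network.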
However, for both signal-processing methods and statistical analysis methods, it is difficult to set a reliable threshold for judging whether a leakage break event has occurred in a WDN. Empirical thresholds are set based on the analyst’s understanding of the given WDN, which may lead to false positives or false negatives in leakage break diagnosis [14]. In contrast, with their capacity for automatic feature extraction and parameter adjustment, machine learning methods can effectively reduce the influence of subjective judgment. A classifier is trained automatically by learning the hydraulic information in samples of extracted leakage break features. The trained classifier can then predict the labels of new samples [15].
Feature selection plays a critical role in machine learning model performance, and leakage break feature selection has become a significant research focus in recent years. Xiao et al. used the Relief-F algorithm to evaluate the quality of the extracted leakage break features and selected the optimal features as the input of a support vector machine (SVM) classifier [16]. Their results show that the leakage break features selected by the Relief-F algorithm achieved a high performance with the SVM classifier. The choice of classification algorithm is another crucial factor affecting leakage break diagnosis accuracy. Tijani et al. extracted 17 leakage break features and classified them with four classifiers [3]. Their analysis revealed the superior performance of Artificial Neural Networks (ANNs) compared with the alternative classifiers on their dataset. However, an SVM achieved the best performance in another test [17]. These studies show that, even when the same feature extraction method is used, the leakage break information can differ between network environments. The selection of leakage break features and shallow neural network classifiers also requires subjective and empirical judgments from an analyst, potentially limiting adaptability in actual WDNs.
On the other hand, the high-frequency data collected in a WDN take the form of a time series. Methods based on feature extraction discard a large amount of the information in the original data, including information in the time dimension that is useful for leakage break diagnosis [18]. With the development of deep neural networks, such as the recurrent neural network (RNN) and the convolutional neural network (CNN), features of time-series data can be extracted automatically by the network rather than specified manually. With these deep neural networks, researchers no longer have to compare and choose leakage break features for different WDNs. As an improvement on the CNN, Fully Convolutional Networks (FCNs) replace the fully connected layer with a global average pooling layer and have been shown to achieve better performance in the classification of time-series data [19,20]. Long Short-Term Memory (LSTM) is a specific type of RNN. Lee and Yoo used an LSTM model to learn the fluctuation information in low-frequency flow data monitored by a SCADA system and to predict the flow change in the next period [14]. The deviation between the monitored flow and the predicted flow was then calculated to judge whether a leakage break had occurred in the WDN. Zuo et al. used a modified LSTM model to learn and extract the inherent features of the monitored data from the SCADA system and then fed them into a one-class SVM classifier to diagnose leakage break events [21]. LSTM, the FCN, and other neural network methods are capable of extracting specific information from time-series data. However, although methods based on these deep neural networks can adaptively extract leakage break features from SCADA data, there is a lack of analysis and comparison of the leakage break features extracted from high-frequency data. Meanwhile, whether leakage break diagnosis performance can be improved by concatenating the leakage break features extracted by different neural networks needs further exploration.
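The role of the global average pooling layer in an FCN can be shown with a small sketch: each convolutional feature map is reduced to its mean, so the feature vector has one entry per filter regardless of the input length. The two hand-picked kernels and the synthetic series are assumptions for illustration; in a trained FCN the kernels are learned.

```python
def conv1d_relu(x, kernel, bias=0.0):
    """Valid 1-D convolution (cross-correlation form) followed by ReLU."""
    k = len(kernel)
    out = []
    for i in range(len(x) - k + 1):
        s = bias + sum(kernel[j] * x[i + j] for j in range(k))
        out.append(max(0.0, s))
    return out


def global_average_pool(feature_maps):
    """Collapse each feature map to its mean, giving one number per filter."""
    return [sum(fm) / len(fm) for fm in feature_maps]


# Hypothetical, hand-picked filters: a falling-edge and a rising-edge detector.
KERNELS = [[1.0, 0.0, -1.0], [-1.0, 0.0, 1.0]]


def pooled_features(series):
    """Convolve the series with every kernel and pool each map to a scalar."""
    return global_average_pool([conv1d_relu(series, k) for k in KERNELS])


short_series = [40.0] * 10 + [39.0] * 10  # made-up pressure drop, 20 samples
long_series = [40.0] * 50 + [39.0] * 50   # same drop, 100 samples

for series in (short_series, long_series):
    print(len(series), "samples ->", len(pooled_features(series)), "features")
```

Because the pooled vector length depends only on the number of filters, the classifier that follows needs no fully connected layer sized to the input length.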
Building upon previous research and addressing the subjectivity and limited adaptability of manual feature selection, this study aimed to develop a novel leakage break diagnosis method for WDNs. Given that an RNN and a CNN extract different features based on their respective principles, and that these features are complementary in some time-series classification tasks, this paper proposes a concatenated deep neural network, specifically an LSTM-FCN model, to complete the classification of time-series data. First, the original high-frequency pressure data are preprocessed through filtering and frequency reduction. Second, the LSTM module and the FCN module are used to learn the temporal information and the local information in the data, respectively, and to extract the leakage break features. Finally, the extracted leakage break features are sent to a classifier that outputs the predicted label, which covers both leakage break detection and leakage break degree prediction. An experimental network was established to simulate leakage break events and leakage break-free events. A steady-state sample set and a variable water-consumption sample set were obtained to verify the performance of the proposed method. The influence of several factors on the method’s performance, such as the sample length and the leakage break time instant, is also discussed. The experimental results validate the methodology’s reliability and diagnostic accuracy in the test WDN environment, where the LSTM-FCN model performed consistently across diverse operational conditions.
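The preprocessing step, filtering followed by frequency reduction, can be sketched as follows. The moving-average filter, the window of 10 samples, and the reduction factor of 10 are assumptions for illustration; the specific filter and rates used in this study are not reproduced here.

```python
def moving_average(x, window):
    """Smooth x with a moving average; returns len(x) - window + 1 samples."""
    total = sum(x[:window])
    out = [total / window]
    for i in range(window, len(x)):
        total += x[i] - x[i - window]  # slide the window by one sample
        out.append(total / window)
    return out


def downsample(x, factor):
    """Reduce the sampling frequency by keeping every `factor`-th sample."""
    return x[::factor]


# Made-up noisy trace: 40 m head with alternating +/-0.2 m sensor noise.
raw = [40.0 + (0.2 if i % 2 == 0 else -0.2) for i in range(1000)]

smoothed = moving_average(raw, window=10)
reduced = downsample(smoothed, factor=10)
print(len(raw), "->", len(smoothed), "->", len(reduced), "samples")
```

Filtering before reducing the frequency matters: dropping samples from the raw trace directly would keep the noise and could alias it into the shorter series.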
Based on the above analysis, the motivation of this study was to overcome the limitations of manual feature extraction methods, which often suffer from subjectivity and poor adaptability, by developing an automated and robust leakage break diagnosis approach for WDNs.
Traditional methods for detecting water losses in WDNs often involve techniques such as the division into District Metered Areas (DMAs), flow balancing, loss indicator determination, and the analysis of minimum night flow (MNF). These methods provide valuable insights into system efficiency and help identify potential leakage breaks. The DMA method divides the network into manageable sections for more precise monitoring, while flow balancing compares inflows and outflows to assess losses. The MNF analysis method, based on the observation of water usage during off-peak hours, helps detect anomalies indicative of leaks. However, these conventional techniques face limitations in addressing the complexity and dynamic nature of WDNs, especially under varying operational conditions and water-consumption patterns. In contrast, the proposed LSTM-FCN-based method in this study leverages high-frequency pressure data to provide a more adaptive and accurate approach to leakage break diagnosis, addressing these challenges with improved reliability and predictive power.
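For comparison, the arithmetic behind flow balancing and MNF analysis is simple enough to sketch in a few lines. The volumes, the hourly inflow profile, the night window of 02:00 to 04:00, and the baseline and margin values are all hypothetical and serve only to illustrate the two checks.

```python
def water_loss(inflow, metered_consumption):
    """Flow balance: volume entering a DMA minus the sum of metered consumption."""
    return inflow - sum(metered_consumption)


def minimum_night_flow(hourly_inflow, night_hours=(2, 3, 4)):
    """Smallest inflow within the night window; hours index a 24-value list."""
    return min(hourly_inflow[h] for h in night_hours)


# Hypothetical daily volumes for one DMA (m^3/day).
loss = water_loss(inflow=1200, metered_consumption=[400, 350, 300])

# Hypothetical hourly inflow profile for the same DMA (m^3/h), hour 0 to hour 23.
hourly = [30, 28, 26, 25, 27, 35, 50, 70, 80, 75, 70, 68,
          66, 64, 62, 63, 65, 72, 78, 74, 60, 48, 40, 34]
mnf = minimum_night_flow(hourly)

baseline_mnf = 18  # assumed historical night flow without new leaks (m^3/h)
margin = 5         # assumed tolerance before raising a flag (m^3/h)
suspicious = mnf - baseline_mnf > margin
print("daily loss:", loss, "MNF:", mnf, "suspicious:", suspicious)
```

Both checks work on daily or hourly aggregates, which is why they respond slowly compared with methods that analyze pressure transients.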
Therefore, the objective of this work was to propose a novel deep learning-based method that leverages high-frequency pressure data to detect and classify leakage break events with a high reliability and accuracy. Specifically, a concatenated deep neural network architecture, the LSTM-FCN model, was designed to exploit the complementary strengths of RNNs and CNNs in time-series feature extraction.
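The idea of concatenating LSTM and FCN features before a shared classifier can be sketched as a single forward pass. This is a deliberately reduced illustration, not the architecture trained in this study: the weights are random and untrained, there is one convolutional layer instead of a deeper stack, and the layer sizes, the synthetic input, and the use of three output classes are assumptions.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_last_state(series, params, hidden):
    """Run a single-layer LSTM over a univariate series; return the final hidden state."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    for x_t in series:
        new_h, new_c = [], []
        for u in range(hidden):
            # Pre-activations of the input, forget, candidate, and output gates
            pre = []
            for gate in ("i", "f", "g", "o"):
                w_x, w_h, b = params[gate]
                pre.append(w_x[u] * x_t + sum(w_h[u][k] * h[k] for k in range(hidden)) + b[u])
            i_g, f_g = sigmoid(pre[0]), sigmoid(pre[1])
            g_g, o_g = math.tanh(pre[2]), sigmoid(pre[3])
            c_u = f_g * c[u] + i_g * g_g
            new_c.append(c_u)
            new_h.append(o_g * math.tanh(c_u))
        h, c = new_h, new_c
    return h


def fcn_features(series, kernels):
    """One convolution + ReLU layer followed by global average pooling."""
    feats = []
    for kern in kernels:
        k = len(kern)
        acts = [max(0.0, sum(kern[j] * series[i + j] for j in range(k)))
                for i in range(len(series) - k + 1)]
        feats.append(sum(acts) / len(acts))
    return feats


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def lstm_fcn_forward(series, lstm_params, kernels, dense_w, dense_b, hidden):
    """Concatenate the two feature vectors and apply a dense softmax classifier."""
    features = lstm_last_state(series, lstm_params, hidden) + fcn_features(series, kernels)
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(dense_w, dense_b)]
    return softmax(logits)


rng = random.Random(0)


def rand(n):
    """Random untrained weights, for illustration only."""
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]


HIDDEN, N_FILTERS, KERNEL, N_CLASSES = 4, 3, 5, 3  # assumed sizes
lstm_params = {g: (rand(HIDDEN), [rand(HIDDEN) for _ in range(HIDDEN)], rand(HIDDEN))
               for g in "ifgo"}
kernels = [rand(KERNEL) for _ in range(N_FILTERS)]
dense_w = [rand(HIDDEN + N_FILTERS) for _ in range(N_CLASSES)]
dense_b = rand(N_CLASSES)

series = [0.0] * 30 + [-1.0] * 30  # made-up normalized pressure with a step drop
probs = lstm_fcn_forward(series, lstm_params, kernels, dense_w, dense_b, HIDDEN)
print("class probabilities:", probs)
```

With untrained weights the probabilities carry no meaning; the sketch only shows how the recurrent branch summarizes the whole sequence, how the convolutional branch summarizes local patterns, and how the two vectors are joined before classification.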
In this paper, we refer to the detection of newly occurred leakage events as “leak break” detection, rather than traditional “leak detection.” The term “leak detection” may refer to long-term background leakage. In addition, compared with sudden burst events, the leakage flow in this research was relatively low. In many previous studies, particularly those using transient test-based methods, the focus was on identifying pre-existing leaks that persisted over time. These conventional techniques, such as inverse transient analysis (ITA), analyze the reflection of pressure waves caused by long-term leakages under controlled conditions. To avoid confusion, we clarify that our focus was on identifying pressure transients caused by newly generated leaks in a water distribution network.
To clarify the scope of this study, we emphasize that our goal was not to locate pre-existing leaks, but to detect the emergence of new leakage break events in real-time by analyzing pressure transients caused by their sudden appearance.
Transient wave-based methods are widely used for leak detection because they capture high-frequency pressure variations caused by structural disturbances. Among them, inverse transient analysis (ITA) is a common and effective diagnostic technique.
Capponi et al. proposed the Network Admittance Matrix Method (NAMM), which formulates water hammer equations in the frequency domain and uses a Laplacian matrix for leak detection. Validated on a branched system, it showed accurate leak localization under varying conditions [22]. Wang and Ghidaoui introduced a linearized transient model with maximum likelihood estimation to detect multiple leaks, demonstrating a high accuracy in viscoelastic pipes [23]. Tong-Chuan et al. reviewed five types of transient wave-based methods, highlighting their strengths but noting their reliance on the prior knowledge of pipeline topology and complex signal processing, which may limit adaptability in unknown systems [24]. A wider range of studies have demonstrated the potential of pressure transient tests in locating such persistent leaks. Classical methods using pressure wave analysis and physical modeling have been validated in labs and field tests. Studies like those cited by Brunone et al. examined how the leak location, pipe material, and boundary conditions affect the transient responses [25].
While these approaches are effective for detecting passive leak signals in controlled settings, they rely heavily on detailed system models and repeated transient tests. This limits their use in large or poorly documented networks, where rapid deployment is needed.
Unlike conventional transient-wave-based methods, our approach does not rely on previous knowledge of the pipe network’s topology. Instead, we applied a data-driven method that learns from high-frequency pressure signals to distinguish normal consumption from anomalies caused by leakage breaks.
Furthermore, most of these methods are mainly used for regular checks or planned tests on old pipelines, not for the quick detection of new leak breaks as soon as they happen. In contrast, our study focused on the detection of newly occurred leak breaks, which generate transient signals different from those of long-term leaks. By adopting a data-driven deep learning method, we can quickly distinguish leak-induced transient patterns from normal water consumption directly from the data, making the approach suitable for real-time applications in complex, dynamic distribution systems.
The novelties of this study are threefold: (1) the application of the LSTM-FCN model to automatically learn both temporal and local features from high-frequency pressure data for leakage break diagnosis; (2) the establishment of an experimental water distribution network to generate steady-state and variable consumption sample sets, providing a comprehensive evaluation of the method under realistic conditions; and (3) a detailed investigation of the impact of the sample length and leakage break occurrence timing on the diagnostic performance, offering practical insights for future field applications.
Although background leakage is commonly present in water distribution networks, the detection of sudden leakage breaks remains valuable for operation and maintenance. It holds particular practical value in specific scenarios, such as industrial cooling systems or sensor-equipped pipeline sectors, where operational anomalies must be promptly analyzed. In such cases, high-frequency pressure monitoring is feasible and meaningful. Therefore, this study did not aim to propose a universal solution, but rather a targeted framework for systems where leak break detection remains relevant.
4. Conclusions
A leakage break diagnosis method based on the LSTM-FCN neural network model from high-frequency pressure data is proposed in this paper. First, data preprocessing is used to avoid the influence of noise and information redundancy. Second, the LSTM module and the FCN module are used to extract and combine different leakage break features. Finally, the leakage break feature is sent to a dense classifier to obtain the predicted result. The following conclusions were drawn:
The LSTM-FCN deep learning model was well suited to the three-class classification task of leakage break diagnosis and achieved a high accuracy in both leakage break detection and leakage break degree prediction on the high-frequency pressure data. The LSTM model and the FCN model could extract different leakage break features, and the LSTM-FCN deep learning model achieved a better leakage break diagnosis performance by combining these two kinds of features. The trained LSTM-FCN deep learning model performed well on both the steady-state sample set and the variable water-consumption sample set, which indicates that the model could be highly robust to the variations in water consumption in actual WDNs.
To ensure the effectiveness of the proposed LSTM-FCN-based leakage break diagnosis method, certain conditions must be met. The approach requires high-frequency pressure data with sufficient temporal resolution to capture transient features related to leakage break events. The pipeline system should have a relatively stable hydraulic background, as excessive environmental noise or frequent operational fluctuations may mask the leakage break-induced patterns. Additionally, accurate sensor calibration and stable sensor placement are essential for reliable data acquisition. The method may be less effective in systems with sparse sensor coverage, a low sampling frequency, or highly irregular pressure variations caused by external disturbances.
This study confirmed the feasibility of using an LSTM-FCN model for classifying leakage break severity levels in a laboratory-based water distribution system. The experimental results demonstrate that the model was sensitive to different leakage break conditions, which supports its potential as a diagnostic tool in more complex leakage break scenarios. Although some leakage always exists in real systems, the “normal condition” in our setup was used as a baseline to enhance model sensitivity to newly emerging leakage breaks. The ability to differentiate leakage break severity can assist utilities in better management and also forms the basis for further tasks, including leakage break quantification and spatial localization. Future research will focus on reducing the reliance on labeled data by exploring semi-supervised and transfer learning techniques. Future work may also incorporate leak localization modules following the detection stage, enabling the system not only to identify the occurrence of a leak but also to estimate its position. Combining classification-based detection with inverse transient analysis or data-driven localization algorithms could enhance the practical value of the proposed framework in real-world WDNs.
Although the experiments were conducted under laboratory conditions using pipes in a controlled environment, the proposed leakage break diagnosis method can be adapted for real-world WDNs. Moreover, the model was also tested on unlabeled data under variable consumption patterns, indicating its capacity to generalize across unseen usage conditions. However, practical application in underground pipelines may face challenges, such as environmental noise and pressure fluctuations. Further field validation on real-world WDNs is needed to improve the model’s generalization ability. The influence of the sensor number and spatial density on the leakage break localization accuracy will also be studied further. The optimal deployment strategy for large-scale networks remains an open research question.