Development of Leakage Detection Model and Its Application for Water Distribution Networks Using RNN-LSTM

Abstract: With the advent of the 4th Industrial Revolution, advanced measurement infrastructure and utilization technologies are being introduced into water supply systems to store and utilize measurement data. From this perspective, leak detection technology in water supply networks is becoming increasingly vital to sustainable water resource management and clean water supply worldwide. In particular, leakage detection in buried pipelines is rated as a very challenging research topic given the current level of technology. Therefore, this study developed a data-driven leak detection model using deep learning technology based on inflow meter data. RNN-LSTM (Recurrent Neural Networks–Long Short-Term Memory) deep learning was combined with multiple threshold-based models to reduce the range of false predictions, and the model was programmed in the Python language in conjunction with Google Colaboratory (a big data analysis tool). The developed model consists of flow pattern shape extraction, RNN-LSTM-based flow prediction, and threshold setting modules. The developed model was applied to actual leakage accident data, followed by a performance evaluation. As a result, the leak was recognized at most points immediately after the accident. The leak detection performance was evaluated with a confusion matrix and showed more than 90% accuracy at all points except singularities. Therefore, the developed model can be used as a critical software technology to proactively identify leaks now that smart water infrastructure is being introduced. In addition, this model is highly scalable, as it can consider various operational situations based on the expert system, and it can also efficiently reflect the results of pipe network analysis across different scenarios.


Introduction
Water distribution networks (WDNs), one of the social infrastructure facilities, are installed and operated underground over large areas to distribute, transport, and supply purified water from a water source to the faucet of each customer. Long-buried pipelines in water distribution networks may cause various abnormal situations in the system, such as water loss through large- and small-scale continuous leaks (reduction of water flow rate), water quality problems caused by scale or corrosion accumulated in the pipelines, and poor water outflow due to a narrowed flow area (low water pressure). In particular, the leakage rate of waterworks nationwide is 10.8% (about 720 million m³). When converted into economic losses, the loss due to tap water leakage is about KRW 658 billion based on the annual production cost. Therefore, it is essential to maintain and manage the water distribution network system efficiently, detecting and repairing leaks immediately and remodeling or replacing old pipelines as a preventive measure.
The leak detection technology in water supply networks is increasingly vital to sustainable water resource management and clean water supplies worldwide. In particular, leakage detection in buried pipelines is rated as a very challenging research topic given the current level of technology [1]. In the event of a leak caused by an accident or a sudden pipe rupture due to aging, the operator starts with leak detection, checks the location of the damage, closes the gate valves, opens the drain valves (drainage), and then proceeds with recovery work in the order of restoration and water flow. In such a process, if there are no duplicated pipelines or annexed supply systems, the cut-off time that directly affects consumers lasts from at least 12 h [2] to several weeks or more for a metropolitan water supply system. Closing gate valves, opening drain valves, restoring the pipe, and resuming water flow are primary tasks that require the least amount of time. Ultimately, reducing the time required to detect and confirm a leak is essential for successful leak accident management. In general, a reliable leak detection system should detect leak accidents accurately and quickly and locate the precise leak points. In addition, such a system should minimize false alarms when detecting and locating the leak points.
In this study, to determine the applicability of data-based deep learning techniques for shortening the leak detection time, we built and evaluated a leak detection model based on statistical analysis of the difference between the flow rate predicted from past data and the measured value. We set the following three specific goals to evaluate the applicability of the model. First, we developed a data-based leak detection model using deep learning technology based on the inflow meter data of water distribution network systems. To this end, the RNN-LSTM deep learning methodology and a multi-threshold model were built to reduce the range of false detections, programmed in the Python language in conjunction with Google Colaboratory, a big data analysis tool. Second, we applied the model to the inflow data of water distribution networks and calculated the confusion matrix, which is used for performance evaluation in the deep learning field, to quantify the performance of the leak detection model for various leaks at different periods. Third, we applied the developed model to a leak accident in Korea, evaluated it, and presented measures to improve and complement the deep learning-based leak detection model based on the evaluation results.

Related Study
Leak detection methods in the water distribution network system can be divided into hardware-based and model-based techniques [3] (Figure 1). Model-based techniques can be further divided into hydraulic model-based and hydraulic measurement-based methods. The hardware-based techniques used in the operation and management fields include sound and gas tracking methods, which involve a significant amount of time and labor. Since this study aims to detect leaks caused by pipe breakage that occurs instantaneously and unexpectedly, labor-intensive hardware-based detection is limited in its ability to reduce the leak detection time. Although CCTVs may be installed in the pipeline to monitor water leakage, these devices cannot be installed in all pipelines and incur relatively high maintenance costs, such as data transmission costs for operating the equipment. Hydraulic model-based techniques detect leaks through hydraulic analysis simulation of the water distribution network system, which is divided into steady flow analysis and unsteady flow analysis. The former does not consider temporal changes in hydraulic characteristics, whereas the latter does. A typical hydraulic analysis program for steady flow analysis is EPANET2 [4], which identifies leaks in the water distribution network system through changes in emitter coefficients and corrected demands before and after leak detection. The unsteady flow analysis typically calibrates the friction factor and the amount of leakage so that the hydraulic analysis data match the actual measurement data, and determines the presence and location of leaks from the results. The hydraulic-based method requires corrected input data that are as close as possible to the real system for accurate leak detection. Therefore, it is necessary to regularly reconstruct the fundamental input data for pipe network analysis (pipeline information, demand fluctuations, etc.) for the water distribution network system. In addition, since continuous measurement is required to obtain accurate pipeline network analysis results, a professional understanding of the variability of the system's hydraulic properties according to field conditions is required for hydraulic analysis-based real-time leak detection.
Recently, owing to the improved storage capacity for measurement data and the development of various applicable algorithms, the application of measurement data is gradually expanding in the water supply field [2]. For leak detection due to pipe breakage, statistical analysis methods for time series data measured from flow meters and artificial intelligence (AI) methods based on big data are also applied [3,4]. Leak recognition and detection methods have made rapid progress, driven by the enhanced data storage capacity and data-based prediction technology that came with the development of Industry 4.0. The automated data analysis (ADA) system, developed in the UK in 2010, has accumulated enough data for AI technology owing to the expansion of DMAs (District Metering Areas) and can now predict the flow rate change due to leakage relatively accurately. The ADA system is a flow prediction and leak detection model based on an artificial neural network (ANN). As such, data-based methods are now the main approach for detecting leaks caused by pipe breakage.
Ref. [5] presented data-based pipe breakage detection methods that detect the effects of pipe breakage through classification-based, prediction-classification-based, and statistical methods, distinguished by whether or not a prediction step is included. The classification-based method begins by constructing a classification model to distinguish leak data from data under normal conditions. Classification, a primary step in data analysis, distinguishes leaking and normal situations and is built by analyzing the input measurement data of the pipe network system. As with other deep learning techniques, the accuracy of the built model is verified by determining whether there is a leak when actual measurement data is input. Moreover, Ref. [6] applied an artificial neural network (ANN) to determine water leaks due to pipe breakage in a water distribution network system. The prediction-classification-based method is performed in the order of classification and prediction. When categorizing and finding a pipe breakage with categorical properties, data trends for the next point are predicted by constructing a model for continuous properties. In general, unlike the classification model, the prediction model of the leak detection model uses only hydraulic data from normal conditions. The authors of [7] detected leaks using a linear Kalman filter (LKF), which provides a statistical estimate of the current state of a dynamic system by considering all historical data. An LKF trained on normal data provides an estimate at each step. In addition, the filter is relatively efficient at providing a converged result from the training data because it requires only the current data to predict. Because normal data is used to predict system fluctuations, if a leak occurs in the pipe network, the predicted value will differ significantly from the instrument's actual observed value.
Sustainability 2021, 13, 9262
In general, attempts have been made to predict a normal hydraulic situation in several ways, using a relatively simple method of calculating the absolute difference between the predicted and observed values.
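The residual idea described above can be illustrated with a scalar Kalman filter. This is only a minimal sketch under stated assumptions, not the filter configuration of [7]: the random-walk state model, the function name `kalman_residuals`, and the variances `process_var` and `meas_var` are all hypothetical choices made for illustration.

```python
def kalman_residuals(observed, process_var=1e-3, meas_var=1.0):
    """Scalar Kalman filter with a random-walk state model (an assumption).

    Returns the absolute difference between each one-step-ahead prediction
    and the value actually observed at that step.
    """
    estimate = observed[0]   # initial state: the first measurement
    error_var = 1.0          # initial uncertainty of the estimate (assumed)
    residuals = []
    for z in observed[1:]:
        # Predict: a random walk keeps the estimate and grows its uncertainty.
        predicted = estimate
        error_var += process_var
        residuals.append(abs(z - predicted))
        # Update: blend the prediction with the new measurement.
        gain = error_var / (error_var + meas_var)
        estimate = predicted + gain * (z - predicted)
        error_var = (1.0 - gain) * error_var
    return residuals
```

With a steady flow followed by a sudden step change, the residual stays at zero during the steady period and jumps at the first sample after the change, which is the signal a threshold would act on.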
Ref. [8] conducted a pipe breakage prediction study using a support vector machine (SVM). The SVM, a type of machine learning, is a supervised learning model for pattern recognition and data analysis and is mainly used for classification and regression analysis. Given a set of data in which each item belongs to one of two categories, the SVM algorithm creates a non-probabilistic binary linear classification model that assigns new data to a category based on the given data set. The created classification model is expressed as a boundary in the space where the data is mapped, and the SVM algorithm finds the boundary with the largest margin.
ANNs are widely used both in classification models and in the prediction stage [9-12]. In the prediction stage, the ANN output is a predicted flow or pressure value rather than a value between 0 and 1, so the prediction itself provides no classification information. Furthermore, ref. [13] statistically set the threshold value in the existing prediction-classification-based leak detection model. A threshold is essential for identifying abnormal flow values in water distribution networks. However, in the leak detection models of operating water distribution networks, the threshold value is set according to an operator's experience, which frequently causes misjudgments.
Recently, more advanced DNN algorithms within the ANN family have been in the spotlight. The models applied include RNN, CNN, and LSTM. However, RNN has the limitation that the shape of the flow data cannot be extracted because of the vanishing gradient problem. Although CNN has the advantage of producing fast results from short-term data with a small number of parameters, it still suffers from problems such as vanishing gradients. LSTM, one of the deep learning models designed to solve the vanishing gradient problem, has the advantage of being able to remember various features from the past and use them as weights. In conclusion, hardware-based techniques are limited by the time and labor they require, and hydraulic-model-based techniques need to be constantly calibrated against actual values. Therefore, hydraulic-measurement-based methods, which rely on statistical analysis of the actual values, are deemed appropriate for continuous and short-term detection. In addition, among the hydraulic-measurement-based methods, prediction-classification-based models, in which prediction and classification are performed together, are judged to produce more reliable results.
Figure 2 shows the operation procedure of the leak detection model used in this study. For leak detection, flow data was first obtained through the meter. Water consumption (flow) in the water distribution network system shows periodic variability by hour, day, week, and season. Daily time series have been analyzed to account for this periodicity [14][15][16]. In this study, we additionally designed various time series patterns to extract the characteristic points of minute-scale trends and periodicity [17], and established various sets of time-series data accordingly. After preprocessing, the flow was predicted for the next time point. In the prediction stage, RNN-LSTM, one of the latest deep learning algorithms, was used. Next, a threshold was set based on the difference between the predicted and observed values, and an alarm was generated in the section where the observed value exceeded the threshold to recognize the leak.

Data Preprocessing
To explain the variability of the flow rate generated in the water distribution network system, ref. [18] recomposed the conventional time series of Equation (1) into the form of Equation (2) to express the periodicity of the flow rate and extract various shapes; this is the method used in this study. Here, Q is the measured flow rate value, n is the measured time point, and m is the past time-series data. In brief, the approach can be explained as shown in the table in Figure 2: the flow data at 0:05 on the 1st to the 3rd, which changes each day, is used to predict the flow at 0:05 on the next day, considering all these shapes. In this way, it was possible to examine the various patterns and shapes of the time-series flow data generated in the past.
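The same-time-of-day arrangement described above can be sketched as follows. This is an illustrative reading of the 0:05 example only, not the exact form of Equation (2); the function name `same_time_samples` and the parameters `samples_per_day` and `m` (number of previous days used) are hypothetical.

```python
def same_time_samples(flow, samples_per_day, m):
    """Pair each reading with the readings at the same time on the m previous days.

    flow: list of flow values at a fixed sampling interval, oldest first.
    Returns (inputs, targets); inputs[k] holds m past values, oldest first.
    """
    inputs, targets = [], []
    for n in range(m * samples_per_day, len(flow)):
        # Values at the same time of day, m days ago up to 1 day ago.
        history = [flow[n - d * samples_per_day] for d in range(m, 0, -1)]
        inputs.append(history)
        targets.append(flow[n])
    return inputs, targets
```

For 5-minute data, `samples_per_day` would be 288 and `m=3` would reproduce the "1st to the 3rd predicts the 4th" layout of the example; other values of `m` give the other pattern shapes.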

Flow Prediction (RNN-LSTM)
The Automated Data Analysis (ADA) system developed in the UK in 2010 is an ANN-based model for predicting flow and detecting leaks. Recently, various algorithms derived from ANN, such as RNN and LSTM, have been built and verified. In existing RNN models, the vanishing gradient problem often prevents the model from recognizing flow patterns. LSTM, on the other hand, adds memory cells to the RNN model so that historical data can be weighted. Therefore, we used RNN-LSTM, which mitigates the vanishing gradient problem to some extent through its strong capacity to store past data. Through its cell state, LSTM sequentially applies the Forget Gate, Input Gate, and Output Gate, removing or adding past data (Figure 3).
The Forget Gate, expressed as in Equation (3), receives $h_{t-1}$ and $x_t$, passes the result to the cell state $C_{t-1}$, and discards improper data. Next, the data to be stored in the cell state is designated by the Input Gate, calculated as in Equation (4). At this time, the tanh layer creates new candidate values by Equation (5) to finally determine the data to be added to the cell state. The cell state is then updated from the previous state $C_{t-1}$ to the current state $C_t$, as in Equation (6). Finally, the output value passed to the next step is determined by Equations (7) and (8). Unlike existing artificial neural networks, these update and output steps improve the ability to learn and store past data. This study used the LSTM algorithm to compute the predicted values because the applied flow data contain various shapes and historical data. Equations (3)-(8) are expressed as follows:
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{3}$$
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{4}$$
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \tag{5}$$
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \tag{6}$$
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{7}$$
$$h_t = o_t * \tanh\left(C_t\right) \tag{8}$$
where $\sigma$ is the sigmoid function, $W$ and $b$ are the weights and biases of each gate, $x_t$ is the input at time $t$, $h_t$ is the output (hidden state), and $*$ denotes element-wise multiplication.
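To make the gate sequence concrete, the following is a minimal pure-Python sketch of a single LSTM time step. It assumes the standard LSTM formulation, which Equations (3)-(8) are taken to follow; the function names and the parameter layout `p` are hypothetical, and a practical model would use a deep learning library rather than hand-written loops.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, b, z):
    """Return W·z + b for a weight matrix W (list of rows) and bias vector b."""
    return [sum(w * v for w, v in zip(row, z)) + bi for row, bi in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step (standard formulation, assumed).

    p maps "f", "i", "c", "o" to (W, b) pairs; this layout is hypothetical.
    """
    z = h_prev + x_t                                      # concatenation [h_{t-1}, x_t]
    f = [sigmoid(v) for v in affine(*p["f"], z)]          # forget gate, Eq. (3)
    i = [sigmoid(v) for v in affine(*p["i"], z)]          # input gate, Eq. (4)
    c_hat = [math.tanh(v) for v in affine(*p["c"], z)]    # candidate values, Eq. (5)
    # Cell state update, Eq. (6): keep part of the old state, add new candidates.
    c = [ft * cp + it * ch for ft, cp, it, ch in zip(f, c_prev, i, c_hat)]
    o = [sigmoid(v) for v in affine(*p["o"], z)]          # output gate, Eq. (7)
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]      # hidden state, Eq. (8)
    return h, c
```

Calling `lstm_step` repeatedly over a sequence, feeding each returned `h` and `c` into the next call, reproduces how the cell state carries past flow information forward.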

Multithreshold Classification
When discriminating outliers in a single time series such as flow rate or electricity usage, a threshold is generally established through a control chart method, and an alarm is raised if the value exceeds it. There are various control chart methods, such as Shewhart's method, the cumulative sum (CUSUM) method, the exponentially weighted moving average (EWMA) method, the Hidiroglou-Berthelot (HB) method, and the multivariate control chart method, each with its own advantages and disadvantages [19][20][21][22][23]. In this study, we used the X chart method, one of Shewhart's methods, to set threshold intervals and identify outliers (leaks); it is widely used and well suited to setting threshold intervals for flow data. Threshold intervals were set at three confidence levels (99%, 95%, and 90% confidence intervals). Then, we constructed a model that reports a leak for events exceeding two or more of the corresponding threshold values.
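The multi-threshold rule can be sketched as below. The paper does not state its exact limit formula, so this sketch assumes that each interval is the mean of the leak-free residuals (observed minus predicted flow) plus or minus z times their standard deviation under a normal approximation; the names `fit_thresholds`, `is_leak`, and `min_exceeded` are hypothetical.

```python
from statistics import mean, stdev

# Two-sided standard normal quantiles for the 90%, 95%, and 99% intervals.
Z_VALUES = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}

def fit_thresholds(normal_residuals):
    """Return {confidence: (lower, upper)} limits from leak-free residuals."""
    mu, sigma = mean(normal_residuals), stdev(normal_residuals)
    return {c: (mu - z * sigma, mu + z * sigma) for c, z in Z_VALUES.items()}

def is_leak(residual, thresholds, min_exceeded=2):
    """Flag a leak when the residual lies outside at least min_exceeded intervals."""
    exceeded = sum(1 for lo, hi in thresholds.values() if residual < lo or residual > hi)
    return exceeded >= min_exceeded
```

Under this assumption the intervals are nested, so requiring two exceedances means that a residual outside only the 90% interval is ignored, which is one way to suppress borderline false alarms.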

Model Performance Assessment
To evaluate the performance of the developed leak detection model, a confusion matrix, often used as an evaluation index for deep learning models, was used (Table 1) to count the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). TPR is the ratio of data identified as leaks under the condition that a leak occurred, and FPR is the ratio of normal data identified as leaks [24]. In addition, FNR indicates the cases in which a leak accident is not detected, and PPV is the precision (correct answer rate) of the model. ACC is the accuracy, that is, the proportion of all data classified correctly, and indicates how well the learned model agrees with the test data. Finally, MAPE (Mean Absolute Percentage Error) is an index expressing the prediction error as a percentage, and the lower the index, the better the performance [25]. Therefore, a proper leak detection model should have high TPR, TNR, and ACC and low FPR [15]. Consequently, the overall performance of the model can be evaluated by PPV and ACC, and the detailed leak recognition can be evaluated by TPR, FPR, and so on. In this work, we perform a detailed evaluation of model performance by leveraging these confusion matrix indices.
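For reference, the indices above follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions, with a leak treated as the positive class; the function names are hypothetical, and the counts passed in would come from comparing alarms against the known accident period.

```python
def detection_metrics(tp, tn, fp, fn):
    """Rates derived from confusion-matrix counts (leak = positive class)."""
    return {
        "TPR": tp / (tp + fn),                   # leaks correctly flagged
        "TNR": tn / (tn + fp),                   # normal data correctly passed
        "FPR": fp / (fp + tn),                   # normal data flagged as leaks
        "FNR": fn / (fn + tp),                   # leaks that were missed
        "PPV": tp / (tp + fp),                   # precision of the alarms
        "ACC": (tp + tn) / (tp + tn + fp + fn),  # overall accuracy
    }

def mape(observed, predicted):
    """Mean absolute percentage error in percent; observed values must be nonzero."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / len(observed)
```

Note that TPR + FNR = 1 and TNR + FPR = 1, so reporting one member of each pair is sufficient.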

Leakage Accidents
The proposed model can use measured or simulated (hydraulic data) flow rate (Q) as input data. If measured flow rate data cannot be obtained because there are no flow measurements in the water system, a simulated flow rate that satisfies the mass and energy conservation laws in pressurized pipe networks can be derived instead. In this study, however, we used real measured flow data (from a total of six flow measurement points) to investigate the practical applicability of the proposed model. We collected the actual flow time series data from a leak accident site in Korea for the performance evaluation of the proposed leak detection model. According to the data obtained, an 800 mm-diameter wide-area water supply pipe was damaged at 15:17 on 24 January 2020, and it took about an hour to recognize the leak. The layout of the wide-area water supply is shown in Figure 4. The accident section was between the F1 and F5 instruments, and the accident occurred at the front end of the valve before branching to the F5 and F6 instruments. While the main pipeline (F1) supplied an average of 1.249 CMD (Cubic Meter per Day) of water under normal operating conditions, the flow rate dropped to 1.141 CMD after the leakage accident, a loss of about 100 tons of water per day. Flow meter sensors were installed at F1 to F6, as shown in Figure 4, and data was sent every minute. The data was obtained from the Korea Water Resources Corporation.

Parameter Setting and Validation of LSTM Model
To improve the performance of the LSTM model used for flow prediction in this study, the number of hidden layers and the number of neurons per layer must be properly determined. In this work, the cases in Table 2 were analyzed to determine the appropriate number of layers and neurons. As previously mentioned, the target area consists of six measuring zones. Among them, 9 days of data from F1 (13 to 22 January) were used as training data, and data from 23 January was used as test data. The time series data are shown in Figure 5. The scenarios start with a single layer and then extend to two and three layers, including hidden layers, as shown in Table 2. As a result, increasing the number of layers gradually decreased the difference between the actual and predicted values. The smallest error rate was obtained with three layers of 128-64-48 neurons. Therefore, in this study, we constructed a model consisting of three layers with 128, 64, and 48 neurons, respectively.
In addition, a sensitivity analysis of the hyperparameters, namely the batch size and the number of epochs, was performed to find the optimal values for the model. The applied values are shown in Table 3. Ref. [15] applied a batch size of 60 and 120 epochs; the epoch is the number of operations, and the batch size is the amount of data applied in each operation. In this study, the best values were sought by varying the values of ref. [15] by ±10. In the future, more diverse parameter optimization models [26][27][28] need to be reviewed to improve the accuracy of the model. First, for the epoch, 110-130 operations were applied to the F1 data. Judging by the loss, the maximum useful number of operations was about 120, and it was determined that 121 or more operations could worsen the performance of the model. Consequently, the epoch was set to 120. The batch size is associated with the accuracy of the model, so it was analyzed based on the MAPE (mean absolute percentage error) between the actual observations and the predictions for the F1 data. A batch size of 60 gave the best performance, with a MAPE of 2.20. Therefore, a batch size of 60 was used in this study.
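The MAPE criterion used above to compare batch sizes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `mape`, the choice to skip zero observations, and the sample readings are all assumptions made for the example and are not taken from the F1 data.

```python
def mape(actual, predicted):
    """Mean absolute percentage error between observations and predictions, in percent.

    Assumption for this sketch: time steps with a zero observation are skipped,
    because the percentage error is undefined there.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every observation is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


# Hypothetical flow readings and forecasts (illustrative only).
observed = [100.0, 110.0, 120.0, 125.0]
forecast = [98.0, 112.2, 117.6, 127.5]

# Each forecast is off by 2% of the observation, so the MAPE is about 2.
print(round(mape(observed, forecast), 2))
```

In a sweep such as the one in Table 3, the same function would be evaluated once per candidate batch size, and the candidate with the smallest value would be retained.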

Result of Applying the Leak Detection Model
In this study, the LSTM model had 128, 64, and 48 nodes, and the commonly applied tanh was used as the activation function in combination with the sigmoid function. In addition, a learning rate of 0.002, a batch size of 60, and 120 epochs were applied, and the weights were updated 7200 times. A total of 59 days of data were secured, of which 49 days were used for training. The leak detection model was applied to the data from the six flow measurement points, F1-F6. Table 4 shows the performance evaluation indices of the model trained on the data stored at each measurement point. For the actual leak accident, F6 showed the highest TPR, the percentage of leak data accurately identified as a leak, at 99.81%. However, the leak accident was difficult to recognize at F2, where the TPR was 46.46%, presumably because the model could not follow the shape of the actual values before the point of leakage, as shown in Figure 6b. That is, we believe shape extraction was not performed adequately for the frequently fluctuating data.

FPR is the proportion of data under normal circumstances that was falsely identified as a leak. F5 had the lowest FPR. We believe that a high threshold value was calculated in prediction and, accordingly, alarms were rarely raised. F2 showed the highest FPR, which is reasonable because the observed and predicted values differed significantly. At all other measurement points, the false measurement ratios were low.
FNR, the proportion of actual leak data that was not detected, is the complement of TPR (FNR = 1 - TPR). F2 showed 53.54%, significantly higher than the other points, and the values for all other points likewise follow as 1 - TPR. Similarly, TNR is the complement of FPR (TNR = 1 - FPR), so F2, which had the highest FPR, had the lowest TNR.
ACC evaluates the detection ability of the model as the percentage of all data that was classified correctly. F2 showed the lowest value, while all other points showed a high detection ability of over 90%. For F2, we believe the prediction ability was significantly reduced by large deviations from the observed values, caused by problems with shape extraction from the daily water usage pattern. We also believe that the amount of training data was limited because real data were used. However, the result can be sufficiently improved if the scope of shape extraction and the amount of training data are expanded in the future. PPV, also called precision, is an indicator similar to ACC. The model was able to recognize more than 90% of leaks in the actual data at all points except F2. In addition, FDR and FPR, the indicators of false measurements, were 15 or less at all points except F2.
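The indicators discussed above all derive from the four cells of a confusion matrix. The following is a minimal sketch using the standard definitions. The function name `confusion_metrics` and the counts in the example are hypothetical and do not reproduce Table 4. It shows in particular why FNR and TNR follow directly as 100 - TPR and 100 - FPR, as with the F2 values 46.46% and 53.54%.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Return confusion-matrix indicators as percentages.

    tp: leak data flagged as leak      fn: leak data not flagged
    fp: normal data flagged as leak    tn: normal data not flagged
    """
    def pct(numerator, denominator):
        # Return NaN instead of raising when a class has no samples.
        return 100.0 * numerator / denominator if denominator else float("nan")

    return {
        "TPR": pct(tp, tp + fn),                  # recall on leak data
        "FNR": pct(fn, tp + fn),                  # missed leaks, 100 - TPR
        "FPR": pct(fp, fp + tn),                  # false alarms on normal data
        "TNR": pct(tn, fp + tn),                  # 100 - FPR
        "ACC": pct(tp + tn, tp + fn + fp + tn),   # all correct classifications
        "PPV": pct(tp, tp + fp),                  # precision
        "FDR": pct(fp, tp + fp),                  # 100 - PPV
    }


# Hypothetical counts for one measurement point (illustrative only).
metrics = confusion_metrics(tp=46, fn=54, fp=10, tn=890)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}%")
```

Because leak data are usually far rarer than normal data, ACC can remain high even when TPR is low, which is one reason to read it alongside TPR and PPV.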
F2 scored poorly on all indicators related to the probability of false measurements. In contrast, F6 showed the most reliable ability to predict the observed values, with high detection rates and low false measurement rates. Because the flow rate data oscillate within a relatively constant range, appropriate variance and standard deviation values were obtained, so the shape for each time point was extracted relatively well and appropriate threshold values were set.
At F1, false measurements and leak detections appeared only at the time of leakage, as shown in Figure 6a, and the point had the second-highest performance. We believe this is because F1, as the point in charge of the main pipeline, maintains a steady flow rate pattern before a leak occurs.
MAPE, a measure of prediction accuracy, showed adequate prediction levels except at the F3 measurement point. For F3, as shown in Figure 6c, the LSTM model did not recognize the corresponding shape because the leak accident produced little change. Solving this problem requires collecting various shapes from more points.
As a result, the leak detection ability in terms of precision was lowest at F2, mainly because the flow rate variability there was unusual, as shown in Figure 6b. In this study, using the 49-day data, seven of the shape extractions described above were performed for each period. However, we believe that this point produces significantly more flow data and that more training data are needed.

Possibility of Zoning Leak Points by Utilizing Additional Measurements
In general, a leak detection model for water distribution network systems should have two functions: the accurate detection of leak accidents and the identification of the corresponding points. The frequency of false alarms should be minimized so that the leak accident points can be located accurately. Since the preceding application results showed that leak accidents can be recognized promptly, the next step is to identify the leak points. Therefore, we now propose a zoning plan to locate the leak accident points. Typically, district metered areas are zoned by pressure [29][30][31]. In this study, the leakage area was narrowed using only the measured data, not pressure.
In Table 5, which shows the result of applying the leak detection model to the actual leak accident, "O" indicates that the leak was detected and "X" indicates that it was not. The actual leak accident occurred at 15:17. At F1, the model detected the leak immediately because the water supply increased rapidly due to the leak accident. However, except at the directly connected F3 point, the leak was not recognized promptly: the model took 3-4 min to detect it from the data of F2, F4, F5, and F6.
In detail, the flow rate dropped rapidly at F2 and F4 at the time of the accident (Figure 7). The model did not identify this drop as a leak because the alarm was set to go off on a sudden increase in flow rate. In addition, water was not supplied at all at F5 and F6, and the flow rate was temporarily reduced. Therefore, the model identified the leak after a while rather than immediately. No leaks occurred between F1 and F3, although leaks were identified there at the time of the accident. Therefore, we can exclude F2 from the possible occurrence points and reduce the possible zones to F3-F4, F4-F5, F6, F5, and F6. In addition, because the specific data at the time of the leak accident show that the water supplied to F5 and F6 was almost entirely cut off around 15:31, the range of the leak accident can be further reduced after 14 min. Therefore, it is possible to zone the leak points through a comprehensive analysis of the measurement data by exploiting the many flow measurement points in the water distribution network system. In the future, we believe it will be possible to determine the optimal locations at which instruments should be installed. Before that, we believe it is necessary to build an algorithm capable of comprehensively analyzing data from multiple instrument locations under various instrument and leak scenarios.
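The effect of an increase-only alarm on detection delay can be illustrated with a short sketch. It assumes an alarm rule of the form "observed flow exceeds predicted flow by more than a threshold". The function names, the threshold of 20, and every flow value below are hypothetical and are not the measured data from Figure 7. In particular, the later rise in the second series is an assumption made purely to show a delayed alarm.

```python
from datetime import datetime


def first_alarm(times, observed, predicted, threshold):
    """Return the first timestamp at which observed exceeds predicted + threshold.

    Only increases trigger the alarm, so a sudden drop in flow is ignored.
    Returns None if no alarm is raised.
    """
    for t, obs, pred in zip(times, observed, predicted):
        if obs - pred > threshold:
            return t
    return None


def delay_minutes(leak_time, alarm_time, fmt="%H:%M"):
    """Whole minutes between the leak time and the alarm time (same day)."""
    delta = datetime.strptime(alarm_time, fmt) - datetime.strptime(leak_time, fmt)
    return int(delta.total_seconds() // 60)


# Hypothetical per-minute readings around the 15:17 accident.
times = ["15:16", "15:17", "15:18", "15:19", "15:20"]
predicted = [100.0, 100.0, 100.0, 100.0, 100.0]
rising = [101.0, 140.0, 150.0, 150.0, 150.0]   # flow jumps at the accident
dropping = [99.0, 60.0, 55.0, 90.0, 130.0]     # flow drops first, rises later
threshold = 20.0                               # assumed alarm margin

alarm_rising = first_alarm(times, rising, predicted, threshold)
alarm_dropping = first_alarm(times, dropping, predicted, threshold)

print(alarm_rising, delay_minutes("15:17", alarm_rising))      # 15:17 0
print(alarm_dropping, delay_minutes("15:17", alarm_dropping))  # 15:20 3
print(delay_minutes("15:17", "15:31"))                         # 14
```

A two-sided rule that also flags sudden drops would shorten the delay for the second series, at the cost of more false alarms during ordinary demand dips.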

Conclusions
A water distribution network system, an essential infrastructure for supplying water, requires constant maintenance and emergency plans to ensure the continuity and stability of the water supply. To minimize damage such as water outages under sudden abnormal situations, it is necessary to recognize the accident instantly, identify its location, and implement the planned countermeasures. Accordingly, quickly recognizing leak accidents in a water distribution network is economically significant and prevents secondary damage by shortening the response time and increasing the time available to prepare countermeasures. Hence, it is essential for ensuring a stable water supply and minimizing damage to consumers.
In this study, we proposed a data-based leak detection model for leak accidents in a water distribution network system and evaluated its performance by applying it to an actual case. The application showed good performance, as the model recognized the leak accident promptly at all but a few measuring points. The leak detection performance was evaluated with a confusion matrix and showed more than 90% accuracy at all points except singularities. The developed model is expected to serve as a critical software technology for proactively identifying various issues now that smart water infrastructure is being introduced. In addition, as the next task after leak detection, we proposed a zoning method for the developed model, which can provide fundamental data for zoning leak accidents according to the locations of the measuring instruments. However, the model developed in this study needs to be applied to data from various time points and leak amounts for continued performance evaluation. In addition, the optimization of the parameters that affect the efficiency of deep learning techniques is very important and remains a subject for future study. In other words, although the hyperparameters were tuned through a general sensitivity analysis in this study, more effective parameter setting in combination with metaheuristic techniques should also be pursued. Comparative analyses with other deep learning models should also be performed to further verify the leak recognition model proposed in this work.
Funding: This research has been performed as Project No B-T012 and supported by K-water.