Estimation of Non-Revenue Water Ratio for Sustainable Management Using Artificial Neural Network and Z-Score in Incheon, Republic of Korea

The non-revenue water (NRW) ratio in a water distribution system is the ratio of the losses from unbilled authorized consumption, apparent losses and real losses to the overall system input volume (SIV). Estimating the NRW ratio by measurement might not work in an area with no district metered areas (DMAs) or with unclear administrative boundaries. Multiple regression analysis is a statistical method for calculating the NRW ratio from the main parameters of the water distribution system, although its disadvantage is lower accuracy compared with the measured NRW ratio. In this study, an artificial neural network (ANN) was used to estimate the NRW ratio. The results showed that the NRW ratio calculated by the ANN model was more accurate than that calculated by multiple regression analysis. The accuracy of the developed ANN model varied with the number of neurons in the hidden layer. Therefore, when using the ANN model, the optimal number of neurons must be determined. In addition, accuracy was higher when outliers were removed than when the original data were used.


Introduction
Non-revenue water (NRW) includes water lost from physical incidents such as pipe leaks caused by bursts in a water distribution system and water-related commercial losses stemming from illegal connections, unmetered public use and meter error [1]. The NRW ratio is 5-50% in major countries. Singapore, Denmark and the Netherlands have the lowest NRW ratios (5-6%), while Chile (34%) and Mexico (51%) have the highest [2]. According to Korea waterworks data for 2015 [3], among the major cities in Korea the NRW ratio is lowest in Seoul at 4.9% and highest in Gwangju at 56.8%. Incheon has an NRW ratio of 11.2%, lower than the national average of 16.3%.
Incheon takes its tap water from Paldang Dam via a single pipeline, which makes the supply vulnerable to pipe breakage caused by accidents or disasters [4,5]. Consumers are therefore likely to suffer from suspension of the water supply. To prevent this, management of hydraulic pressure in the pipe network and regular evaluation of pipe deterioration are recommended. A decrease in the NRW ratio is associated with a reduction in leakage achieved through optimal operation and management of district metered areas (DMAs).
Analysis of the effects of pipe damage on the overall water distribution system helps determine what to improve first in the water pipeline [5]. A systematic plan for replacement and remediation is in effect for the maintenance of the city waterworks [6][7][8]. Although improvement projects for old waterworks are being implemented, it is difficult to reduce the system's economic losses and improve its function through the evaluation of old pipes and accident prevention, because these depend on empirical judgment [9,10].
Therefore, when deciding the priority of water distribution system maintenance, research on the factors affecting leaks is needed, including identification of the physical and operational factors through parameters such as hydraulic pressure, deteriorated pipe ratio and water supply quantity. To decrease the NRW ratio, previous studies have addressed pipe network analysis, reliability enhancement, technical diagnosis of pipe networks and evaluation of pipe deterioration for optimal water distribution.
To determine the share of leaks and bursts in the overall volume of NRW, a performance indicator for comparing leak management among water supply systems was proposed: the Infrastructure Leakage Index (ILI) [11][12][13].
In addition, studies have been carried out on the parameters of a water distribution system. A regression equation for predicting the NRW ratio was developed through statistical analysis of the main parameters and statistical data of a water distribution system [14]. An approach considering water supply and the operating and maintenance cost of a water distribution system was suggested [15]. A system of performance indicators was revised for small water supply utilities. Principal component analysis (PCA) was used to reduce the dimensionality of the original data [16,17].
These statistical techniques and performance indicators were helpful in forecasting NRW, and a number of parameters of water distribution systems were proposed and analyzed. This work suggested numerous approaches to improving the accuracy of NRW ratio prediction, as well as a scientific approach toward the sustainable management of water distribution systems.
A well-established DMA in a water distribution system can be analyzed through physical and operational parameters [18]. To estimate the NRW ratio, including the amount of water leakage, the main parameters of the water distribution system appropriate to regional characteristics are selected, and the NRW calculation model developed by statistical analysis plays an important role in the planning and operation of DMAs.
An artificial neural network (ANN) is a model that predicts dependent variables through statistical learning algorithms when sufficient data on the independent variables are available to describe them. Because of a lack of sufficient learning data, however, the ANN model has not been widely used to estimate the NRW ratio.
Major ANN studies applied to water distribution systems in recent years are as follows. A procedure for deriving a general operating policy for reservoir operation from dynamic programming using a neural network (DPN) was suggested [19]. ANNs, a relatively new technique, were investigated for forecasting short-term water demand [20]. ANNs were applied to water quality modeling, as well as to the process and control of drinking water treatment in water distribution systems [21]. Research on the application of ANNs to the analysis of data from sensors measuring hydraulic parameters was presented [22]. Additionally, the efficiency of computational intelligence techniques in water demand forecasting was compared [23].
Recent ANN research has used the technique to estimate the temporal variation of factors such as real-time water quality, reservoir operation and short-term demand. The application of ANNs to estimating NRW and analyzing parameters in water distribution systems, however, has been insufficient.
In this study, a model for calculating the NRW ratio for Incheon was developed using an ANN and the main parameters of water distribution systems. The ANN results were compared statistically with the measured values, with and without the removal of outliers identified by Z-score standardization.
The NRW ratios obtained by multiple regression analysis and by the ANN were compared through accuracy assessment. To estimate the NRW ratio, the parameters selected in previous research [24] were used: deteriorated pipe ratio, water supply quantity per demand junction and demand energy ratio. Demand energy was calculated from nodal hydraulic pressure and demand simulated using EPANET 2.0 (Environmental Protection Agency, Cincinnati, OH, USA, 2000), a hydraulic numerical analysis model for water distribution systems.

Analysis of Water Supply Energy in Water Distribution Systems
The EPANET 2.0 model developed by the U.S. Environmental Protection Agency was used for hydraulic modeling of the DMAs. The model uses the gradient algorithm for pipe network analysis, and extended period simulation was applied to analyze hydraulic flow in the pipe network under time-series conditions [25].
In EPANET 2.0, a hybrid node-loop approach is used to solve the continuity and energy equations for in-pipe flow analysis. The continuity equations, which are central to pipe network analysis, and the energy equations, which describe energy losses, were used in the model. The energy required in the pipe network can be divided into water supply energy and leakage energy, which reflect the water velocity and pressure head inside the pipe. High velocity and pressure head in a pipe network increase the leakage quantity, so the required water supply energy at each demand junction is taken as the level that maintains the minimum hydraulic pressure. In addition, the energy arising from the difference between the total hydraulic head and the minimum hydraulic head for stable water supply at a junction is regarded as excessive energy that affects leakage.
The water supply energy is calculated by Equation (1) as the energy arising from the minimum residual head required for water supply. It is estimated by multiplying the water demand by the hydraulic head at each junction obtained from the EPANET pipe network analysis.
The minimum residual head varies with the building height permitted for direct water supply under the regulations of the water service provider. The Incheon Water Supply Ordinance allows direct water supply up to the fifth floor above ground level without a pumping system, and the minimum residual hydraulic head is set at 25 m. This hydraulic head is regarded as the standard for available water supply energy in Incheon. The available supply energy is the energy at a hydraulic pressure of 25 m, and the excessive energy, which affects leaks in the pipe network, is the difference between the total supply energy and the available supply energy. The demand energy ratio is the percentage of total supply energy relative to available supply energy, considering the energy loss in the pipe network. When excessive energy is high, the demand energy ratio increases accordingly, which causes a higher volume of leakage.
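The relationship between total supply energy, available supply energy and excessive energy can be illustrated with a minimal sketch. Equation (1) is not reproduced in this text, so the form used here (demand multiplied by head, summed over junctions) follows only the verbal description above. The constant factor for the specific weight of water is omitted because it cancels in the ratio, and the function names and sample values are hypothetical.

```python
# Minimal sketch, assuming energy at a junction is proportional to demand x head.
# The 25 m value is the minimum residual head stated for Incheon in the text.
MIN_RESIDUAL_HEAD_M = 25.0


def supply_energy(demands, heads):
    """Sum of demand x head over junctions (energy up to a constant factor)."""
    if len(demands) != len(heads):
        raise ValueError("demands and heads must have the same length")
    return sum(q * h for q, h in zip(demands, heads))


def available_energy(demands, min_head=MIN_RESIDUAL_HEAD_M):
    """Energy needed to deliver every demand at the minimum residual head."""
    return supply_energy(demands, [min_head] * len(demands))


def demand_energy_ratio(demands, heads, min_head=MIN_RESIDUAL_HEAD_M):
    """Total supply energy divided by available supply energy."""
    available = available_energy(demands, min_head)
    if available <= 0:
        raise ValueError("available supply energy must be positive")
    return supply_energy(demands, heads) / available


def excessive_energy(demands, heads, min_head=MIN_RESIDUAL_HEAD_M):
    """Total supply energy minus available supply energy."""
    return supply_energy(demands, heads) - available_energy(demands, min_head)


# Hypothetical junction demands (m3/day) and simulated heads (m) for one DMA.
demands = [10.0, 20.0, 30.0]
heads = [30.0, 40.0, 35.0]
print(demand_energy_ratio(demands, heads))
print(excessive_energy(demands, heads))
```

A ratio above 1 means that the junctions are supplied at more than the minimum head, which corresponds to positive excessive energy.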

Statistical Analysis
Statistical analysis was performed to find correlations among the main parameters of water distribution systems. The aim was to clarify and verify the functional relationships between parameters by analyzing both the correlation between the parameters of water distribution systems and the relationship between the selected dependent variable and the independent variables.

Correlation Analysis
Correlation analysis studies the linear relationship between two variables in probability theory and statistics. Two variables may range from independent to strongly correlated, and the strength of their linear relationship is measured by the Pearson correlation coefficient, defined in Equation (2) [26]. Correlation analysis was used to compare the ANN simulation results with the actual measured values.
where $r_{xy}$ is the correlation coefficient and $\bar{x}$, $\bar{y}$ are the mean values of x and y.
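Equation (2) is not reproduced in this text. The standard sample form of the Pearson coefficient, consistent with the symbols defined above, is $r_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}$. The following sketch computes it directly; the function name is hypothetical and the code is an illustration, not the authors' implementation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

In this study the same quantity would be computed between each parameter and the measured NRW ratio, or between the estimated and measured NRW ratios.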
The correlation coefficient takes values between −1.0 and 1.0; values close to ±1.0 indicate a strong linear relationship and values close to 0 a weak one.
Multiple regression analysis is an analytical technique that estimates the causal relationship between variables by statistical methods; it analyzes a regression model with one dependent variable and two or more independent variables. The multiple linear regression model with independent variables is expressed as Equation (3).
where x is the independent variable, y is the dependent variable, β is the regression coefficient and $\beta_0$ is the regression intercept.
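Equation (3) is not reproduced in this text. The standard form of a multiple linear regression model, consistent with the symbols defined above, would be as follows; the number of independent variables k and the error term ε are notation assumed here.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon
```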
Methods for selecting variables when estimating the coefficients of a multiple regression equation include the simultaneous input method, which analyzes all independent variables together, and the removal method, which removes specified variables at once and leaves a model consisting of the constant term only. In addition, the backward method starts from all variables and eliminates them one by one according to the removal criterion, and the stepwise method decides on the selection and exclusion of variables at each step [27].

Artificial Neural Network
An ANN is a massively parallel distributed processor with a natural propensity for storing experiential knowledge and making it available for use. It resembles the human brain in two respects: knowledge is acquired by the network through a learning process, and inter-neuron connection strengths, known as synaptic weights, are used to store the knowledge [28].
The ANN used here is a feed-forward network with input, hidden and output layers, as shown in Figure 1. Neurons in the input layer simply act as a buffer. Neurons in the various layers are interconnected through weights. Neurons in the hidden and output layers apply an activation function, which here is a sigmoidal activation function. The input to each neuron j in the hidden layer is the sum of the weighted input signals $x_i$, $net_j = \sum_i w_{ji} x_i$, in which $w_{ji}$ is the interconnecting weight between neuron j in the hidden layer and neuron i in the input layer. The output $y_j$ of the neuron is obtained by applying the activation function to $net_j$, and the neuron output in the output layer is computed similarly.
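The forward computation described above can be sketched in a few lines. This is a minimal illustration that assumes the logistic function $f(z) = 1/(1 + e^{-z})$ as the sigmoidal activation and adds bias terms; the source only says "sigmoidal", and all names and sample weights are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, assumed here as the sigmoidal activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass through a single-hidden-layer feed-forward network.

    x        : list of inputs x_i
    w_hidden : list of weight rows, w_hidden[j][i] = w_ji
    b_hidden : list of hidden-layer biases (an assumption of this sketch)
    w_out    : list of weights from hidden neurons to the single output
    b_out    : output bias (an assumption of this sketch)
    """
    hidden = []
    for w_j, b_j in zip(w_hidden, b_hidden):
        net_j = sum(w_ji * x_i for w_ji, x_i in zip(w_j, x)) + b_j
        hidden.append(sigmoid(net_j))
    # The output neuron is computed in the same way as the hidden neurons.
    net_out = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return sigmoid(net_out)


# Hypothetical example with three inputs and two hidden neurons.
y = forward(
    [0.2, 0.5, 0.1],
    [[0.1, -0.2, 0.3], [0.4, 0.0, -0.1]],
    [0.0, 0.1],
    [0.7, -0.5],
    0.2,
)
print(y)
```

Because the output passes through a sigmoid it lies between 0 and 1, so a target such as the NRW ratio in percent would need to be scaled to that range before training.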
Sustainability 2017, 9, 1193

Status and Data Collection of Waterworks in the Target Area
The target area for this study was the Korean city of Incheon. The data collected included the status of the area, the waterworks facilities and their operational status, and the water supply indicators of the Incheon Waterworks Basic Plan of 2015. In addition, various hydraulic design data of the water distribution system and hydraulic simulation results were collected.

Status of Waterworks in Target Area
The population served by Incheon's waterworks is 2,851,491 and the water supply rate is 98.3%. The daily water supply per person is 343 L, and the water supply area is divided into nine districts. The city has 24 reservoirs and 68 pumping stations. The total length of the network is 3634 km. DMAs were built in Incheon to divide the water supply districts into separate zones instead of supplying water directly from the water purification plant to the tap. The DMA system of Incheon consists of six large DMAs within the boundaries of the water purification plants, 32 DMAs within the reservoir boundaries and 367 detailed small DMAs below the reservoir boundaries [29]. Table 1 shows the classification of Incheon's DMA system. The observed NRW ratio in 135 DMAs is shown in Figure 2.

Hydraulic Analysis of Water Distribution Systems
Incheon's pipe network was analyzed using the technical diagnosis data for water pipes established in 2015. Incheon Metropolitan City Waterworks built the pipe network model on a GIS basis by acquiring data such as pipe diameter and length, valves, flowmeters and ground level.
The hydraulic simulation of the network was performed for each DMA, and the demand energy ratio (total supply energy/available supply energy) for each junction of a small DMA was calculated from the results of the analysis. Data such as pipe length, average pipe diameter, number of demand junctions and water supply amount for each DMA were used to construct the EPANET model.
The EPANET simulation was run under the designated maximum water supply condition for 2015, and the demand energy ratio was obtained by calculating the pressure at each node from the demand at that node in a DMA. The demand energy ratio of each DMA obtained from the simulation is shown in Figure 3.

Selection and Characteristics of Main Parameters
Analysis of the technical diagnosis results of Incheon's water pipe network established in 2015 showed that water pipe deterioration in the DMA system greatly influences NRW [29]. The deteriorated pipe ratio, pipe length, mean pipe diameter, number of demand junctions, water supply quantity, number of leaks and demand energy ratio of DMAs were selected as parameters that could affect the NRW ratio.
To identify the parameters most strongly related to the NRW ratio, three parameters were selected through multiple regression analysis: the deteriorated pipe ratio, the demand energy ratio and the water supply quantity per junction. Following previous research, the main parameters were selected in order of statistical significance in the multiple regression analysis [24]; this is described in detail in Section 4.3.
The demand energy ratio is calculated by dividing the actual supply energy by the minimum required energy in the water supply network. The deteriorated pipe ratio is determined by the year of pipe installation and the pipe material. The number of leaks tends to increase with the degree of aging, and the water supply quantity per demand junction is higher in apartment complexes and densely populated districts.

Correlation Analysis of Each Parameter
To analyze the correlations between the parameters of water distribution systems, the physical and operational data of the selected parameters in each DMA were taken from the technical diagnosis of Incheon's water network carried out in 2015. Data on 135 DMAs in Incheon were collected.
Table 2 shows the correlation analysis results for each parameter. The deteriorated pipe ratio and the number of leaks had a high correlation with the NRW ratio [24]. The number of demand junctions and the demand energy ratio tended to correlate positively with the NRW ratio, but Pearson correlation coefficients below 0.5 indicate a weak relationship with the measured NRW ratio. The coefficient between the water supply quantity and pipe length was 0.71, the highest correlation among the 10 parameters used.
In summary, the Pearson correlation coefficient with the NRW ratio was below 0.5 for every parameter except the deteriorated pipe ratio, so the correlation between the NRW ratio and the parameters used was generally not high. Negative correlation coefficients were found for the mean pipe diameter, mean pipe length per demand junction, water supply quantity per demand junction and water supply quantity.
Table 3 shows the basic statistics of the parameters used, based on data collected from the 135 selected DMAs in Incheon.

Selection of Main Parameters for Estimation of NRW Ratio
To analyze the relationship between the NRW ratio and the main parameters of water distribution systems, multiple regression analysis was applied to 135 of Incheon's 367 DMAs, excluding those that were unfinished, non-operating or abnormally operating. For this analysis, the number of demand junctions, pipe length, mean pipe diameter, water supply quantity per demand junction, number of leaks, deteriorated pipe ratio, demand energy ratio, pipe length per demand junction and water supply quantity were selected as independent variables in the multiple regression model, and the NRW ratio was selected as the dependent variable.
As a result of the multiple regression analysis using the stepwise selection method, the deteriorated pipe ratio (%), water supply quantity per demand junction (m³/day/junction) and demand energy ratio (%) were selected as the variables satisfying statistical significance in terms of both the T-statistic and the probability value. A multiple regression equation with three independent variables was thus derived for estimating the NRW ratio. Table 4 shows the statistical results of all parameters used to estimate the NRW ratio by multiple regression analysis. In statistical hypothesis testing, the probability value (p-value) is the probability, for a given statistical model and assuming the null hypothesis is true, that the statistical summary (such as the sample mean difference between two compared groups) is equal to or more extreme than the observed result. If the p-value is higher than 0.05 and the absolute T-statistic is lower than 1.96, the variable is not statistically significant [30].
Table 5 shows the results of the multiple regression analysis with the NRW ratio as the dependent variable. The result is considered reliable because the absolute T-statistic of each independent variable exceeds 1.96 and the p-value is less than 0.05 [24]. From the multiple regression analysis in Table 5, the regression equation of the NRW ratio can be defined as Equation (5). The coefficients of the parameters affecting the NRW ratio are 0.663 for the deteriorated pipe ratio, 4.310 for the demand energy ratio and −0.069 for the water supply quantity per demand junction. The contribution of each parameter is calculated according to Equation (5). In addition to these three terms, the constant is fixed at 4.684%. The deteriorated pipe ratio and the demand energy ratio increase the estimated NRW ratio, whereas the water supply quantity per demand junction decreases it.
$$y = 4.684 + 0.663x_1 + 4.310x_2 - 0.069x_3 \qquad (5)$$
where y is the NRW ratio (%), $x_1$ is the deteriorated pipe ratio (%), $x_2$ is the demand energy ratio (%), and $x_3$ is the amount of water supply per demand junction (m³/day/junction).
As the demand energy ratio of DMAs in Incheon lies between 1 and 2, except for those on high ground, this term raises the NRW ratio by less than 10% (between 4.310 and 8.620 with the coefficient in Equation (5)). In areas with high water supply, such as apartment complexes and densely populated districts, the NRW ratio decreases.
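Equation (5) can be evaluated directly. The sketch below wraps it in a function using the coefficients reported above; the function name and the sample DMA values are hypothetical and serve only to show how each term contributes.

```python
def estimate_nrw_ratio(deteriorated_pipe_ratio, demand_energy_ratio, supply_per_junction):
    """NRW ratio (%) from the regression in Equation (5).

    deteriorated_pipe_ratio : x1, deteriorated pipe ratio (%)
    demand_energy_ratio     : x2, demand energy ratio
    supply_per_junction     : x3, water supply per demand junction (m3/day/junction)
    """
    return (
        4.684
        + 0.663 * deteriorated_pipe_ratio
        + 4.310 * demand_energy_ratio
        - 0.069 * supply_per_junction
    )


# Hypothetical DMA: 10% deteriorated pipes, energy ratio 1.5, 20 m3/day/junction.
print(estimate_nrw_ratio(10.0, 1.5, 20.0))
```

For these illustrative inputs the terms are 6.630, 6.465 and −1.380 on top of the constant 4.684, giving an estimate of about 16.4%.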

Model Construction of ANN
To estimate the NRW ratio using an artificial neural network (ANN), the three parameters identified by the multiple regression analysis, namely the deteriorated pipe ratio, the demand energy ratio and the water supply quantity per demand junction, were used as independent variables. The objective of the ANN was to calculate the NRW ratio (%). Figure 4 shows the ANN model constructed in this study. Using many parameters can cause over-fitting in ANN simulation, so the model was built with a minimum number of parameters. ANN simulations were performed with 10, 20 and 30 neurons in the hidden layer.

Estimation of NRW Ratio via ANN
The ANN model was built using a single hidden layer and a back propagation algorithm. In back propagation learning, an input signal at the input layer is transferred to the hidden and output layers through the transfer function between layers. By comparing the transmitted signal with the desired one, the error between the target and learned values is determined in the final output layer. The error is then transmitted in the reverse direction and the weights of each layer are updated.
This study implemented the ANN in MATLAB. The Neural Network Toolbox was used, with the Levenberg-Marquardt back propagation method for training. This network training function updates weight and bias values according to Levenberg-Marquardt optimization.
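The reverse transmission of the error can be illustrated with a single update step. This sketch deliberately substitutes plain gradient descent for the Levenberg-Marquardt optimization used in the study, because it is shorter to write out; it assumes a logistic sigmoid in both layers, a squared-error loss and a target scaled to the range 0 to 1. All names, weights and the learning rate are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W, b, v, c):
    """Return hidden activations and the network output."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W, b)]
    y = sigmoid(sum(vj * hj for vj, hj in zip(v, h)) + c)
    return h, y


def loss(x, t, W, b, v, c):
    """Squared error E = 0.5 * (y - t)^2 for one sample."""
    _, y = forward(x, W, b, v, c)
    return 0.5 * (y - t) ** 2


def backprop_step(x, t, W, b, v, c, lr=0.5):
    """One gradient-descent update (not Levenberg-Marquardt) for one sample."""
    h, y = forward(x, W, b, v, c)
    # Error signal at the output neuron: dE/dnet_out.
    delta_out = (y - t) * y * (1.0 - y)
    # Error propagated back to each hidden neuron, using the old output weights.
    delta_h = [delta_out * vj * hj * (1.0 - hj) for vj, hj in zip(v, h)]
    new_v = [vj - lr * delta_out * hj for vj, hj in zip(v, h)]
    new_c = c - lr * delta_out
    new_W = [[w - lr * dj * xi for w, xi in zip(row, x)] for row, dj in zip(W, delta_h)]
    new_b = [bj - lr * dj for bj, dj in zip(b, delta_h)]
    return new_W, new_b, new_v, new_c


# Hypothetical scaled inputs and target for one DMA.
x, t = [0.3, 0.5, 0.2], 0.25
W, b = [[0.1, -0.2, 0.3], [0.4, 0.1, -0.1]], [0.0, 0.0]
v, c = [0.5, -0.3], 0.1
before = loss(x, t, W, b, v, c)
W, b, v, c = backprop_step(x, t, W, b, v, c)
after = loss(x, t, W, b, v, c)
print(before, after)
```

Levenberg-Marquardt uses the same back-propagated derivatives but combines them in an approximate second-order update, which typically converges in fewer iterations on small networks such as this one.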
Figure 5 shows the NRW ratio derived from the ANN. The grey solid line shows the measured NRW ratio, and the estimated NRW ratio of each DMA is shown for 10, 20 and 30 neurons in the hidden layer. The measured NRW ratio was 0.5-58.9%, while the NRW ratio estimated by the ANN was within 0.5-49.1%. The mean error rate was …

Estimation of NRW Ratio via ANN
The ANN model was built using a single layer of an ANN structure and a back propagation algorithm.In the learning method of back propagation, an input signal to an input layer is transferred to hidden and output layers through the transfer function between layers.By comparing the transmitted signal with the desired one, the error between the target and learning values is determined in the final output layer.The error is again transmitted in the reverse direction and then the weight of each layer is updated.
This study implemented an ANN using the MATLAB program.A neural network toolbox was used in MATLAB and the Levenberg-Marquardt method of back propagation was used for training.This network training function updated weight and bias values according to the Levenberg-Marquardt optimization.
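Figure 5 shows the NRW ratio derived from the ANN. The grey solid line shows the measured NRW ratio, and the estimated NRW ratio of each DMA is shown for 10, 20 and 30 neurons in the hidden layer. The measured NRW ratio was 0.5-58.9 percent, while the NRW ratio estimated by the ANN was within 0.5-49.1 percent. The mean error rate was 18.4 percent for the measured NRW ratio and 19.3, 18.0 and 20.4 percent for the models with 10, 20 and 30 hidden neurons, respectively, and the multiple regression equation gave the closest value, 18.5 percent.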

Estimation of NRW Ratio Using ANN with Outlier Removal Case
The Z-score method can be used to examine the spread and distribution of the data used in the analysis. The Z-score is a dimensionless quantity obtained by subtracting the population mean from an individual raw score and then dividing the difference by the population standard deviation [31]. This conversion is called standardizing or normalizing. It expresses how far a value deviates from the mean in units of the standard deviation, as shown in Equation (6).

$$Z = \frac{x - \mu}{\sigma} \qquad (6)$$
where x is an individual raw score, μ is the mean of the population and σ is the standard deviation. Outliers can be identified with the Z-score method. The standardized Z-scores have a mean of 0 and a standard deviation of 1, so values beyond ±3 are considered far from the mean. In this study, DMA data were excluded from the analysis when any of the main parameters of the water distribution system had a standardized Z-score with an absolute value of 3 or more.
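The exclusion rule above can be sketched as follows. This is a minimal example under stated assumptions: the function name `zscore_filter`, the random stand-in data and the injected outlier are hypothetical, and each column is taken to be one of the main parameters of a DMA.

```python
import numpy as np


def zscore_filter(data, threshold=3.0):
    """Keep rows in which every parameter has |Z| below the threshold."""
    mu = data.mean(axis=0)
    sigma = data.std(axis=0)  # population standard deviation (ddof=0)
    z = (data - mu) / sigma   # Equation (6), applied column by column
    keep = np.all(np.abs(z) < threshold, axis=1)
    return data[keep], keep


# Stand-in data: 135 DMAs by 3 parameters, with one injected outlier.
rng = np.random.default_rng(0)
dma_params = rng.normal(size=(135, 3))
dma_params[5, 1] = 12.0

filtered, keep = zscore_filter(dma_params)
print(f"{filtered.shape[0]} of {dma_params.shape[0]} DMAs retained")
```

A DMA is dropped as soon as one of its parameters reaches an absolute Z-score of 3, which matches the "3 or more" criterion stated in the text.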
Finally, 122 of the 135 sets of DMA data satisfied the Z-score criterion and were used in the ANN analysis. Figure 6 shows the NRW ratio derived from the ANN after the abnormal values were excluded by the Z-score. ANN (30) showed the highest accuracy among all results, and ANN (20) with the original data gave the least biased NRW ratio.
Figure 7 shows the scatter plot analysis of the original data without the Z-score method. The R² of the ANN model with 20 hidden neurons was 0.3663, and its correlation coefficient was higher than those of the ANN models with 10 or 30 hidden neurons and of the multiple regression analysis. These results are consistent with Table 4, and the ANN model with 20 hidden neurons appears to be the most accurate of these cases.
Figure 8 shows the results after the abnormal values were excluded using the Z-score method. The ANN model was most accurate with 30 hidden neurons; its R² of 0.476 indicates closer agreement than in the other neuron cases. Six ANN cases were used to estimate the NRW ratio, and the accuracy was higher or lower than that of the multiple regression equation from the previous research [24], depending on the number of hidden neurons.
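For readers who want to reproduce this kind of scatter plot comparison, the sketch below computes the correlation coefficient and R² between measured and estimated NRW ratios. It assumes that the reported R² is the squared Pearson correlation of a linear trend line, which the text does not state explicitly; the function name and the sample values are made up for illustration.

```python
import numpy as np


def scatter_fit_quality(measured, estimated):
    """Return the Pearson correlation r and r squared for two series."""
    r = float(np.corrcoef(measured, estimated)[0, 1])
    return r, r ** 2


# Made-up NRW ratios in percent, for illustration only.
measured = np.array([5.0, 12.0, 18.0, 25.0, 40.0])
estimated = np.array([8.0, 10.0, 20.0, 22.0, 33.0])

r, r2 = scatter_fit_quality(measured, estimated)
print(f"r = {r:.3f}, R^2 = {r2:.3f}")
```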

Conclusions
The present study developed a model for estimating the NRW ratio using an ANN based on specific parameters affecting leaks in the water distribution systems of Incheon. Accuracy assessment and scatter plot analysis were used to select the optimal ANN model cases. The following conclusions were drawn.
First, an ANN-based estimation model for the NRW ratio was developed for the water distribution systems of Incheon. Compared with the multiple regression equation, the ANN-estimated NRW ratio was more accurate when an appropriate number of hidden neurons was applied, with an improvement of about 40 percent over the NRW ratio derived from the multiple regression equation. This indicates that the selected parameters, such as water supply quantity per demand junction, deteriorated pipe ratio and demand energy ratio, are valid for estimating the NRW ratio.
Second, analysis of outliers in the independent variables is crucial when applying the ANN model. When outlier data were eliminated by the Z-score method before the ANN model was applied, the estimated NRW ratio was closer to the measured value than when the outlier data were not removed. The accuracy of NRW prediction can therefore be improved by verifying the accuracy and outliers of the data collected for each DMA.
Third, the optimal number of hidden neurons must be determined when estimating the NRW ratio via ANN. This study set the hidden layer to 10, 20 and 30 neurons when developing the ANN model. If the number of hidden neurons is varied in finer steps, more accurate ANN results can be expected.
The estimation model for the NRW ratio developed in this study is applicable to the water distribution systems of Incheon. The model is expected to help guide improvements in the analysis of water distribution systems and the optimal operation of water supply and waterworks facilities for the construction of DMAs in Incheon. It can also help enhance the revenue water ratio and the diagnostic operation of water distribution systems.

Figure 3. Simulated demand energy ratio in the DMA of Incheon.

Figure 4. ANN model for analyzing the NRW ratio.

Figure 5. NRW ratio by artificial neural network (ANN) model simulation in each DMA.

Figure 6. NRW ratio by ANN model simulation in each DMA with outlier removal condition.

Figure 7. Scatter analysis results of NRW ratio: (a) ANN using 10 neurons in hidden layer; (b) ANN using 20 neurons in hidden layer; (c) ANN using 30 neurons in hidden layer; (d) Equation using multiple regression analysis.

Figure 8. Scatter analysis results of NRW ratio with outlier removal condition: (a) ANN using 10 neurons in hidden layer; (b) ANN using 20 neurons in hidden layer; (c) ANN using 30 neurons in hidden layer.

Table 2. Correlation analysis of main parameters.

Table 3. Data of parameters related to NRW ratio in water distribution systems.

Table 4. Results of multiple regression analysis using all parameters.

Table 5. Results of multiple regression analysis using three parameters.