Optimal Selection and Monitoring of Nodes Aimed at Supporting Leakages Identiﬁcation in WDS

: Many efforts have been made in recent decades to formulate strategies for improving the efﬁciency of water distribution systems (WDS), led by the socio-demographic evolution of modern society and the climate change scenario. The improvement of WDS management is a complex task that can be addressed by providing services to maximize revenues while ensuring that the quality standards required by national and international regulations are upheld. These two objectives can be fulﬁlled by utilizing optimized techniques for the operational and maintenance strategies of WDS. This paper proposes a methodology for assisting engineers in identifying water leakages in WDS, thus providing an effective procedure for ensuring high level hydraulic network functionality. The proposed approach is based on an inverse analysis of measured ﬂow rates and pressure data, and consists of three steps: The analysis of measurements to select the most suitable period for leakage identiﬁcation, the localization of the best measurement points based on a correlation analysis, and leakage identiﬁcation with a hybrid optimization that combines the exploration capability of the differential evolution algorithm with the rapid convergence of particle swarm optimization. The proposed procedure is validated on a reference hydraulic network, known as the Apulian network.


Introduction
The maximization of the efficiency of water distribution systems (WDS) is a topic that is receiving increasing attention from research and industrial institutions due to continuous demographic growth and a worsening of the water crisis [1]. The improvement of WDS has been a challenging task due to a lack of structured and consistent measurement data [2]. However, these systems have been updated recently with advanced data logging systems and intelligent management strategies [3][4][5]. Such actions are aimed not only at increasing the economic profits of water-related services, but also at improving the quality of the service in line with national and international standards. One of the tricky challenges in the management of WDS at the moment is the minimization of water leakages, which represents not only a mass, energy and economical loss, but is also one of the main factors of water contamination [6,7]. Different definitions exist for leakage in water distribution systems. One of the most frequently used defines leakage as the quantity of water which escapes from the pipe network by means other than through a controlled action [8]. Leakages are typically classified into two different categories: Background leakages and bursts. The former are due to water escaping through inadequate joints or The proposed methodology was tested and validated on the Apulian network [24] which is a synthetic network consisting of 34 pipes, 23 users, and 1 reservoir (Figure 1).
Each user node was suitable for the insertion of a flow meter while emitter nodes were placed in the middle of each pipe of the network in order to evaluate pipe leakages. The location and the amount of fictitious leakages are shown in Figure 1 through the large circles and the corresponding value, which were chosen according to Debiasi et al. [25]. In order to minimize error in the inverse analysis and to make this analysis consistent with a real case study, the hours in which the WDS presented a minor perturbation were identified (i.e., when the system had a maximum sensitivity to losses), and the best location for placing a sensor was investigated.

Characterization of the Demand Pattern
The real data of water consumptions in two small municipalities in the Trentino-Alto Adige region of Italy were analyzed. We were then able to build the demand pattern of the Apulian network by analyzing the data measured in real cases. In fact, the hourly pattern used in the Apulian network was the average of the two municipalities' demand behavior, and the resulting weight function was adopted for leak detection. Figure 2a,b show the statistics of hourly consumptions recorded during a year of measurements. The demand and pressure of the two networks were analyzed to identify the more suitable hours for the inverse analysis of leak detection. The hours characterized by a high variability of water demand were not sufficiently stable or representative of the global behavior of the WDS, so low importance was associated to such hours, which were usually daytime hours. On the contrary, higher weights were assigned to the night time hours, which were characterized by limited variability. Figure 2c shows the weight associated to each hour, calculated as the inverse of the correspondent water demand variance. Each user node was suitable for the insertion of a flow meter while emitter nodes were placed in the middle of each pipe of the network in order to evaluate pipe leakages. The location and the amount of fictitious leakages are shown in Figure 1 through the large circles and the corresponding value, which were chosen according to Debiasi et al. [25]. In order to minimize error in the inverse analysis and to make this analysis consistent with a real case study, the hours in which the WDS presented a minor perturbation were identified (i.e., when the system had a maximum sensitivity to losses), and the best location for placing a sensor was investigated.

Characterization of the Demand Pattern
The real data of water consumptions in two small municipalities in the Trentino-Alto Adige region of Italy were analyzed. We were then able to build the demand pattern of the Apulian network by analyzing the data measured in real cases. In fact, the hourly pattern used in the Apulian network was the average of the two municipalities' demand behavior, and the resulting weight function was adopted for leak detection. Figure 2a,b show the statistics of hourly consumptions recorded during a year of measurements. The demand and pressure of the two networks were analyzed to identify the more suitable hours for the inverse analysis of leak detection. The hours characterized by a high variability of water demand were not sufficiently stable or representative of the global behavior of the WDS, so low importance was associated to such hours, which were usually daytime hours. On the contrary, higher weights were assigned to the night time hours, which were characterized by limited variability. Figure 2c shows the weight associated to each hour, calculated as the inverse of the correspondent water demand variance.

Measurement Points Identification
Since the number of sensors that can be displaced in a real network is limited, it is important to place them in optimal locations. In this study, during the identification of the best measurement points, it is assumed that only four pressure sensors are usable in the WDS. The challenge was to select the four measurement nodes (users) that both maximize the sensitivity of the pressure meters with respect to the leakages, and provide uncorrelated measurements. These two requirements were fulfilled by a sensitivity analysis and a subsequent correlation analysis of the WDS. To this aim, the procedure followed the approach developed by Bort et al. [19], as shown in the algorithm scheme in Figure 3.

Measurement Points Identification
Since the number of sensors that can be displaced in a real network is limited, it is important to place them in optimal locations. In this study, during the identification of the best measurement points, it is assumed that only four pressure sensors are usable in the WDS. The challenge was to select the four measurement nodes (users) that both maximize the sensitivity of the pressure meters with respect to the leakages, and provide uncorrelated measurements. These two requirements were fulfilled by a sensitivity analysis and a subsequent correlation analysis of the WDS. To this aim, the procedure followed the approach developed by Bort et al. [19], as shown in the algorithm scheme in Figure 3.

Measurement Points Identification
Since the number of sensors that can be displaced in a real network is limited, it is important to place them in optimal locations. In this study, during the identification of the best measurement points, it is assumed that only four pressure sensors are usable in the WDS. The challenge was to select the four measurement nodes (users) that both maximize the sensitivity of the pressure meters with respect to the leakages, and provide uncorrelated measurements. These two requirements were fulfilled by a sensitivity analysis and a subsequent correlation analysis of the WDS. To this aim, the procedure followed the approach developed by Bort et al. [19], as shown in the algorithm scheme in Figure 3.   This methodology was based on two stages. The first analysis related to the sensitivity analysis of the user nodes at leakages. A sensitivity matrix was calculated for each hour of the day, by placing a known leakage into one pipe and recording the pressure measured on every user node. The leaking flow was set as a ratio of the nominal flow (where no leakages are placed into the network). Three ratios were tested: 0.5, 1, and 2, in order to consider the effect of the leakage magnitude. The emitter coefficient was calculated from the imposed leaking flow as follows: where p i is the pressure, and γ is the pressure exponent set to 0.5. The sensitivity matrix relative to the h-th hour is called ∑ h and it has as many rows as the number of user nodes (i.e., 23), and as many columns as the number of pipe nodes (i.e., 34). Each element of the 24 sensitivity matrices was called p% m,l,h , and this represents the percentage of variation of the pressure at the measurement node m with respect to the nominal case when a leakage was placed into the pipe l and the measurement was carried out at hour h. Suitable performance indexes are defined on each element of the matrix in order to reduce the dimensions of the 24 matrices ∑ h to four values for user nodes [19]. The four indexes are as follows: 1. The mean f 1,m of the mean percentage pressure variations across different positions of the leaking pipe p% m,h . These quantities are calculated as: 2. The variance of p% m,h across the day: 3. The mean across the whole day of σ 2 m,h , that is defined as the variance of p% m,l,h across the different position of the leakages: 4. The variance across the whole day of σ 2 m,h : Once the four features were calculated, a principal component analysis (PCA) was carried out in order to extract the most sensitive from between m-nodes. Figure 4 shows the principal component analysis carried out on the sensitivity study results.
Once the four features were calculated, a principal component analysis (PCA) was carried out in order to extract the most sensitive from between -nodes. Figure 4 shows the principal component analysis carried out on the sensitivity study results. By definition of the calculated features, only the positive principal components were associated to a decreasing in the measurement pressure due to a leaking pipe. For such a reason, the negative values of principal components were neglected and rounded to zero. The four most sensitive measurement nodes found from this analysis were numbers 23, 12, 13, and 21. Figure 5A shows that these sensitive nodes are close to each other, therefore, redundant information was used for the inverse analysis if the pressure sensors were placed on the most sensitive nodes.  By definition of the calculated features, only the positive principal components were associated to a decreasing in the measurement pressure due to a leaking pipe. For such a reason, the negative values of principal components were neglected and rounded to zero. The four most sensitive measurement nodes found from this analysis were numbers 23, 12, 13, and 21. Figure 5A shows that these sensitive nodes are close to each other, therefore, redundant information was used for the inverse analysis if the pressure sensors were placed on the most sensitive nodes. The second stage aimed to overcome this limitation through a correlation analysis, following the scheme in Figure 6. The percentage variations % , , of the measured pressure with respect to the non-leaking cases were averaged across the hours of the day, thus obtaining a 23 33 matrix P whose elements were the daily average of percentage of the pressure variations. According to the considerations made from the previous PCA analysis, all the rows of associated to nodes with negative principal components were neglected. Therefore, the matrix is reduced to a dimension 9 33. The correlation matrix was defined as: C = .
(8) The most uncorrelated points were obtained by solving a least square optimisation (LSQ), that was defined as: where is a vector with 9 elements, in which the -th element is associated to the candidate measurement node with the -th 1st PCA component. The following constraints were associated with the target function: • The -th element of was close to 1 if it was uncorrelated from other nodes (it will host a pressure sensor), 0 otherwise: The node 23, that is the most sensitive measurement node, was always taken as a measurement node: = 1. The second stage aimed to overcome this limitation through a correlation analysis, following the scheme in Figure 6. The percentage variations p% m,l,h of the measured pressure with respect to the non-leaking cases were averaged across the hours of the day, thus obtaining a 23 × 33 matrix P whose elements were the daily average of percentage of the pressure variations. According to the considerations made from the previous PCA analysis, all the rows of P associated to nodes with negative principal components were neglected. Therefore, the matrix P is reduced to a dimension 9 × 33. The correlation matrix C was defined as: The most uncorrelated points were obtained by solving a least square optimisation (LSQ), that was defined as: min X T |C|X, where X is a vector with 9 elements, in which the j-th element is associated to the candidate measurement node with the j-th 1st PCA component. The following constraints were associated with the target function: • The j-th element of X was close to 1 if it was uncorrelated from other nodes (it will host a pressure sensor), 0 otherwise: • The node 23, that is the most sensitive measurement node, was always taken as a measurement node: The outcome of this analysis was the four nodes 23, 16, 13, and 4, which were not only sensible to the position of leakages in the network, but also uncorrelated from each other. The position of these four measurement nodes is shown in Figure 5B.

Leakage Detection
The inverse analysis for the leak detection was formulated as an optimization problem. The emitter coefficients of all pipes nodes were calibrated until the pressures of the four selected measurement nodes and the flow rate supplied by the reservoir simulated by the widely used hydraulic simulator EPANET [26] matched the pressures and flow rate measured. The latter is the theoretical Apulian network in which four leakages were imposed according to Figure 1. The formulation of the optimization problem is shown by the following equation: where = [ , , , ] is the vector of emitter coefficient, are the hourly weights calculated from step 1 (Figure 2c), , ( ) is the pressure simulated at the -th measurement node and recorded at -th hour (note the dependency of the pressure from the value of the emitter coefficients ), ( ) is the simulated water flow leaving the reservoir relative to the -th hour, while , and are the reference pressure at measurement nodes and flow rate respectively. Note that it is assumed that only four nodes in the network were leaking since the pressure was measured on only four nodes, otherwise the optimisation problem would be ill-conditioned.
Three constraints are applied to the target function. Sources of uncertainty on the moder of the hydraulic network were taken into account by accepting that the global emitter ( ) could vary up to 10%: ( ) − 9 ( ) < 9

Leakage Detection
The inverse analysis for the leak detection was formulated as an optimization problem. The emitter coefficients k j of all pipes nodes were calibrated until the pressures of the four selected measurement nodes and the flow rate supplied by the reservoir simulated by the widely used hydraulic simulator EPANET [26] matched the pressures and flow rate measured. The latter is the theoretical Apulian network in which four leakages were imposed according to Figure 1. The formulation of the optimization problem is shown by the following equation: where k = [k 1 , k 2 , k 3 , k 4 ] is the vector of emitter coefficient, w t are the hourly weights calculated from step 1 (Figure 2c),Ĥ i,t (k) is the pressure simulated at the i-th measurement node and recorded at t-th hour (note the dependency of the pressure from the value of the emitter coefficients k),Q t (k) is the simulated water flow leaving the reservoir relative to the t-th hour, while H i,t and Q t are the reference pressure at measurement nodes and flow rate respectively. Note that it is assumed that only four nodes in the network were leaking since the pressure was measured on only four nodes, otherwise the optimisation problem would be ill-conditioned. Three constraints are applied to the target function. Sources of uncertainty on the moder of the hydraulic network were taken into account by accepting that the global emitter K glob (k) could vary up to 10%: Each emitter coefficient k i was constrained to be positive, but limited to a maximum value. The upper limit was set by assuming that in the worst case the total leakage of the network was concentrated in one node. Such a condition corresponds to a maximum emitter coefficient k max = 6 l/sm γ : The complete target function is obtained by combining Equation (12) with Equations (13)- (15): The optimization algorithm used for minimizing the target function in Equation (16) was the differential evolution particle swarm (DEPSO), which is a hybrid evolutive algorithm in which a population (i.e., a group) of individuals (i.e., candidate set of emitter coefficients) is evolved until the target function is minimized. Each population had 50 individuals, and each individual was a set of 8 variables. The first four elements of an individual defined the number of the four leaking nodes (the number of leaking nodes matches the number of nodes in which pressure was measured in order to avoid ill-conditioned problems), while the remaining four elements defined the emitter coefficient of each leakage.
The individuals of the first population were generated by randomly sampling the emitter coefficients from a uniform distribution. The evolution then proceeded by alternating an iteration with the differential evolution (DE) [27] with an iteration with the particle swarm optimization (PSO). At the k-th iteration step, each individual of the population P k was updated with the best population P best , which was the population grouping of the best individuals calculated during the whole optimization process. The values of the target functions relative to the elements in P best are stored into J tot,best .
Then, the mutation operation was performed according to a random scheme. Three array of indexes I 1 , I 2 , and I 3 were randomly generated from a uniform distribution, and with as many elements as individuals in the population (that in this study is 66). The population P k,m was given by: where the scaling factor w f is equal to 0.6 (this value has been chosen after a manual calibration). The cross-over operation was applied to P k,m with a probability of 60%. For each i-th individual P i,k,m in P k,m , a vector r i was produced of 8 randomly generated numbers sampled from a uniform distribution defined in the interval [0, 100]. The indexes corresponding to the elements of r i ≥ 60 identify which elements of P i,k,m were replaced with the correspondent elements of the individual P i,best and P best .
At the end of the cross-over step, a new population P k,mr was obtained and the target function was evaluated for each individual of P k,mr . The DE step ended by creating a population P k,de assembled by merging the best individuals of P k,mr and P best .
At this point it was applied a PSO step to P k,de , according to the algorithm proposed in Clerc & Kennedy [28]. It is defined a constriction factor C: where c s = 1.3 is the social acceleration coefficient and c c = 2.8 is the cognitive acceleration coefficient. The values of the two parameters were selected as suggested in the literature [29] and by assuring that c s + c c ≤ 4. Then the velocities of each i-th individual in P k,de were updated: P g,best is the best individual in the P best population, while r r and r c represent uniform random numbers between 0 and 1. The velocities v k were used to update the current population: The target function was evaluated for each individual in P k,pso , and the relative values of the target function were stored in J k . The optimisation was stopped if the difference between the maximum and minimum values in J k was less than 10 −5 , or after 3000 iterations. Otherwise, the next iteration was performed starting again from step 1.

Results
In order to take into account the intrinsic stochasticity of evolutive algorithms, the identification was performed 10 times. The best result of the 10 identifications is shown in Figure 7. On the top left the convergence plot is shown on a logarithmic scale. Convergence was reached after 253 iterations. The best set of emitter coefficients is shown on the top right and it is compared to the a-priori imposed leakages. It is possible to observe how the identified leakages match not only their position, but also their nominal values.
Emitter coefficients across the best population are shown in the lower panel: The abscissa shows the individuals, and the ordinate shows the number of leaking pipes, while color is associated with the magnitude of the emitter coefficient. This colored diagram shows the stability of the identified solution: Horizontal streams of colors indicate that a given pipe number was recognized as leaking for multiple individuals in the population. Due to the intrinsic stochasticity behind the DEPSO optimization algorithm, emitter coefficients, that were more frequent across the individuals of the population, were more likely be closer to the exact solution.
The performance of the proposed methodology was strongly dependent on the initial guest provided to the DEPSO algorithm. In order to achieve a robust identification, increasing the number of identification trials while reducing the number of iterations was recommended. By doing this, the exploration of a large space of the optimization domain was guaranteed.
In order to take into account the intrinsic stochasticity of evolutive algorithms, the identification was performed 10 times. The best result of the 10 identifications is shown in Figure 7. On the top left the convergence plot is shown on a logarithmic scale. Convergence was reached after 253 iterations. The best set of emitter coefficients is shown on the top right and it is compared to the a-priori imposed leakages. It is possible to observe how the identified leakages match not only their position, but also their nominal values. Emitter coefficients across the best population are shown in the lower panel: The abscissa shows the individuals, and the ordinate shows the number of leaking pipes, while color is associated with the magnitude of the emitter coefficient. This colored diagram shows the stability of the identified solution: Horizontal streams of colors indicate that a given pipe number was recognized as leaking for multiple individuals in the population. Due to the intrinsic stochasticity behind the DEPSO optimization algorithm, emitter coefficients, that were more frequent across the individuals of the population, were more likely be closer to the exact solution.
The performance of the proposed methodology was strongly dependent on the initial guest provided to the DEPSO algorithm. In order to achieve a robust identification, increasing the number of identification trials while reducing the number of iterations was recommended. By doing this, the exploration of a large space of the optimization domain was guaranteed.

Conclusions
A methodology suitable for supporting technicians and engineers in locating and quantifying leakages in a water distribution system has been proposed. The suggested methodology was based

Conclusions
A methodology suitable for supporting technicians and engineers in locating and quantifying leakages in a water distribution system has been proposed. The suggested methodology was based on three innovative concepts: A statistical analysis of water consumption, a sensitive and correlation analysis, and an inverse identification through a hybrid evolutive optimization algorithm.
The statistical study of water consumption trends during a one-year period allowed us to determine the hours of the day that were characterized by more repeatable consumption patterns, which could therefore provide accurate measurements. Such an approach extends the applicability of the proposed procedure to a wide category of WDS, even characterized by intermittent water consumptions. The sensitivity and correlation analyses of the hydraulic network suggested which the optimal nodes of the network were, and where to install pressure sensors, suitable for the health monitoring of the WDS. Finally, the inverse identification attempted to reconstruct the set of emitter coefficients (i.e., leakages) that generated the measured pressures.
The proposed approach was validated by a theoretical network that is characterized by the complete knowledge of topological and hydraulic properties.
The network was monitored by four pressure sensors which were strategically located, and the leakages were estimated to be half way along four pipes. The result of the inverse optimization shows that with a reduced number of pressure sensors it is possible to estimate the position of the leakages on a restricted subset of pipes in the network. Due to the stochasticity of evolutive algorithms, multiple identifications had to be run in order to achieve a robust solution. However, each identification needed to be run for a low number of iterations since the initial guest appeared to be the major factor in finding the global optimum. It had to be mentioned, moreover, that the procedure was carried out on a low-power personal computer, and this could have been overcome by exploiting parallelized implementations of the DEPSO algorithm to be executed on distributed processors, such as on graphical processing units (GPU).
Finally, it should be mentioned that one of the advantages of the proposed approach is that it is scalable. Large WDSs can be sectioned into smaller WDS if the boundary conditions are set properly (e.g., by monitoring the nodes at the frontier of the sub-WDS with pressure and flow sensors).