A Hybrid Water Distribution Networks Design Optimization Method Based on a Search Space Reduction Approach and a Genetic Algorithm

This work presents a new approach to increase the efficiency of the heuristics methods applied to the optimal design of water distribution systems. The approach is based on reducing the search space by bounding the diameters that can be used for every network pipe. To reduce the search space, two opposite extreme flow distribution scenarios are analyzed and velocity restrictions to the pipe flow are then applied. The first scenario produces the most uniform flow distribution in the network. The opposite scenario is represented by the network with the maximum flow accumulation. Both extreme flow distributions are calculated by solving a quadratic programming problem, which is a very robust and efficient procedure. This approach has been coupled to a Genetic Algorithm (GA). The GA has an integer coding scheme and variable number of alleles depending on the number of diameters comprised within the velocity restrictions. The methodology has been applied to several benchmark networks and its performance has been compared to a classic GA formulation with a non-bounded search space. It considerably reduced the search space and provided a much faster and more accurate convergence than the GA formulation. This approach can also be coupled to other metaheuristics.


Introduction
The optimal design of looped water distribution networks (WDN) can be regarded as a type of complex combinatorial problem known as NP-hard (Non-deterministic Polynomial-time hard), as it is a nonlinear, constrained, non-smooth, non-convex, and, hence, multi-modal problem [1,2]. Although mathematical programming methods such as linear and nonlinear programming techniques [3,4] have been applied to solve this problem, metaheuristics methods have been preferred due to their ability to cope with global optimization problems. Genetic Algorithms (GA) and other Evolutionary Algorithms [2], Simulated Annealing (SA) [5], Shuffle Frog Leaping Algorithm [6], Iterated Local Search [7] and Particle Swarm Optimization [8] are among the most extended metaheuristic approaches applied to water distribution networks design. Genetic algorithms have been extensively applied to solve the problem of designing the optimal water distribution network ( [2,9]). GAs are based on the rules of evolution and natural selection. Multi-objective heuristic approaches have also been formulated not only to minimize the network cost but to take into consideration other conflicting objectives as well, such as the reliability of the system [10][11][12][13][14].
Although heuristics approaches can handle global optimization problems, they do not guarantee to find the optimal solution [15]. In addition to the lack of accuracy of the solution provided, another shortcoming of these procedures is the time they take to converge. In recent years, a great deal of research work has been carried out to improve their performance; however, in spite of these efforts, heuristics methods are still relatively inefficient and time-consuming when dealing with very large water distribution networks. This inefficiency is due to the wide search space that these algorithms must explore. Since the search space is very large, general purpose heuristic algorithms waste a considerably long time evaluating unfeasible solutions. Consequently, the probability of finding the optimal solution decreases and the convergence speed increases as the size of the search space increases. Strategies for reducing the search space are thus greatly needed.
The aim of this paper is to present a new approach to increase the efficiency of the heuristics methods applied to the optimal design of water distribution networks. The proposed approach is based on bounding the search space by analyzing two opposite extreme flow distribution scenarios and then applying velocity restrictions to the flow in the network's links. The proposed methodology has been applied to minimize the cost of a well-known benchmark network. The performance of the approach presented in this paper has been compared to a classic GA formulation with a non-bounded search space.

Bounding Strategy
Flow distribution can be calculated in branched networks by applying flow conservation equations in the nodes of the network. From a practical standpoint, a common procedure for this type of network is to impose velocity restrictions on the flow in the pipes. Velocity limits of piping systems can vary depending on the material and diameter and other considerations. High velocities may cause pipe erosion, loud noise, and excessive head losses. Low velocities, on the contrary, may produce sedimentation and oversizing of the system. When velocity restrictions are applied, the range of possible diameters that can be selected for a specific pipe is considerably reduced, thus simplifying the complexity of the network design. However, unlike the case of branched networks, the flow distribution in looped networks is not known a priori, and, as a consequence, this procedure cannot be used.
The methodology proposed in this work is based on reducing the search space by bounding the range of possible diameters that can be selected for a specific network link. The procedure consists of generating two opposite extreme flow distribution scenarios that satisfy the nodal flow conservation equations and nodal demands. The first scenario produces the most uniform flow distribution in the network while satisfying nodal demands and flow conservation constraints. The resulting design would provide a network with high entropy and resilience. The opposite scenario features the highest flow accumulation within certain main pipes. This scenario provides a flow distribution fairly similar to the one obtained for a spanning tree of the network.
The methodology proposed in this work to calculate both extreme flow distributions is to solve a quadratic programming problem (QPP) for each of them. A quadratic programming problem involves minimizing or maximizing a quadratic function subject to linear constraints. Quadratic programming is a particular type of nonlinear programming. Although general nonlinear algorithms can be applied to solve this type of problem, there are others that are more robust, specific and efficient [16].
The objective function of the proposed QPP for the most uniform flow distribution is to minimize the sum of the square link flows of the network. These link flows have to satisfy the flow conservation equations in the nodes of the network. This set of restrictions is linear. The problem is formulated in Equation (1): where: n is the number of links in the network, Q i is the flow of the link i, A is an (m × n) array, and m is the number of nodes, and q is a vector of nodal demands. The entry a ij of array A is 1 if the flow of link j goes into node i, −1 if it leaves the node, and 0 if link j is not connected to node i.
One drawback of this procedure is that the direction of the flows has to be previously defined in order to perform the calculation. For complex networks, the number of possible flow direction combinations can be very high and finding the right one is a cumbersome procedure. To overcome this limitation, we have duplicated the number of links by adding a fictitious pipe for each link of the network in such a way that two pipes with opposed flows are considered for each link of the network. Using this procedure, the number of variables is 2n and array A is a (m × 2n) array. The solution of the minimization QPP problem provides the right flow directions and values that minimize the sum of network flows. The so called Maximum Dispersion (MD) flow distribution is obtained in this way.
The second scenario, with a maximum flow accumulation, also termed a Maximum Concentration scenario (MC), is obtained by maximizing the objective function and solving the equivalent QPP maximization problem.
The solution of these two problems provides two vectors flows (Q MC and Q MD ), which bounds the range of possible flows within each network link. By imposing velocity restrictions, a pair of vectors defining the range of possible diameters between the minimum (D m,i ) and the maximum (D M,i ) for each link i can be calculated in the following way:

Bounded Genetic Algorithm Formulation
The bounding approach developed herein can be coupled to different types of metaheuristics methods. In this work, a GA has been used to test the performance of the proposed methodology. GAs are stochastic search procedures based on the evolutionary mechanisms of natural selection and genetics [17]. GAs mimic the highly effective optimization model that has naturally evolved for dealing with large, highly complex systems.
The GA is based on the GENOME model developed by Reca and Martínez [2]. However, some modifications in the code described below have been made to implement the proposed strategy. A new software program called B-GENOME (B-GA) has been developed to implement this new approach. The program has been developed using the VBA (Visual Basic for Applications) programming language in the Excel © (Microsoft, Redmond, Washington, DC, USA) spreadsheet environment. GENOME used an integer-coding scheme. Each solution (individual) was coded by a vector of n discrete variables (diameter sizes assigned to each link of the network). The variable was coded by an integer value ranging from one (first possible diameter for that particular link) to n d,i (last possible diameter). This methodology has many advantages since there are no limitations on the number of possible diameter sizes that can be assigned to a specific pipe. In the classic formulation of GENOME, the number of possible diameter sizes was equal for each link and this value was equal to the total number of diameters in the pipe database. The same coding scheme has been adopted in B-GENOME, although some modifications have been made to allow for a variable number of possible diameters for each link. The new B-GENOME algorithm used in this work has an integer coding scheme and a variable number of alleles. The number of alleles depends on the number of possible diameters comprised within the velocity restrictions of each link.
In order to test and compare the new approach to the classic GA formulation, the initial population has been obtained randomly. This initial population evolves from one generation to another by undergoing an iterative reproductive cycle. This cycle comprises three subsequent operators: selection, crossover and mutation. For the selection operator to be applied the fitness of each individual is evaluated as the sum of the cost of the pipes making up the network plus a penalty function applied to take into account nodal pressure deficits (see Equation (3)): where: c i is the pipe cost (€ m −1 ), which is a function of the diameter D i , L i is the length of the link i, p is a penalty multiplier, N is the number of nodes in the network, h rj is the required pressure head in the node j and h j is the actual pressure head computed by the hydraulic solver EPANET for the node j.
The value of the penalty multiplier may affect the accuracy of the solution, so it should be properly calibrated. To cope with this problem, some researchers recommend different constraint-handling techniques, such as the use of variable values or self-adaptive penalty functions [18]. However, in this work, for the sake of simplicity, and in order to compare both approaches under the same conditions, a constant penalty multiplier has been applied. A high value has been assigned to this penalty multiplier (10 9 €/m) to avoid finding solutions that violate the pressure restrictions. In order to compute the pressure deficits, the nodal pressures for each individual in the population have been computed by using a network solver. The hydraulic solver EPANET has been used for this purpose [19]. The EPANET engine is used when needed by calling on the EPANET toolkit from the VBA software code developed in this work. B-GENOME implements all the different options to perform the three basic operations that were available in GENOME [2].

Structure of B-GENOME
In addition to the GA module, B-GENOME implements a module to solve the QPPs. This module makes use of the EXCEL optimization add-on SOLVER (frontline systems). Both data input and results output modules complete the structure of the B-GENOME software model. The input module reads both the network information and the available pipe diameters database. The network information is imported from an EPANET input file (*.inp file). The pipe database is stored in a table within an Excel sheet. The output module writes the final solution found by the model (optimal vector of pipe diameters and cost of the network) and the best fitness function value for every generation in a spreadsheet.
The flowchart of the B-GENOME model is depicted in Figure 1.

Testing of the B-GENOME Model
The proposed methodology has been applied to 2 well-known benchmark networks in order to compare the performance of the classic GA formulation with the new bounded GA. The selected networks are the Alperovits and Shamir [3] network (A&S, also known as a two-loop network) and the Hanoi water distribution network. Both have been extensively used to test different water distribution design optimization algorithms, but they feature different size characteristics. While the first is a small network with seven nodes and eight pipes arranged in two loops, the latter can be considered as medium-sized, with 32 nodes and 34 pipes and 3 loops.
The A&S network layout is shown in Figure 2. The system is fed by gravity from a reservoir of 210 m fixed head. The pipes are all 1000 m long. The minimum pressure limitation is 30 m above ground level for each node. There are 14 commercial diameters to be selected. The nodal head and demands, the cost per meter for each pipe size and other data are reported by Alperovits and Shamir and other works [3].

Testing of the B-GENOME Model
The proposed methodology has been applied to 2 well-known benchmark networks in order to compare the performance of the classic GA formulation with the new bounded GA. The selected networks are the Alperovits and Shamir [3] network (A&S, also known as a two-loop network) and the Hanoi water distribution network. Both have been extensively used to test different water distribution design optimization algorithms, but they feature different size characteristics. While the first is a small network with seven nodes and eight pipes arranged in two loops, the latter can be considered as medium-sized, with 32 nodes and 34 pipes and 3 loops.
The A&S network layout is shown in Figure 2. The system is fed by gravity from a reservoir of 210 m fixed head. The pipes are all 1000 m long. The minimum pressure limitation is 30 m above ground level for each node. There are 14 commercial diameters to be selected. The nodal head and demands, the cost per meter for each pipe size and other data are reported by Alperovits and Shamir and other works [3].

Testing of the B-GENOME Model
The proposed methodology has been applied to 2 well-known benchmark networks in order to compare the performance of the classic GA formulation with the new bounded GA. The selected networks are the Alperovits and Shamir [3] network (A&S, also known as a two-loop network) and the Hanoi water distribution network. Both have been extensively used to test different water distribution design optimization algorithms, but they feature different size characteristics. While the first is a small network with seven nodes and eight pipes arranged in two loops, the latter can be considered as medium-sized, with 32 nodes and 34 pipes and 3 loops.
The A&S network layout is shown in Figure 2. The system is fed by gravity from a reservoir of 210 m fixed head. The pipes are all 1000 m long. The minimum pressure limitation is 30 m above ground level for each node. There are 14 commercial diameters to be selected. The nodal head and demands, the cost per meter for each pipe size and other data are reported by Alperovits and Shamir and other works [3].  The Hanoi water distribution network (see Figure 3) features 32 nodes and 34 pipes organized in 3 loops. No pumping facilities are considered since only a single fixed head source at elevation of 100 m is available. The minimum head requirement at all nodes is fixed at 30 m. In this case, there is a set of 6 commercially available diameters. The cost function is nonlinear. The pipe head losses were calculated using the Hazen-Williams equation with a Hazen-Williams roughness coefficient, C = 130.  Table 1. The steady-state-delete-worst plan inserts individuals as they are bred whenever its fitness exceeds that of the least fit member of the parent population. The least fit member of the parent population is removed and replaced by the offspring. The crossover operator implies that a pair of parent chromosomes exchanges information in order to produce a pair of offspring chromosomes that inherit their characteristics. The probability of crossing two chromosomes is defined by the input parameter pcross. In the uniform crossover, the parents' chromosomes exchange their genetic information gene to gene. The probability of exchanging genes is defined by the gene crossing rate (rcross). Ten simulations were performed both for the new bounded algorithm and the classic GA algorithm.  Table 1. The steady-state-delete-worst plan inserts individuals as they are bred whenever its fitness exceeds that of the least fit member of the parent population. The least fit member of the parent population is removed and replaced by the offspring. The crossover operator implies that a pair of parent chromosomes exchanges information in order to produce a pair of offspring chromosomes that inherit their characteristics. The probability of crossing two chromosomes is defined by the input parameter p cross . In the uniform crossover, the parents' chromosomes exchange their genetic information gene to gene. The probability of exchanging genes is defined by the gene crossing rate (r cross ). Ten simulations were performed both for the new bounded algorithm and the classic GA algorithm.

Results
The first step in the calculation procedure is to solve the QPPs stated in Equation (1). The results of these problems provide both vectors of flows for each link of the network. Both flow values represent the limits to the flow in each link of the network and thus reduce the search space. The results provided are given for both the A&S (Table 2) and the Hanoi networks (Table 3). Table 2 (A&S) and Table 3 (Hanoi) show the flow range, the minimum and maximum diameters, and the number of possible diameters compatible with the velocity restrictions provided by the QPP problems.     With the aim of evaluating the accuracy of the solution and the convergence speed of the new bounded algorithm and the classic GA algorithm, ten runs were performed for each algorithm with the same input parameters, data, and analysis options. Results of these simulations are shown in Figure 4 for the A&S network and in Figure 5 for the Hanoi network. With the aim of evaluating the accuracy of the solution and the convergence speed of the new bounded algorithm and the classic GA algorithm, ten runs were performed for each algorithm with the same input parameters, data, and analysis options. Results of these simulations are shown in Figure 4 for the A&S network and in Figure 5 for the Hanoi network.    With the aim of evaluating the accuracy of the solution and the convergence speed of the new bounded algorithm and the classic GA algorithm, ten runs were performed for each algorithm with the same input parameters, data, and analysis options. Results of these simulations are shown in Figure 4 for the A&S network and in Figure 5 for the Hanoi network.   Evolution of the best fitness value for B-GENOME and GENOME algorithms (Hanoi Network).
Not only did the B-GA algorithm outperform the GA in convergence speed, but it also performed better when it came to the accuracy of the solution. The solutions provided by both algorithms are given in Table 4 (A$S) and Table 5 (Hanoi).

Discussion
A significant reduction of the search space was achieved with the proposed methodology. Regarding the Alperovits and Shamir network, as its pipe database is composed of 14 different diameter values and the network has eight links, the total search space in the unbounded problem is equal to 14 8 = 1.48 × 10 9 possible network designs. The search space for the bounded problem is reduced to 4.61 × 10 7 , which means that the search space becomes approximately 3% of the total search space of the problem (see Table 2). In the case of the Hanoi network, the reduction is even higher. There are six possible diameters in the database and the number of links is equal to 34. The resulting number of alternative designs is 2.87 × 10 26 , whereas the size of the search space in the bounded problem is 4.35 × 10 16 (see Table 3). The reduction of the search space is expected to be higher for a larger number of links and the number of pipe diameters in a given problem. The velocity limits also play an important role as the search space reduction increases as the velocity limits range becomes narrower.
Another advantage of the search space reduction approach presented herein is that it is able to detect branched links in the network by comparing the flow value for these links in both flow distributions and checking if it is the same. For instance, this is the case of link 1 in the A&S network ( Table 2) and links 1, 2, 10, 11, 12, 21 and 22 in the Hanoi network (Table 3). Since the flow is established in these branched links, the range of possible diameters compatible with the flow velocity restrictions is considerably reduced and so the complexity of the problem. In addition, a special treatment applying other optimization methods best suited for branched networks can be performed in these branched sub-networks.
Regarding the speed of convergence, both algorithms performed well, although the new proposed algorithm B-GA outperformed the GA for both networks. It is worth highlighting that convergence was reached rather quickly in both cases. There were no substantial differences in the convergence speed for the A&S network (both algorithms converged approximately after the 20th generation). Convergence was found later for the Hanoi network due to the larger size of the problem, and, in this case, B-GA clearly converged faster than GA (130th for the B-GA and 293th in the case of GA). In the A&S network, only 2000 function evaluations were needed to converge (100 individuals and 20 generations). In the case of the Hanoi network, 26,000 evaluations were needed in the B-GA algorithm and 58,600 in the GA algorithm. This entails a very small fraction of the total search space (see Figures 4 and 5).
The proposed B-GA algorithm not only considerably reduced the search space, but also provided a much faster and more accurate convergence than the classic GA formulation In the case of the A&S network, the best solution found by the B-GA algorithm was 419,000, which is the global optimum as reported in previous works. This minimum cost was obtained in three out of the 10 runs performed. A quasi-optimal solution (420,000) was found in another five out of 10 iterations. The average cost in the 10 simulations was close to the optimum (424,000). The GA performed slightly worse; this algorithm did not reach the minimum cost in any simulation, although the solution found came very close (420,000). The average cost was also higher (430,900) than the one obtained by B-GA (see Tables 4 and 5).
Both algorithms found solutions relatively close to the global optimum. For the Hanoi network, the best solution found by the B-GA algorithm was 6,182,006. The average cost in the 10 simulations was close to the optimum (6,219,390). Again, the GA clearly performed worse. The minimum solution found was 6,208,937 (0.44% higher). The average cost was also higher (6,296,366) than the one obtained by B-GA (1.24%). The solution found by the proposed B-GA is comparable to the optimal solution found in the literature. The lowest cost solution reported is 6,056,000 [5,20]. However, these results were not obtained using the EPANET 2.0 network solver and the coefficients of the Hazen-Williams head loss equation were slightly different. The best solution found when using the EPANET 2.0 network solver was that reported by Lansey and Eusuff [6] (6,073,000) using the Shuffled Frog Leaping Algorithm (SFLA). This solution is slightly better than the one found in this work with B-GA. Nevertheless, it should be noted that, in our study, the number of evaluations was low because the aim was not to achieve the minimum cost but to test and compare the algorithm with a classic GA algorithm under the same conditions.
As a consequence, the proposed B-GA algorithm considerably reduced the search space and provided a much faster and more accurate convergence than the classic GA formulation. It is expected that, for more complex networks (networks with a higher number of links or higher number of pipe diameters), the advantages provided by the new B-GA approach could be even greater.
Another major advantage of the proposed search space reduction is that it can be coupled to other metaheuristics. The performance of this strategy when applied to other types of metaheuristics is an issue still to be investigated.

Conclusions
The following conclusions can be drawn from this research work:

•
A new approach based on bounding and reducing the total search space in a water distribution network design problem has been developed. This new approach reduces the search space by analyzing two opposite extreme flow distribution scenarios and then applying velocity restrictions to the pipes.

•
This new approach has been coupled to a GA in order to improve its performance.

•
The proposed B-GA algorithm considerably reduced the search space and provided a much faster and more accurate convergence than the classic GA formulation for a small and a medium benchmark network. It is expected that, for more complex networks, the advantages provided by the new B-GA approach could be even greater.

•
This new approach could also be implemented in other types of heuristic methods. The improvements on the performance of these heuristics provided by the new approach are still to be investigated.