Development of a Contaminant Distribution Model for Water Supply Systems

: Water contamination can result in serious health complications and gross socioeconomic implications. Therefore, identifying the source of contamination is of great concern to researchers and water operators, particularly, to avert the unfavorable consequences that can ensue from consuming contaminated water. As part of the effort to address this challenge, this present study proposes a novel contaminant distribution model for water supply systems. The concept of superimposing the contaminant over the hydraulic analysis was used to develop the proposed model. Four water sample networks were used to test the performance of the proposed model. The results obtained displayed the contaminant distributions across the water network at a limited computational time. Apart from being the ﬁrst in this domain, the signiﬁcant reduction of computational time achieved by the proposed model is a major contribution to the ﬁeld. from Nodes 7–13 range between the contaminant values from Nodes 1 and 46 (i.e., 0.040–0.050). The results showed that these nodes have contaminant mixtures from the stated nodes (i.e., Nodes 1 and 46). Similar attribute was observed at Nodes; 17, 18, 20, 23, and 24. The results of the remaining nodes indicated a mixture of contaminants from source Nodes 1 and 50. These can be seen from the numerical results for Nodes 47 to 67. Moreover, circumstances where three contaminants mix at the nodes are possible within the network. This further established the practicability of the proposed model as observed from the results.


Introduction
The provision of potable water is essential to human health and the well-being of a society. This also conforms with one of the set objectives (target 6a) of the United Nation's sustainable Development Goals (SDGs) by 2030 [1]. Despite reports that access to water have improved, a depleted delivery of potable water is an utmost concern that affects several continents. Interestingly, the goal of a utility operator is to supply water in adequate quantity and quality when desired. Usually, water is transported through water distribution networks (WDNs) from a treatment plant to consumers' taps. A WDN is a complex infrastructure, which comprises of: pipes, nodes, and reservoirs where human interference is possible. Hence, it is exposed to both accidental and intentional attacks, which can have severe consequences on the public health, besides socioeconomic implications [2,3].
Mostly, the quality of water is examined at a treatment plant, but it can technically be contaminated during transportation, through: pipe leakages, nodes and cross-connections [4]. The negative consequences of consuming contaminated water can be severe, as reported in the literature [5]. For instance, Kenzie et al. [6] and Corso et al. [7] discussed the significant impact of a transported infection through a water supply system in Milwaukee, (USA) that engendered 403,000 users, some were subsequently hospitalised with an estimated bill of about USD 96.2 million. Cooper et al. [8] reported the consequences of an accidental pollution of a chemical in a WDN in Virgina, where over 300,000 users were affected. Several studies [6][7][8] have also established that attacks on the WDNs are real, and can happen again. The socioeconomic implications that can arise from water network attacks

Background and Related Works
The CSI problem is characterised by responding to three (3) vital concerns: locating the source of contamination, the time of injection, and its magnitude. A derivation of the information from the data collected from water quality monitoring stations, is a complex task that must be resolved for prevention purposes, which include: public warning announcements, valves closure, pipes flushing, etc. Over the years, researchers have proposed different approaches; the use of simulation-optimisation approach [18], particle backtracking [28], machine learning [29], data mining [30], among others. CSI problem is commonly treated as an inverse problem by attempting to locate the source of contamination from the data collected from the water quality monitoring stations. Van Bloemen et al. [31] proposed a quadratic programming (QP) technique to tackle this problem. The authors' model exhibited a probable capability to address the challenge. However, excessive computational stress was identified as a shortcoming of their approach. Thereafter, Lair et al. [32] suggested a dynamic optimisation method, based on a sub-domain to curtail this computational stress. They reported that the model can merely handle the computational stress, but excludes key information during the selection of sub-domains, which was a setback of the approach.
Preis and Ostfeld [33] presented a coupled model tree-linear programming technique with the use of the commonly adopted EPANET tool developed by Rossman [34]. Preis and Ostfeld [35] proposed a combination of a Generic algorithm (GA) with the EPANET to resolve the CIS problem. EPANET was employed for the hydraulic simulation, while GA was used to moderate the injection attribute. The method minimises the difference between the measured and the evaluated data. The use of this method requires an excessive computation that demands the use of parallel computing. In addition, the study by Liu et al. [22,23] discussed an adaptive dynamic optimisation (ADOPT) method. This method has a real-time response with contamination occurrences. The authors utilised an evolutionary algorithm (EA) to overcome the early convergence that can lead to imprecise output. This process helps to preserve a group of possible output that can lead to diverse non-unique solutions. The EA enhances the results once a new observation is absorbed, which decreases the magnitude of the non-uniqueness. It was tested by using two water networks to demonstrate the feasibility of the proposed method. The consideration of a single source of injection and non-reactive contaminants are some of the advantages of this method; however, for a large water network, such assumptions may not be feasible. The model may therefore, suffer from over-generalization. Recently, Xu et al. [19] used a cultural algorithm to address a similar concern. They demonstrated the viability of the method by using three water supply networks. The results obtained exhibited positive capability of the technique, even though extreme computation was a major setback. Other shortcomings connected to the CSI problems are: complex network nodes, and stochaistic demand of water, which can lead to some problems of uncertainties. Yan et al. [36] proposed a hybrid encoding method to improve the convergence criteria.
Probabilistic and Bayesian techniques have also been used to address CSI problem. Dawsey et al. [37] proposed an inclusion of sensor data with a possibility to evaluate the source of occurrence from different areas, while Tao et al. [38] formulated a probabilistic approach. Nuepauer et al. [39] presented a backward modelling scheme with the use of a Probability Density Function (PDF) to locate the source and release time of contamination. The collected information from the monitoring sensor were used to generate the PDF. The results obtained expressed the effectiveness of the backward model to handle a steady flow conditions with a single source of the contamination. Wang and Zhou [40] applied a Bayesian sequential technique to handle the CSI problem. Baradouzi et al. [41] discussed a Probabilistic Support Vector Machines (PSVMs) technique to identify the source of contamination; they used some tools to train the PSVMs. The results obtained exhibit the feasibility of the approach to identify an upstream region viable for likely location. The water distribution system of Arak, Iran was employed to validate the proposed method.
Other notable works have been presented by Di Nardo et al. [42], model-based by Zechman and Ranjithan [20], artificial neutral networks, (ANN) by Kim et al. [43] and hybrid methods [44,45] to resolve the CSI problem. Propato [13] discussed an entropic-based method. In the study, a linear algebraic technique was first employed to minimise the selection of probable sources. Thereafter, a minimum entropy method was applied to evaluate the likely source, which led to the potential sources that are sensible to the monitoring stations. An evolutionary scheme and population-based global search method was presented by Zechman and Ranjithan [20]. The technique was devised by applying a tree-based encoding model, which generates the decision vectors and a set of connected genetic operators that led to an effective search. Liu et al. [45] combined a statistical method and a heuristic search model to present the contamination incident. The statistical method pointed the possible locations and the heuristic search method amplified the contamination source attributes. The feasibility of the method was examined by using two water networks. The results obtained showed a quick adaptive discovery of contaminant attributes. However, the method cannot be expanded to handle multiple sources occurrence. Liu et al. [44] proposed a hybrid method to characterise the sources by giving a sensor measurements in real time. The method integrates a logistic regression (LR) and local improvement model to speed-up the convergence processes. In order to ascertain the capability of the method, two water networks were examined. The first is a sample small network of about 117 pipes, while the details of the second network was adapted from [46]. The results obtained, expressed the fast convergence of the hybrid method when compared to the single method. Despite the significant effort recorded on the aforementioned problem, excessive computation remained unsolved. To this end, the present study formulates a contaminant distribution model when given the source of contamination, with an assumption that hydraulic solution is solved.

Proposed Model Formulation
Since hydraulic model exists and have been solved by some researchers [24,26], this study proposes to superimpose the contaminant into the hydraulic analysis. This study formulates a contaminant distribution model by giving a source of contaminations and, on an assumption that the hydraulic model is resolved and the network flow analysis is known.

Hydraulic Model
By applying graph the theory, a water distribution network can be presented as a connected graph with a set of edges and a set of nodes [24]. The former consists of pipes, pumps and valves. The two basic principles that describe the hydraulic equations in a water distribution network are: the principle of mass continuity in the node and, an energy conservation around the loop. A typical water network consist of n p number of pipes, n j number of junction nodes (nodes with unknown heads), and n f number of fixed-head nodes (nodes with known heads), the total number of nodes in the network can be expressed as: n t = n j + n f . The mass continuity equation is similar to the Kirchoff's law in electrical network and, it is applicable to the nodes with known demand. It states that the algebraic summation of the flows at the node is zero. Thus, it is expressed as: where q n p ×1 = [q 1 , q 2 , q np ] T is the vector of the pipe flow rates; d n j ×1 = [d 1 , d 2 , d nj ] T is the vector of the fixed demand at the nodes with unknown heads and A s is the node-pipe incidence matrix of dimension n j × n p connecting to the nodes with unknown heads [47]. The energy conservation law is related to the Kirchoff's voltage law in electrical network. It deals with head losses around the loops and, can be presented in term of its topological matrix as follows: where h is the head loss vector across the pipes; h s is the head loss across the loops;M m×np represented the loop-pipe incidence matrix; and m is the number of loops. The elementM in Equation (2) are derived from: if pipe j is in loop i and is in the same direction −1 if pipe j is in loop i and is in the opposite direction 0 if pipe j is not in loop i In addition, the energy conservation is expressed, thus: where h = [h 1 , h 2 , . . . h nj ] T denotes the vector of the unknown heads of dimension (n j × 1) and . . h f (n f ) ] T represents the vector of the unknown heads of dimension (n f × 1). A s is the node-pipe incidence matrix of dimension n j × n p associating to the nodes with unknown heads [47]. Both A s and A f are obtained from the actual topological incidence matrix, A as: The element A is derived from Similarly, the head loss equations are essential for the solution of the piping networks. It describes the pressure drop across a given pipe of flow in a particular pipe. An illustration of this is described by the element shown in Figure 1, with two end nodes i and j. The head loss due to the friction of the flow of water with the pipe wall is commonly expressed as: where h i and h j are the heads at end node of the pipe and r = [r 1 , . . . , r np ] T denotes the vector of the pipe resistance factor. Consideration of minor loss due to the valves and other pipe connections is generally expressed in the form: Subsequent, consideration of energy balance equation and the inclusion of the minor loss leads to; A matrix E can be defined as: By substituting Equations (12) into (11), the energy balance equation is expressed as Equations (1) and (13) are both steady-state hydraulic equations. These can be solved in order to estimate the pipe flow and the heads at the junction node. The system of equations expressed by Equation (14) are partly linear and partly non-linear [24].
Equation (14) can also be expressed as: The system of equations expressed in the Equation (14) maybe solved by an iterative method.
The matrix E is a n p × n p diagonal matrix, whose elements are formed from the head loss (including the minor loss due to valves) relation as: · · · · · · · · · r n p|q n p| α−1 + k mnp |q n p| where k m = [k m1 , k m2 , ..., k mnp ] T is a (n p × 1) vector of the minor loss factor due to valves or any other connections attached to the pipe; and α is an exponent whose value depends on the head loss model employed (1.85 for Hazen-Willam and 2 for both Darcy-Weisbach or Chezy-Manning head loss model) [48]. The variable r depends on the head loss model employed. In the circumstance where the Darcy-Weisbach or the Hazen-William's model is used, the hydraulic resistance for the l th pipe is described as: for Darcy-Weisbach model, and for Hazen-William model. In Equations (17) and (18), L l is the length of the l th pipe; g is the acceleration due to gravity; D l denotes the diameter of the l th pipe; f l is a dimensionless constant, which represents the fictional factor for the l th pipe; and Chw l is the Hazen-William friction coefficient for the l th pipe. The pipe friction factor f , in Equation (17) is a function of the equivalent sand roughness ε of pipes as well as the Reynold number R e , and this can be calculated by using the expression reported by Shockling et al. 2006 [49]. In this study, the Hazen-William head loss model will be adopted.

Formulation of the Proposed Contaminant Distribution Model
The formulation of this model assumes that the hydraulic solution has been resolved and the source of contaminations are known. Thus, the proposed contaminant distribution model is explicitly presented by using Figure 2. The simple network depicted in Figure 2 consists of Eight (8) nodes, and Ten (10) branches. In this case, nodes 1 and 3 are assumed to be the sources of contamination with variables; β 1 and β 3 while, the external sources are represented by;Q 1 andQ 3 respectively. The concentrations at nodes: 1 to 8 are independent of the inflow branches into a particular node. By assigning a variable δ k , to the concentration at node k, the following relationships are formulated in Equations (19)-(26) as: Considering the out of the node concentrations. If concentrations in branches are represented by α k , then Equations (26)-(33) are formulated as: Therefore, the concentration of contamination at node k is independent of the outflow branches from the respective nodes; this is expressed in Equation (34) C t out δ k = α k (34) where C t out in Equation (34) is expressed in Equation (35) and is the transpose of the incident matrices, δ k is the concentration at node k and α k is the concentration in branches.
Similarly, the inflow q k , into the nodes is formulated in Equations (36)-(43) as: The matrices formulation of Equations (36)-(43) is expressed in Equation (44) as: Equation (44) is generally expressed in Equation (45) as: where C t in in Equation (45) is the transpose of the inflow incident matrices and expressed in Equation (46), Q k is the flow andQ k , is the flow from the external sources.
The integration of the concentration at node, δ and flow at node, q can be expressed in term of flow, Q. Thus, the concentration in branches (i.e., pipes) α, are formulated and expressed in Equations (47)-(54) as: Equations (47) The general formulation of Equation (55) is expressed in Equation (56) as: The concentration in branches α, is related to the concentration at node δ and is expressed in Equation (57) as:  Equation (57) is generally expressed in Equation (58) as: If Equation (58) is substituted into Equation (56), then, Equation (59) is expressed as: By resolving Equation (60), δ may be derived. Therefore, the distribution of contaminants across the pipes and at the nodes can be quantified.

Application of the Developed Model on WDNs
The validation of the developed contaminant distribution model was implemented on four water distribution networks, which were adapted from literature [47,50,51]. All computations and hydraulic analysis were performed in MATLAB software environment.

Model Programming Procedures
This procedure assumes that the hydraulic network analysis has been resolved. In this study, Newton-Raphson's Content Model solution is employed [24]. The required input from the solved network analysis are; sending nodes, receiving nodes and the flow L/s. Thus, the program procedures are as follow: 1. Get the network analysis solution 2. Prepare the Structure 3. Get the pipe flows 4. Get supplies and demands 5. Get injections at the supply nodes 6. Get contamination at supply nodes 7. Compute the sum input flows to the nodes 8. Build matrices for equation as function of gamma, δ . 9. Compute contamination at the nodes 10. Get contamination in the pipes, α. Figure 3 is a water distribution network adapted from the work of Ozger [50], to demonstrate the validity of the developed contaminant distribution model. The network has two (2) reservoirs with twenty-one (21) Pipes, and (13) thirteen Nodes. In this example, it is assumed that 3% and 2% of the flow are injected at reservoir 1 and 2 as contaminants, respectively. The available network characteristics are presented in Table 1.

Illustrative Example 2
This example consists of 71 pipes, 46 nodes and a reservoir. The details of the Illustrative Example 2 for the validation of the developed model was adapted from Kumar et al. [51]. The schematic of Illustrative Example 2 is shown in Figure 4.

Illustrative Example 3
In this example, one hundred and five (105) pipes network depicted in Figure 4, was considered. The network consists of 105 pipes, three fixed-head nodes (sources), and 64 nodes, after redundant nodes (these are nodes where two or more pipes meet with zero demand) were removed. The network characteristic data defining the network is available in the report of Adedeji [47].

Illustrative Example 4
This example examined the four hundred and forty two pipes as represented in Figure 5. The network contains 442 pipes, three reservoirs, and 295 nodes after the redundant (these are nodes where two or more pipes meet with zero demand) nodes have been removed. The data defining the network is available in the report by Adedeji [47].

Results and Discussions
This section presents the results and discussions of the four (4) sample networks examined for the validation of the performance of the developed model. Tables 2 and 3 present the numerical results for illustrative Example 1.

Results and Discussions for Illustrative Example 1
The results of the pipe flow and the contaminant in the pipes are presented in Table 2. This Table shows that Pipe 7 has contaminant concentration of 0.029; it is as a result of the combination of the contaminants from Nodes 1 and 2 at Node 4. At Pipes 5, 6, 11, 12, and 21, the contaminants are from the same source, Node 2, while the other pipes have the same contaminant concentration (0.030); this implies that the contaminant is from Node 1. Based on these results, pipes with the same node as origin, have the same contaminant concentration, for example, Pipes, 5, 6, 11, 12, and 21. However, where different contaminant mix at a node, the contaminant concentration that will leave the node will be less than the maximum contaminant concentration that enters the node. A typical example of this scenario is at Pipe 7. On the other hand, when the quantity of contaminants from different pipes meeting at a node are the same, the contaminant concentration leaving the node will be the average of the contaminant concentration that entered into the node. Thus, this shows that the proposed model results are realistic.    The nodal flow and contaminant distribution at nodes for illustrative Example 1, is shown in Table 3. The results in this Table shows that Nodes 4 (0.029), 6 (0.024), and 8 (0.024) have different contaminant concentrations. These concentrations are different from the injected contaminant concentrations, which are 0.030 and 0.020 from Nodes 1 and 2, respectively. This difference in the contaminants' concentration is due to different contaminants mixing at a node. On the other hand, when different sources with the contaminant concentrations mix at a node, the node's contaminant concentration is the same as its source concentration.  Figure 4, has only one source of supply, and 5% of the flow is assumed as contaminant. Since, there is no contaminant mix, it is reasonable for the 0.050 contaminants injected at the source to flow through the entire network, which also verified the feasibility of the proposed model. The results of the pipes and nodes flow rate are also presented. A sub-network with area nodes: 15, 18, 30 and 27 revealed that the quantity of contaminants decreases from node 15 towards node 18. Similar attribute was observed from node 15 towards node 27. Perhaps, if there is a need for water piping extension, it will be appropriate to tap from the extreme nodes, irrespective of the node position. This is because, nodes at the extreme nodes have lower quantity of contaminants. It was generally observed that the farther the node from the source of supply, the lower the quantity of contaminant at the nodes junction.

Results and Discussions for Illustrative Example 3
Tables 6 and 7 present the numerical results of the contaminant contribution across the pipes and the nodes for illustrative Example 3. In this example, it was assumed that Nodes 1, 46, and 50 are sources of contamination with 0.04, 0.05 and 0.03 contaminants, respectively. This example has redundant nodes (these are nodes where two or more pipes meet with zero demand), which were excluded in the simulations. The node numerical results obtained (Table 7) revealed that most of the nodes have different contaminant values from injected contaminants at the three (3) sources (Nodes; 1, 46, and 50) of contamination. This is due to the fact that, different contaminants mix at different nodes of the networks. It was observed that the contaminant (0.04) at source (Node 1) flows through Node 1 to 6; 10-11; 14-16; 27 and 28, respectively. These nodes have direct connections to the source (Node 1) without mixing with the two sources of contaminant, as can be seen in the schematic diagram of the network in Figure 5. Similarly, the contaminant from source (Node 46) flows through Nodes; 46, 19, 21, and 22, respectively. These nodes are also directly connected to the source (Node 46) without any interconnected nodes from other sources of contamination. This same attribute was observed from source (Node 50) where Nodes; 41,36,34,35,33,41,53 and 55 have the same contaminant values that was injected at source (Node 50). On the other hand, contaminant at the remaining nodes differ from the injected values (ranges between the lowest to highest values. i.e., 0.030-0.050). Based on the results obtained, the values of contaminants from Nodes 7-13 range between the contaminant values from Nodes 1 and 46 (i.e., 0.040-0.050). The results showed that these nodes have contaminant mixtures from the stated nodes (i.e., Nodes 1 and 46). Similar attribute was observed at Nodes; 17, 18, 20, 23, and 24. The results of the remaining nodes indicated a mixture of contaminants from source Nodes 1 and 50. These can be seen from the numerical results for Nodes 47 to 67. Moreover, circumstances where three contaminants mix at the nodes are possible within the network. This further established the practicability of the proposed model as observed from the results.  The numerical results of pipe and contaminant flow for the Illustrative Example 3 is presented in Table 6. Similar to the node results, it was observed that pipes that are directly connected to the sources of contaminants have same flow as their sources. For instances, Pipes 1-4, 10-17, 23-29 are directly connected to source (Node 1). Similar scenarios are observed in Pipes; 36-42, which are connected to source (Node 46). Likewise, the Pipes connected to source (Node 50) are 52-55, 57-58, 72-77, 80, 82-85, and 105, respectively. The rest of the pipes are connected to the nodes where two or more contaminants mix as noted from the results obtained.

Results and Discussions for Illustrative Example 4
After the removal of redundant nodes (these are nodes where two or more pipes meet with zero demand), the illustrative Example 4 network consists, 442 pipes, three reservoirs, and 295 nodes as depicted in Figure 6. This network sample is bigger than Example 3 network, and they both have three sources of contaminations. In this case, nodes 1, 116, and 164 are assumed to be contamination sources with; 0.02, 0.01, and 0.03, respectively. The contaminant distributions across this network displayed the same attributes as the results of the Illustrative Example 3. The proposed method shows capability to handle medium networks within a short time. The results of Illustrative Example 4 is not presented due to its volume.  Table 8 shows the number of iterations taken in order to obtain the result for the various WDNs considered. As expected, the number of iterations increases as the size of the WDN increases.  Table 9 depicts the execution time for both hydraulic analysis and the proposed model for the various WDNs considered in this study. It was generally observed that the bigger the network, the higher the execution time.

Conclusions and Future Studies
Over the years, identification of contamination sources has received a significant attention among researchers and, has been a concern due to the negative effect that can emanate from the use of contaminated water. As part of an effort to fill this research gap, this study proposed a contaminant distribution model by superimposing the contaminant over the network analysis. To the best of the authors' knowledge, there is no record of this approach in the literature. The viability of the proposed model was tested with four water networks, and the model's performance was satisfactory. The results obtained described the practicability of the contaminant distribution across pipes and the nodes of the water networks. In addition, the results verified the practicability of the proposed model at a limited computational time. The source of contamination could be derived with this distribution model if, a set of measurement data is given. Thus, this will allow water supply companies to know the source of contamination upon which appropriate preventive measures such as; public awareness, closure of valves, etc. would be provided in order to minimise the extent of contamination on the society. In addition, comparison of this model with similar methodologies is important in order to ascertain its strength and weakness which, would be examined in future studies. Furthermore, procurement and maintenance cost of water quality monitoring sensor is also a challenge that must be addressed. Future research would focus on the issue of contamination source identification and optimal sensor placement in a water distribution network. The proposed model and its solution will be embedded within a method that allows the detection of the source of contamination. Although, this model was applied on a small and medium WDNs due to data availability, application of the proposed model on large networks would also be investigated in the future. These are topical areas of research interest that would be examined in subsequent studies. Finally, in order to increase the dynamics and robustness of the proposed model, it is expected that future research will explore the effects of external factors, such as temperature, on the contaminant distribution model.