Developing a Statistical Model to Improve Drinking Water Quality for Water Distribution System by Minimizing Heavy Metal Releases

This paper proposes a novel statistical approach for blending source waters in a public water distribution system to improve water quality (WQ) by minimizing the release of heavy metals (HMR). Normally, introducing a new source changes the original balanced environment and causes adverse effects on the WQ in a water distribution system. One harmful consequence of blending source water is the release of heavy metals, including lead, copper and iron. Most HMR studies focus on the forecasting of unfavorable effects using precise and complicated nonlinear equations. This paper uses a statistical multiple objectives optimization, namely Multiple Source Waters Blending Optimization (MSWBO), to find optimal blending ratios of source waters for minimizing three HMRs in a water supply system. In this paper, three response surface equations are applied to describe the reaction kinetics of HMR, and three dual response surface equations are used to track the standard deviations of the three response surface equations. A weighted sum method is performed for the multi-objective optimization problem to minimize three HMRs simultaneously. Finally, the experimental data of a pilot distribution system is used in the proposed statistical approach to demonstrate the model’s applicability, computational efficiency, and robustness.


Introduction
Mixing water from a new source may alter the original balanced environment in water networks.As a consequence, pipe corrosion, decreasing disinfectant residual and microbiological growth will deteriorate WQ.Optimizing the blending ratio of water coming from different sources may minimize this water deterioration [1][2][3][4].However, an optimal blending ratio usually cannot be realized because an operational blending ratio is dependent on the water demand, capacities of water plates, and locations of water plates and the mixing point of source waters.Furthermore, the concentrations of heavy metals are unassignable by water plates.They are impacted by the chemical parameter concentrations (such as chlorine concentration) and physical parameter levels (such as turbidity), also influenced by the original balanced environment in water networks.It is a challenge to minimize the release of heavy metals (HMR) in a water distribution system.
Imran et al. [1,5] described a method that optimize the blending of multiple source waters for heavy metal corrosion abatement and monochloramine residual control in distribution systems.They built up a pilot water distribution system to simulate a full-scale operational water distribution system.They fitted the nonlinear empirical WQ models for copper, lead, apparent color, and monochloramine dissipation.In addition, then based on the regulations, the optimal water blending ratio ranges were found.However, this optimization may need to be further developed, to provide some quantitative indicators to water utilities for improving the WQ in their water distribution systems.This paper proposes a new statistical optimization method for offering such quantitative indicators.
The optimization of this study is nonlinear because of the nonlinearity of the WQ.Conventional nonlinear optimization methodologies, such as gradient-based algorithms [6][7][8] and genetic algorithms [9], have some disadvantages that may not be suitable for use in the field of WQ.E.g., a gradient-based algorithm may have a large development cost and the linearization can be time-consuming; the genetic algorithms are time-consuming when be used in a large-scale nonlinear problem.Here, in this paper a second-order statistical model, namely the dual response surface optimization (DRSO), is introduced to conduct this nonlinear optimization.
The DRSO is based on the Response Surface Method (RSM).The RSM is a statistical technique used in empirical study.It approximates the true response surface and estimates the parameters; it works well for solving real response surface problems [10,11].As a consequence, it searches for an optimal set of input variables to optimize the response by using a set of designed experiments.In the past few years, several robust DRSO techniques have been developed [12,13].Yanikoglu et al. (2016) introduced Taguchi's Robust Parameter Design approach into DRSO to develop a method that uses only experimental data, and it can yield a solution that is robust against ambiguity in the probability of inputs [12].
The optimization is also required to run rapidly because it faces the dynamics of WQ, water demand, and source water capacity of water plates [14,15].RSM programming can be quickly solved by any commercial solver because it contains a quadratic model rather than a high-order nonlinear model.All observations in RSM are typically assumed to have equal variation that may not be practically valid [16].Vining and Myers [17] then proposed a DRSO model that includes two quadratic empirical models, one for the mean and another for the standard deviation, and then optimized one of the responses subjected to an appropriate constraint given by the other.Because the second empirical model considers the standard deviation, a DRSO approach can avoid misleading optimum results and produce robust results.
Harmful chemical reactions (especially for HMR) and microbiological growth are required to be controlled in water distribution systems.According to the previous studies, the biomass accumulation is influenced to a greater extent by the nature of the supporting material (such as unlined ductile iron) than by the secondary parameters of WQ [18].Meanwhile, Imran et al. [1,5] confirmed that loss of disinfectant residual is affected to a greater extent by the delivery distance and retention time rather than the WQ.This paper only focuses on the relationships between the HMR and the chemical and physical parameters of WQ in the distribution system.Three heavy metals are iron, lead, and copper.
This paper proposes a Multiple Source Waters blending Optimization (MSWBO) methodology to find optimal blending ratios for multiple source waters according to the WQ parameters of these source waters (shown in Table 1) and a variety of scenario designs.The objectives of the study are to reduce three HMR of lead, copper, and iron in water distribution systems, and minimize the operational cost.A weighed sum method is used for the multi-objective function in the MSWBO to minimize HMR simultaneously.The dual response surface models are used to describe the reaction kinetics of three HMR, with tracking the standard deviations of the responses of these three release equations.An application is provided to demonstrate the applicability and advantages of this MSWBO.Notes: Sourced from Imran et al. [1]; GW is groundwater, SW is surface water; DW is desalination water.HRT is hydraulic retention time.

Methodology
The development of the proposed MSWBO approach consists of three main steps: data acquisition, model fitting and optimization.Data acquisition includes data type identification, data range determination, experiment design, and experimental data recording.

Data Type Identification and Range Determination
Three HMR are fitted into the dual response surface models.For the dual response surface models, three mean responses are the release levels (RLs) of iron, lead, and copper in the mixed water; three standard deviation responses are the standard deviations of the responses of these three release equations; the input variables are the concentrations of the parameters of WQ in the source waters, respectively.These WQ parameters include alkalinity, calcium, dissolved oxygen, pH, silica, sodium chloride, sulfates, hydraulic retention time (HRT), and temperature.
For the MSWBO, the responses of the dual response surface models are treated as "the control variables"; the operational cost of the WQ adjustment of source waters is also treated as "the control variable"; the input variables of the dual response surface models are treated as "the state variables" or "the decision variables", which will be discussed in the next section; because the ratios of groundwater (GW), surface water (SW), and desalination water (DW) in the blending are given by the decision maker, they are treated as the state variables.
The value ranges of the variables related to WQ are limited by government standards such as National Primary Drinking Water Regulations (Primary Standards) and National Secondary Drinking Water Regulations (Secondary Standards) [19,20].The sum of three ratios is equal to 1.

Experiment Design and Experimental Data Recording
Tampa Bay Water (TBW) operates a water distribution system and was required by the Southwest Florida Water Management District to reduce groundwater withdrawal for minimizing hydrological and ecological impacts to the riverine systems and ensuring that flows remain within the range of natural variability [21].To complete this requirement, TBW completed the constructions of a surface water plant and a desalination plant [1,5].Meanwhile, a study [1] was sponsored by TBW and the American Water Works Association (AWWA) Research Foundation to evaluate the WQ impacts from the new source water supplies.As a part of the above-mentioned study, a pilot distribution system (PDS) was constructed in Florida, U.S., to simulate the full-scale operational water distribution system of TBW [1,5].Three source waters were selected in the PDS: a conventional groundwater (GW), surface water (SW) and desalinated water (DW).Table 1 describes their WQ information.The aged pipes of the PDS were removed from the water distribution system of TBW.The pipes were assembled on the pilot site and allowed to equilibrate with groundwater over a period of five months.After equilibrium was established, various blends were introduced into the PDS.Two years of continuous operation was performed on the PDS which was used to study the relationship between WQ and the distribution system.[22] first used the Response Surface Method to study the relationship between a response and a set of input variables.Vining and Myers [11] first fitted second-order polynomial models for a mean and standard deviation separately and optimized one of the responses subjected to an appropriate constraint given by the other.Since the second empirical model considers the standard deviation, a MSWBO approach can avoid misleading optimum and produce robust results.A general dual response surface model is formulated as follows:

Box and Wilson
where where h 0 , h, g 0 , g, H and G are the appropriate scalars, (k × 1) vectors, and (k × k) matrices for the estimated coefficients; C µ and C σ are the mean and standard deviation; S and S are (k × 1) vectors of the input variables and their transpose, respectively.The proposed study needs to fit three couples of second-order polynomial models: (1) the lead release model and its standard deviation model; (2) the copper release model and its standard deviation model; and (3) the iron release model and its standard deviation model.It should be noticed that these three couples of models use same secondary WQ parameters (alkalinity, sodium, pH, conductivity, etc.) as input variables.

Dual Response Surface Optimization
The next step is to optimize the above response surface models simultaneously.The objectives represent the desire for minimizing (1) metal RLs; (2) standard deviations of metal RLs; (3) adjustment costs of WQ for the source waters.With the constraints related to the above control variables, a MSWBO model is formulated as follows: where j is the index of the response of metal release, j = 1, 2, . . ., n; i is the index of the source water, i = 1, 2, . . ., m; L µ is the maximum permissible metal RL; S k is the concentration of the kth input variable in the blend, k is the index of input variables; x i is the percentage of the source water i in the blend, i is the index of source waters; U ki is the concentration of the kth input variable in the source water i; Ûki is the original concentration of the kth input variable in the source water i; U ki --and U ki are the lower bound and upper bound of U ki ; w 1 , w 2 and w 3 are the weights to reflect different priorities to the corrosion and the cost of WQ adjustment.U ki is the decision variable.It directly affects the WQ of the blend.Meanwhile, the lower the difference between U ki and Ûki , the low the operation cost of water utilities.In this study, C µ , C σ and (U ki − Ûki ) are included in the objective function to reflect the trade-off between operation cost and WQ.This can be realized by adjusting values of w 1 , w 2 and w 3 as shown in Equation (5a).

Nonlinear Empirical Model
The previous studies formulated the statistical nonlinear corrosion models for copper, lead and color (iron) [1,5,23,24].The nonlinear release models are listed as follows: Pb = 1.027 (T−25) (Alk) 0.677 (pH) −2.726 (Cl) 1.462 ∆C = 10 −1.321 (T) 0.813 (Alk) −0.912 (Cl) 0.485 (Na) 0.561 (SO 4 ) 0.118 (DO) 0.967 (HRT) 0.836 (8) where Cu and Pb are the copper and lead concentrations in mg/L, ∆C is the increase in apparent color (measured in cpu: Co-Pt unit), T is temperature in • C, Alk is the concentration of alkalinity in mg/L as calcium carbonated (CaCO 3 ); SO 4 and SiO 2 are the concentrations of sulfates and silica in mg/L, respectively, Cl is the concentration of chlorides in mg/L, pH is the dimensionless measure of the acidity or alkalinity of a solution, and (HRT) is the hydraulic retention time in days.Imran et al. (2006) concluded a strong relationship existed between the total iron (Fe) concentration (mg/L) and the apparent color (cpu) [1]: where R 2 is the correlation coefficient.The major advantage of the above nonlinear models is explicitly describing the relationships between the metal RLs and the concentrations of secondary WQ parameters, but these nonlinear models will cause the optimization to be time consuming.

Dual Response Surface Model
The dual response surface model is composed of two second-order polynomial models, one for a mean and another for the standard deviation.The mean model of a metal release describes the relationship between the metal RLs and the concentrations of WQ parameters.The standard deviation model is to establish the relationship between the standard deviation and the WQ parameters.If it only uses the mean model for optimization, the result may have a large variation.The second empirical model of metal release considers the standard deviation; thus, the optimization can optimize the metal release subjected to its standard deviation constraints.The dual response surface optimization can avoid generating such a large variation result and then can produce a robust result.
The WQ parameter of pH is non-conservative.It is defined as a negative decimal logarithm of the molar concentration of the hydrogen ion activity in a solution.Because the pH value of blends cannot be obtained from a mass balance equation, we chose the molar concentration of the activity of hydrogen ions A H+ as a substitution (A H+ = 10 −pH ).
The next step is to convert the above metal nonlinear models into the dual response surface models.The corresponding dual response surface models are formulated as follows: Iron-release response surface model (11) where C Fe µ is the mean response of iron release in mg/L, C Fe σ is the standard deviation response of iron release; T is the temperature in • C; Alk is the concentration of alkalinity in mg/L as calcium carbonate (CaCO 3 ); Na, SO 4 and Cl are the concentrations of sodium, sulfates and chlorides in mg/L, respectively; DO is the dissolved oxygen content in mg/L; and HRT is the hydraulic retention time in days.R 2 is the correlation coefficient; it should be noticed that the value of R 2 in Equation ( 10) is calculated by the quadratic equation responses against the results of Equation (8).
Copper-release response surface model 2 ) R 2 = 0.999 (13) where C Cu µ is the mean response of copper release in mg/L, C Cu σ is the standard deviation response of copper release; T is the temperature in • C; Alk is the concentration of alkalinity in mg/L as calcium carbonate (CaCO 3 ); SO 4 and SiO 2 are the concentrations of sulfates and silica in mg/L, respectively; A H+ is the molar concentration of the activity of hydrogen ions in 10 9 mol/L (in order to keep the same order of magnitude as the pH value).R 2 is the correlation coefficient.
Lead-release response surface model 2 ) R 2 = 0.998 (15) where C Pb µ is the mean response of lead release in mg/L, C Pb σ is the standard deviation response of lead release; T is the temperature in • C; Alk is the concentration of alkalinity in mg/L as calcium carbonate (CaCO 3 ); SO 4 and SiO 2 are the concentrations of sulfates and silica in mg/L, respectively; A H+ is the molar concentration of the activity of hydrogen ions in 10 9 mol/L (in order to keep the same order of magnitude as the pH value).R 2 is the correlation coefficient.

Maximum Permissible Metal Release
The Lead and Copper Rule of US Environmental Protection Agency (EPA) for copper stipulates 90% of samples have a copper concentration less than 1.3 mg/L, and the concentration for lead is less than 15 µg/L [19].The goal of this study is to find optimal blending ratios of source waters that can minimize the HMR in the distribution system.The minimized heavy metal concentrations should be less than the values of the regulations.Assuming these HMR followed a normal distribution; the maximum permissible metal RLs are their mean values.We know the standard deviations of the copper and lead RLs are 0.22 mg/L and 7 µg/L, respectively, and a one tailed Z statistic for 90% compliance is 1.28.The mean values of the copper and lead RLs are estimated on 1.02 mg/L and 6.04 µg/L, respectively.The actual government lead RL data indicated that the model ( 7) over-predicted the lead RL [1,5].The maximum permissible lead release is then modified on a value of 10 µg/L.
The modification has been experimented with in the governments' distribution systems and shows a better reflection on the distribution function of lead release [1,5].
The EPA guides the secondary maximum contaminate level of iron is 0.3 mg/L [20].We assume an influent iron RL of 0.1 mg/L, the maximum permissible iron RL within the distribution system is set at 0.2 mg/L.The metal release constraints are listed as: where L Cu , L Pb , and L Fe represent the maximum permissible concentrations of copper, lead, and iron in a water distribution system, respectively.

Variable Design
The task of this study is to find the optimal blending ratio of source waters, which consequently minimize all harmful reactions in the distribution system, as well as minimize the quality adjustment costs of source waters.The WQ chemical parameters of the source waters are considered as the decision variables.It is not necessary to use all of these chemical parameters as decision variables in the optimization model, such as some non-sensitive chemical parameters.Because these non-sensitive chemical parameters do not make much contribution to the metal release, e.g., the concentration of dissolved oxygen almost does not make contribution to the HMR, they are not necessary to be adjusted in the source waters.Hence, chemical parameters are considered as state inputs (certain value inputs).In practice, it is easy to increase the concentration of a chemical parameter in water rather than remove it from water, except for pH value.Therefore, the decision variables are designed to be varied from their initial value to a designed upper bound.
We found that only temperature, alkalinity and sulfates are common impacting parameters in the three metal release models (Equations ( 6)-( 8)).These three common impacting parameters are selected as decision variables.Due to the parameter of temperature as an uncontrollable parameter which is dependent on ambient temperature; it is set on 25 • C in this study.Because the initial concentrations of alkalinity in the GW and sulfates in SW are high, they are considered as the state inputs.
The pH is a major importance in determining the corrosivity of water.In general, the higher pH value, the lower HMR.Equations ( 6) and (7) show the effects of increasing pH for copper and lead control.In water treatment process, addition of calcium hydroxide Ca(OH) 2 is a technology for water pH increasing.So that, in real-time operation increasing pH value of water is easier than reducing it.In this study, pH is replaced by A H+ , namely the molar concentration of the activity of hydrogen ions, due to the A H+ value in a blending source waters can be calculated by mass balance.The SMCL's for pH is 6.5 to 8.5, and the initial pH values for the GW, SW and DW are 7.9, 8.2 and 8.3, respectively.So that, we design the pH of GW is a decision variable, and the value range is from 7.9 to 8.5.The pH of the SW and DW are state inputs because they are high initial values.
Equation ( 6) also demonstrates that increasing concentration of silica contributes to lower copper release.Silica is then designed as a decision variable in the SW and DW rather than in the GW because of a high initial concentration of silica in the GW.
A higher chloride and sodium concentrations may be expected because of membrane deterioration in the desalination plant.Equations ( 7) and (8) display that a higher concentration of the chloride can cause a higher release of lead and iron, and a higher concentration of the sodium leads to a higher iron release.Because the primary objective of the MSWBO is to minimize releases of the metals, the optimization model will inhibit any increase in chloride and sodium concentrations.Chloride and sodium have to be treated as the state inputs.For the same reason, the parameters of HRT and dissolved oxygen are also considered as the state inputs (independent variables) to the model.Table 2 lists the initial values of the state inputs and the ranges of the decision variables.

Results and Discussion
The dual response surface models are established for the releases of iron (Equations ( 10) and ( 11)), copper (Equations ( 12) and ( 13)) and lead (Equations ( 14) and ( 15)) in a particular water distribution system.On the basis of the ALs, the maximum copper and lead RLs are set at 1.0 mg/L and 10 µg/L, respectively.The maximum iron RL is established at 0.2 mg/L in accordance with the SMCLs.According to Model 5, the MSWBO approach for this particular water distribution system is developed when subjected to the dual response surface models and the relative constraints.The objectives are to minimize the metal corrosions of the pipes and minimize the adjustment costs of WQ for the multiple source waters.The decision variables of the MSWBO are selected from the WQ parameter, and the value ranges of these variables are listed in Table 2.
Three source waters are used to study the blending source waters problem.The WQ information of the three source waters are listed in Table 1.Theoretically, an optimal blending ratio may exist for a minimum release.A distribution system needs different blends of source waters, which depend on where the water demand is located on.It is not appropriate to set a constraint on source water percentages as long as there was no centralized point of blending.On the other hand, to study infinite blends of source waters is unrealistic.For brevity, only a few extreme situations and threshold blending combinations are discussed here.
The blending ratio design in the extreme situations considers the situation of single source water supply and one source off line; the threshold blending combinations are obtained from the conclusions of previous studies [1,5], such as GW contributions of >60% result in unacceptable copper release and <20% result in an unacceptable release in color.For the purpose of comparison, the release simulations for the base scenario (WQ information is shown in Table 1) are conducted though the three metal release models directly.The results are listed in Table 3.
In the objective function of the MSWBO, the term of the metal concentrations represents the WQ of blending source waters and the term of the changes of the variables represents the operation cost.The minimization of the objective reflects the trade-off between WQ and operation cost.This can be realized thought adjusting values of w 1 , w 2 and w 3 as shown in Equation (5a).Here, two weighting sets are designed as w 1 : w 2 : w 3 = 1: 1: 1 and w 1 : w 2 : w 3 = 100: 100: 1.
Examination of the MSWBO indicates that the unacceptable copper release, because of the effect of high GW percentage, can be addressed by increasing the pH value and/or silica concentration in the GW.Even for 100% of the GW, the copper RL can meet its maximum permissible RL though adjusting the pH value from 7.9 to 8.3 (Tables 4 and 5).The results of the MSWBO show that high pH value could help in controlling copper and lead release.The results of the base scenario simulations (Table 3) demonstrate that the distribution system does not have a heavy metal release problem when it receives the SW that has a percentage over the 90% in the blend waters, and it is only mixed with GW.If the SW were mixed with DW and the percentage of DW excessed 10% in the blend waters; then, high iron release will occur.The test of the MSWBO shows that the SW can satisfy the original balance environment of the pipe system in any percentage, by slightly increasing the concentration of silica and/or alkalinity.For instance, 100% of SW can be distributed to the pipe system with adjusting the concentration of silica from 7 mg/L to 8.7 mg/L and the concentration of alkalinity from 50 mg/L to 51.1 mg/L (Table 4).
High DW percentages (>90%) will result in unacceptable iron release, no matter how the WQ parameters are adjusted.However, this unacceptable iron release is much lower than that of the base scenario.For example, 100% of DW would cause 0.24 mg/L of iron release after optimally adjusting its alkalinity concentration (Table 4); for the base scenario, 100% of DW would result in 0.4 mg/L of iron RL (Table 3).
Examination of the MSWBO also verifies a previous observation: "corrosion of copper and lead pipes is increased by increasing alkalinity, whereas increasing alkalinity is beneficial in reducing the release of iron products from pipes; increasing sulfates reduces lead release but increases iron release" [1][2][3].These conflicting WQ requirements for release abatement can be addressed by the mathematical optimization technology of MSWBO.The trade-off between WQ and operation cost can be realized by adjusting the weighting values of w 1 , w 2 and w 3 .Tables 4 and 5 list the results of the MSWBO with different weighting sets of w 1 : w 2 : w 3 = 1: 1: 1 and w 1 : w 2 : w 3 = 100: 100: 1, respectively.The first weighting set shows that decision makers pay more attention to operation cost; the latter represents that decision makers are focused on WQ.
It is worth mentioning that the development of the MSWBO is based on two assumptions: (1) the pipes are of fixed geometry and the flow is in low conditions; (2) the changes of the WQ parameters (such as alkalinity and sulfates) have no effect (or have a very slight effect) on the pH value.The future studies could be: (a) develop a full-scale calibration of the model for the application of higher flow rates; (b) develop a rigorous approach to evaluating the pH of bends.
The MSWBO to the metal release problem assumes the means and variances of these metals are known.It is found that the optimal solutions are quite sensitive to these assumptions.We can use historical data to estimate the means and variances.However, the noises are rarely known and that would result in a non-optimal solution [25].These unknown noises need to be considered in a future study.

Conclusions
The Multiple Source Waters Blending Optimization (MSWBO) has been developed for blending source waters to improve WQ in a distribution system by minimizing the HMR.The developed MSWBO approach has favorable advantages over other approaches in the current literature; in particular it has the following properties: • It provides a quantitative optimal blending ratio of source waters to water utilities for minimizing the HMR in their water distribution systems.

•
It has a wide range of applications, because the experiment of this model was designed in a variety of scenarios, even for some extreme situations.

•
It shows a high computational efficiency, due to fact that the model only includes some second-order equations and inequalities, and also that the experiment showed that the run time is only several milliseconds.

•
It exhibits a robust operation, since the MSWBO model considers the standard deviation and avoids ambiguity in the probability of inputs.

Table 1 .
Water quality parameters of the source waters.

Table 2 .
Value range of water quality parameter.

Table 3 .
Results of release models.: x is the blending ratio of GW, y is the blending ratio of SW, z is the blending ratio of DW; Pb is in µg/L, others are in mg/L; Boldface number indicates the excess of maximum RL. Notes

Table 4 .
Results of MSWBO with w 1 = 1, w 2 = 1 and w 3 = 1.Notes: x is the blending ratio of groundwater (GW), y is the blending ratio of surface water (SW), and z is the blending ratio of desalination water (DW); Pb is in µg/L, others are in mg/L; Boldface number indicates the excess of maximum RL.
Notes: x is the blending ratio of groundwater (GW), y is the blending ratio of surface water (SW), and z is the blending ratio of desalination water (DW); Pb is in µg/L, others are in mg/L; Boldface number indicates the excess of maximum RL.