1. Introduction
Disinfection is a vital part of the drinking water treatment process. It renders water of potable quality and minimises the occurrence of many waterborne diseases. Monochloramine (NH₂Cl) is the most widely used drinking water disinfectant after chlorine. Its use in a water distribution system (WDS) substantially reduces the concentrations of regulated disinfection by-products (DBPs) [1,2]. Under typical water conditions, monochloramine is more stable than chlorine and is therefore preferred in a WDS with a high hydraulic retention time (HRT). Drinking water utilities optimise the disinfection process to maximise disinfectant stability. Still, monochloramine decays via several pathways, including auto-decomposition, reaction with natural organic matter (NOM), reactions with nitrite (NO₂⁻) and bromide (Br⁻), microbiological reactions, and wall reactions [1,2,3,4,5,6]. Water quality modelling helps water utilities to achieve more technically sound and sustainable solutions for water quality management [7]. In process-based water quality modelling, monochloramine decay is sometimes modelled as the sum of bulk decay and wall decay, assuming zero- or first-order decay kinetics [8,9,10]. Bulk decay is caused by disinfectant-demanding species present in the bulk water, while wall decay results from disinfectant-demanding species present at the pipe surface, including biofilm and corrosion products. Bulk decay usually dominates where the volume-to-surface ratio is high [9,11]. Moreover, sediments, which can exert a high disinfectant demand, can accumulate in pipes and service reservoirs [11]. Several factors, including pH, temperature, NOM concentration, and the composition of surface materials, can affect the bulk and wall decay processes [6,11,12]. To prevent bacterial regrowth, a certain level of monochloramine residual needs to be maintained throughout the WDS [1,8,13].
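As a minimal illustration of the first-order formulation, assume a single parcel of water travelling through one pipe, with the wall contribution lumped into an apparent first-order rate. This is a simplification: network models such as EPANET compute the wall term from the wall coefficient, the pipe radius and a mass-transfer coefficient. The residual after travel time t is then C(t) = C₀ exp(−(k_b + k_w,app) t). Rates are written here as positive numbers, whereas EPANET-style models, and the results reported later in this paper, express decay as negative coefficients. The function name and all values are hypothetical.

```python
import math


def first_order_residual(c0, k_bulk, k_wall_app, t_h):
    """Residual after travel time t_h (hours) under combined first-order decay.

    k_bulk and k_wall_app are decay rates in 1/h written as positive numbers;
    EPANET-style inputs report decay as negative coefficients instead.
    """
    return c0 * math.exp(-(k_bulk + k_wall_app) * t_h)


# Hypothetical inputs for illustration only (not calibrated values)
c0 = 4.0         # mg/L at the point of chloramine application
k_b = 0.0012     # 1/h, bulk decay rate
k_w_app = 0.02   # 1/h, apparent wall decay rate (assumed)

for t in (0, 12, 24, 48):
    bulk_only = first_order_residual(c0, k_b, 0.0, t)
    with_wall = first_order_residual(c0, k_b, k_w_app, t)
    print(f"t = {t:2d} h: bulk only {bulk_only:.2f} mg/L, bulk + wall {with_wall:.2f} mg/L")
```

Comparing the two columns shows how quickly a modest wall term can dominate the residual loss when the volume-to-surface ratio is low.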
A process-based water quality model requires both a hydraulic module and a water quality module. Such models depend heavily on a reasonably accurate simulation of the underlying hydraulic conditions [11], because water quality parameters such as water age, reaction time, mixing conditions, and transport to and from the pipe surface are governed by hydraulics. The hydraulic model is calibrated to ensure that it reproduces the observed behaviour of the real system [14,15]. Calibration is done by formulating an optimisation problem and minimising its objective function by adjusting parameters such as pipe roughness, demand patterns, leakage parameters, and control rules for pumps, valves, and tanks [14,15]. Successful prediction of water quality with a well-calibrated hydraulic model then depends on accurately defining the bulk and wall reaction characteristics. The bulk decay coefficient can be accurately determined from controlled laboratory experiments. However, wall decay remains largely uncharacterised because of the complexities involved in determining it [6,8]. Few studies have reported the wall decay coefficient, which is highly site- and material-specific. Moreover, most of these studies were conducted either in pilot distribution systems (PDS) or in controlled laboratory experiments, so applying their results to a real distribution system is challenging. Doshi et al. [16] proposed a field-based method to quantify the wall decay coefficient in chlorinated systems by measuring the difference in residual chlorine between two points on a pipe segment. This method could also be applied to monochloramine. However, it appears to have limited applicability: in a real WDS, it is rare to find a pipe segment of uniform material and diameter that is long enough to cause a noticeable monochloramine decay between its start and end. Another study used a water quality network model with a parameter optimisation technique to determine the wall decay coefficient in a chlorinated system [17]. This approach is more feasible for quantifying monochloramine wall decay; however, little research has been done in this area, and it requires further work.
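The principle behind a two-point estimate can be illustrated with a simplified sketch. It assumes plug flow through a single uniform segment with a known travel time, first-order kinetics, and a laboratory bulk rate k_b. The total first-order rate is then k = ln(C_in/C_out)/t, and the wall share is what remains after subtracting k_b. This illustrates the principle only; it is not the exact procedure of Doshi et al., and the function name and numbers are hypothetical. It also exposes the limitation noted above: when C_in and C_out differ by little more than the measurement error, the estimated wall rate is dominated by that error.

```python
import math


def apparent_wall_rate(c_in, c_out, travel_time_h, k_bulk):
    """Apparent first-order wall rate (1/h) from residuals at both ends of a segment.

    Assumes plug flow, a known travel time, and first-order bulk and wall decay,
    with rates written as positive numbers.
    """
    if c_in <= 0 or c_out <= 0 or travel_time_h <= 0:
        raise ValueError("concentrations and travel time must be positive")
    k_total = math.log(c_in / c_out) / travel_time_h  # total first-order rate
    return k_total - k_bulk                           # remove the lab bulk rate


# Hypothetical segment: 6 h travel time, lab bulk rate 0.0012 1/h
print(f"{apparent_wall_rate(3.0, 2.8, 6.0, 0.0012):.4f} 1/h")
# A hypothetical 0.1 mg/L error in the downstream reading changes the estimate greatly
print(f"{apparent_wall_rate(3.0, 2.9, 6.0, 0.0012):.4f} 1/h")
```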
Currently, many software packages, such as EPANET (developed by the US Environmental Protection Agency), WaterGEMS, and WaterCAD (developed by Bentley Systems), offer options to model hydraulic and water quality behaviour in a WDS. Shang et al. [12] developed a multi-species water quality model (EPANET-MSX) that can model multiple species as they grow or decay over time while being transported through a WDS network. Alexander and Boccelli [18] applied this model to a real WDS and suggested that a multi-species model cannot accurately model the variation of several species throughout the network; rather, it is more suitable for modelling the internal processes that make up bulk decay or similar mechanisms. In contrast, modelling a single species (e.g., chlorine decay) using a bulk and wall reaction approach has proven successful in a real WDS [17]. The level of calibration achieved with many of these models depends on the optimisation algorithm used. Optimisation algorithms are broadly classified as local or global. A local optimisation algorithm terminates when it reaches a local minimum, whereas a global optimisation algorithm searches from multiple starting points to increase the likelihood of reaching the global minimum [14,19]. Most global search algorithms belong to the class of evolutionary algorithms [20,21]. The optimisation algorithm is coupled with the water quality model to calibrate the model parameters.
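The difference between local and global search can be illustrated with a hypothetical one-dimensional objective that has several minima. A simple pattern search started from one point stops in the nearest local minimum, whereas running the same search from many starting points (a basic multistart strategy) is far more likely to reach the global minimum. CMAES, used later in this study, is an evolution strategy rather than a multistart method; the sketch, whose functions and objective are hypothetical, only illustrates the local/global distinction.

```python
import math


def objective(x):
    """Hypothetical multimodal error surface standing in for a calibration objective."""
    return math.sin(3.0 * x) + 0.1 * x * x


def local_search(f, x0, step=0.5, tol=1e-6):
    """Pattern search: step downhill while possible, otherwise halve the step.

    Like any local method, it stops at the first local minimum it reaches.
    """
    x, fx = x0, f(x0)
    while step > tol:
        for cand in (x - step, x + step):
            fc = f(cand)
            if fc < fx:
                x, fx = cand, fc
                break
        else:
            step /= 2.0  # no downhill neighbour at this step size
    return x, fx


def multistart(f, starts):
    """Run the local search from several starting points and keep the best result."""
    return min((local_search(f, s) for s in starts), key=lambda result: result[1])


single = local_search(objective, 4.0)
best = multistart(objective, [-5.0 + 0.5 * i for i in range(21)])
print(f"single start: x = {single[0]:.3f}, f = {single[1]:.3f}")
print(f"multistart:   x = {best[0]:.3f}, f = {best[1]:.3f}")
```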
Alternative methods, such as data-driven models, are gaining traction for predicting a range of water quality parameters. Changes in water quality in a WDS are driven simultaneously by many physical, environmental, chemical, and biological factors, and a data-driven model should include variables representing the factors that affect the target parameters. Data-driven models use artificial intelligence, or more specifically machine learning (ML) algorithms, to build a relationship between predictor and response variables. Over the years, various models have been developed, including deep neural networks (DNN), artificial neural networks (ANN), K-nearest neighbours (KNN), naïve Bayes (NB), partial least squares regression (PLS), principal component regression (PCR), and support vector machines (SVM). These models have been successfully applied to the prediction and classification of water quality variables. For instance, Gibbs et al. [22] used a linear regression model, a multi-layer perceptron (MLP), and a general regression neural network (GRNN) to predict chlorine concentration in a distribution system in South Australia, while Rodriguez et al. [23] applied an ANN to predict chlorine concentration in a distribution system in the UK. Similarly, Aldhyani et al. [24] used SVM, KNN, and NB algorithms to classify water quality index data and found that SVM achieved the highest performance. Peters et al. [25] used upstream water quality data to predict the downstream total chlorine residual (the sum of monochloramine and free chlorine) in a WDS. They found that upstream pH, the chlorine to ammonia ratio, and reservoir levels had little effect on the target variable, whereas the upstream total chlorine residual and temperature had a noticeable effect on downstream total chlorine.
The applications and algorithms used in ML are continuously evolving. Asadollah et al. [26] introduced extra tree regression (ETR) to predict the monthly water quality index of a river and compared its performance with support vector regression (SVR) and decision tree regression (DTR); ETR showed better prediction performance. Singha et al. [27] developed a deep learning (DL) model to predict groundwater quality and compared it with many other ML methods, including random forest (RF) and ANN; DL achieved the best prediction performance. Similarly, Ahmed et al. [28] investigated water quality classification using decision tree (DT), MLP, KNN, and NB algorithms. Using parameters including pH, dissolved oxygen (DO), electrical conductivity (EC), turbidity, and temperature, they achieved 99% classification accuracy with the DT algorithm. ML algorithms have also been used to analyse water spectra to determine various water quality parameters, including monochloramine [29]. These examples illustrate the range of ML applications in water quality modelling.
In light of the above discussion, this study applies ML techniques to disinfectant decay modelling in a WDS, an area in which little research has been completed and further work is required. While the studies above used upstream water quality data to predict a target water quality parameter downstream, they did not include hydraulic parameters, which can be strongly related to monochloramine decay. Including such parameters in the data analytics model might therefore improve its predictive ability. To the best of our knowledge, such an approach has not been proposed before, so it adds new knowledge to water quality modelling applications. Moreover, few studies have investigated the chloraminated wall decay coefficient using real operational data from a WDS, and this study aims to fill this knowledge gap. The objectives of this study are therefore to (i) quantify the wall decay parameter and develop a water quality network model to predict the monochloramine residual concentration in a WDS, (ii) develop a data analytics model to predict the same, and (iii) compare the performance of both models to better understand monochloramine decay behaviour. The outcomes of this study should help water utilities better manage disinfectant residuals and identify areas for further improvement.
3. Results and Discussion
3.1. Typical Water Quality at the Studied Locations
The monochloramine concentrations were simulated at two monitoring points in the distribution system: (i) the PS2 inlet and (ii) the Meningie pump station. During the study period, the monochloramine concentration at PS1, where chloramination is applied, was 4.2 ± 0.1 mg L−1. By the time the water reached the PS2 inlet, the residual had decreased to 2.5 ± 0.4 mg L−1 through bulk and wall decay. At the PS2 outlet, re-chlorination boosts the monochloramine concentration to 3.7 ± 0.2 mg L−1. At Meningie, the residual decreased to 2.3 ± 0.5 mg L−1. Dichloramine and free chlorine concentrations at these locations were below the reporting limit for field titration (<0.1 mg L−1). Monochloramine concentrations were more variable at the PS2 inlet and Meningie than at PS1 and the PS2 outlet. The pH was also lower at Meningie (8.5 ± 0.3) than at PS1 (9.0 ± 0.1) and the PS2 outlet (9.7 ± 0.1 after adjustment), possibly because of products of monochloramine decomposition and/or wall reactions. The turbidity at PS1 was 0.07 ± 0.01 nephelometric turbidity units (NTU), while the DOC and
concentrations were 2.0 ± 0.2 mg L−1 and 0.2 ± 0.1, respectively. As shown in Figure 2, the monochloramine data from these locations indicate that bulk and wall decay substantially reduce the monochloramine residual as water travels from Tailem Bend to Meningie.
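A quick check on the mean values above shows what share of the residual is lost on each path. It assumes all concentrations are in mg L−1 and ignores the standard deviations. Because the two paths differ in length and water age, these percentages are not decay rates. The function name is illustrative.

```python
def percent_loss(c_upstream, c_downstream):
    """Share of the upstream residual lost between two points, in percent."""
    return 100.0 * (c_upstream - c_downstream) / c_upstream


# Mean concentrations from Section 3.1 (mg/L assumed)
paths = {
    "PS1 -> PS2 inlet": (4.2, 2.5),
    "PS2 outlet -> Meningie": (3.7, 2.3),
}
for name, (c_up, c_down) in paths.items():
    print(f"{name}: {percent_loss(c_up, c_down):.1f}% of the residual lost")
```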
3.2. Development and Calibration of Hydraulic Model
The details of the hydraulic model development and calibration process for the TBK system can be found in Hossain et al. [14]. In brief, the model was constructed using information such as the initial states, geometry, and settings of the hydraulic components. Pipe head losses were calculated with the Hazen–Williams formula, and the hydraulic time step was set to 30 min. Missing values and outliers in the SCADA data were replaced by the closest sensible values. Two demand patterns were considered: residential and commercial. Parameters including pipe roughness, pump settings, time-based controls for pump operation, and parameters representing variable demand patterns were calibrated. An automatic calibration method using the CMAES global optimisation algorithm was adopted. The calibration was run in a Linux-based high-performance computing (HPC) environment, with several scripts written to run the calibration steps in sequence.
Through the calibration process, the objective function was minimised to obtain the best fit between the observed and simulated time series. Observed vs. simulated tank heads and flows corresponding to the best objective function value at five locations, (i) flow at PS1, (ii) head at Coomandook tank, (iii) flow at PS2, (iv) head at Binnies tank, and (v) head at Meningie tank, are presented in Figure 3. The full time series at these locations are given in Figure A1, Figure A2 and Figure A3 in Appendix A. The correlations between the observed and simulated heads at the Coomandook, Binnies, and Meningie tanks were 0.88, 0.95, and 0.88, respectively, while those between the observed and simulated flows at PS1 and PS2 were 0.85 and 0.80. The mean observed heads at Coomandook, Binnies, and Meningie were 81.23 m, 136.77 m, and 42.53 m, while the corresponding simulated means were 81.1 m, 136.76 m, and 42.54 m. The mean observed and simulated flows at PS1 were 113.84
and 120.68
while those at PS2 were 57.19
and 55.92
, respectively. The percentage differences between the observed and simulated mean flows were 5.83% at PS1 and 2.25% at PS2. The developed hydraulic model was therefore considered adequately calibrated to reproduce the observed data.
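The text does not state how the percentage difference was defined. Assuming it is the absolute difference relative to the mean of the observed and simulated values, the reported 5.83% and 2.25% are reproduced from the mean flows above. Taking the difference relative to the observed value instead would give about 6.01% and 2.22%. This is our inference from the reported numbers, checked with the sketch below; the function names are illustrative.

```python
def percent_difference(observed, simulated):
    """Absolute difference relative to the mean of the two values, in percent."""
    return 100.0 * abs(observed - simulated) / ((observed + simulated) / 2.0)


def percent_error(observed, simulated):
    """Absolute difference relative to the observed value, in percent."""
    return 100.0 * abs(observed - simulated) / observed


flows = {"PS1": (113.84, 120.68), "PS2": (57.19, 55.92)}  # mean observed, simulated
for site, (obs, sim) in flows.items():
    print(f"{site}: {percent_difference(obs, sim):.2f}% (mean-based), "
          f"{percent_error(obs, sim):.2f}% (observed-based)")
```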
3.3. Bulk Decay Study
The measured free chlorine and dichloramine concentrations in the five samples remained below the reporting limit throughout the four-week observation period. For each sample, the monochloramine bulk decay coefficient was determined by fitting a first-order decay curve, and the average of these coefficients was used as the bulk decay coefficient when calibrating the water quality model. Figure 4 shows the observed monochloramine decay and the fitted first-order curves, where the slope of the line represents the bulk decay coefficient. The average bulk decay coefficient for these samples was estimated at −0.0012 h−1. Because the bulk decay coefficient can change with pH and temperature, this value was used as a starting point and was also calibrated to represent the average bulk decay coefficient over the study period.
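Fitting a first-order decay curve amounts to a linear regression of ln C on time, whose slope is the (negative) decay coefficient; the per-sample slopes are then averaged. The sketch below assumes times in hours and uses synthetic, noise-free samples, not the study's data, so the fit simply recovers the coefficients used to generate them. The function and sample names are hypothetical.

```python
import math


def fit_first_order(times_h, conc):
    """Least-squares fit of ln(C) = ln(C0) + k * t.

    Returns (k, C0); k is negative for decay, matching the sign used in the text.
    """
    y = [math.log(c) for c in conc]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (yi - y_mean) for t, yi in zip(times_h, y))
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    k = sxy / sxx
    return k, math.exp(y_mean - k * t_mean)


# Hypothetical, noise-free samples: weekly readings over four weeks (hours)
times = [0, 168, 336, 504, 672]
samples = {
    "sample A": [4.2 * math.exp(-0.0010 * t) for t in times],
    "sample B": [4.0 * math.exp(-0.0014 * t) for t in times],
}
ks = [fit_first_order(times, c)[0] for c in samples.values()]
k_avg = sum(ks) / len(ks)
print(f"per-sample k: {[round(k, 4) for k in ks]}, average k: {k_avg:.4f} 1/h")
```

With real data, the residual scatter around each fitted line is worth inspecting before averaging, since a poor first-order fit would make the mean coefficient misleading.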
3.4. Water Quality Model Calibration and Validation
Observed data from the selected locations were used to calibrate the EPANET water quality model. The parameters representing bulk and wall decay, tank reaction rates, and initial monochloramine concentrations were included in the calibration. Because of residual decay, the initial monochloramine concentration decreases with distance from the source. The distribution system was therefore arbitrarily divided into several zones, each assigned its own initial concentration, which was also calibrated. The calibrated first-order bulk decay coefficient was −0.0012 h−1 for the flow path from PS1 to the PS2 inlet (path 1) and −0.001 h−1 for the path from the PS2 outlet to Meningie (path 2). The calibrated bulk decay coefficient was lower for path 2 than for path 1 because the monochloramine bulk decay rate decreases as water age increases; the difference can also be attributed to variations in pH and temperature throughout the system. The calibrated first-order wall decay coefficient was −0.029 m h−1 for path 1 and −0.006 m h−1 for path 2. Most pipe diameters ranged from 610 mm to 758 mm along path 1 and from 102 mm to 363 mm along path 2. The calibrated wall decay coefficients represent average values for each path. The fit between the observed and simulated monochloramine concentrations obtained with the calibrated model was measured by the root mean square error (RMSE) and the coefficient of determination (R²).
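The text does not state how R² was computed. The sketch below shows RMSE together with the two common definitions of R², 1 − SS_res/SS_tot and the squared Pearson correlation. These can differ for model simulations, for example when a simulation tracks the observations closely in shape but is biased. The function names and data are hypothetical.

```python
import math


def rmse(obs, sim):
    """Root mean square error between paired observed and simulated values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def r_squared(obs, sim):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def pearson_r(obs, sim):
    """Pearson correlation coefficient between observed and simulated values."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


# Hypothetical observed and simulated residuals (mg/L), for illustration only
observed = [2.4, 2.5, 2.7, 2.6, 2.3]
simulated = [2.5, 2.5, 2.6, 2.6, 2.4]
print(f"RMSE = {rmse(observed, simulated):.3f}")
print(f"R2 (1 - SSres/SStot) = {r_squared(observed, simulated):.3f}")
print(f"squared Pearson r = {pearson_r(observed, simulated) ** 2:.3f}")
```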
Figure 5 shows the observed and simulated monochloramine concentrations at the PS2 inlet and Meningie, with the corresponding correlation plots. At the PS2 inlet, the RMSE and R² were 0.007 and 0.66, respectively, while at Meningie they were 0.009 and 0.94. The poorer performance at the PS2 inlet can be attributed to several factors, including changes in pH and temperature and the quality of the observed data used in calibration.
For the water quality model validation, another hydraulic model was constructed in EPANET using a different data period and adequately calibrated using one week of data. The water quality model was then run with the previously calibrated water quality parameters. Time-series plots of the observed vs. simulated monochloramine concentrations at the PS2 inlet and Meningie during the validation period are shown in Figure 6. The correlation between the observed and simulated values was 0.64 at the PS2 inlet and 0.74 at Meningie. During the validation period, the mean observed monochloramine concentrations at the PS2 inlet and Meningie were 3.2 ± 0.1 mg L−1 and 2.1 ± 0.1 mg L−1, respectively, and the simulated values were the same to the reported precision (3.2 ± 0.1 mg L−1 and 2.1 ± 0.1 mg L−1). These statistics indicate that the calibrated water quality model can reasonably reproduce the observed monochloramine concentrations during the validation period.
3.5. Machine Learning Model
To improve monochloramine prediction at these locations, machine learning models were developed using the SVR algorithm. The SVR models used the observed data at the source point or point of chloramine application, including monochloramine concentration, flow, pH, turbidity, DOC,
, and water age at the points of interest as predictor variables, and the observed monochloramine concentrations at the points of interest as response variables. An epsilon value of 0.01 and 10-fold cross-validation were used, and a grid search was applied to find the optimal SVR hyperparameters. Model performance under the linear, polynomial, radial basis function (RBF), and sigmoid kernels was compared (Table 1), and the RBF kernel was found to map the data most accurately.
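The SVR set-up described above can be illustrated, in part, with a pure-Python sketch of three ingredients: the RBF kernel, the ε-insensitive loss that defines SVR (with ε = 0.01 as in the text), and a 10-fold split of the samples. The sketch does not fit an SVR. The (C, γ) grid values are hypothetical, and the folds are contiguous, an assumption that suits time-ordered data; the text does not say whether folds were shuffled or which software was used. In Python, one option for the full workflow would be scikit-learn's `SVR(kernel="rbf", epsilon=0.01)` inside `GridSearchCV(..., cv=10)`.

```python
import math
from itertools import product


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel: exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.01):
    """SVR loss: deviations within +/- epsilon are not penalised."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous, non-overlapping folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if not start <= i < start + size]
        yield train, test
        start += size


# Hypothetical hyperparameter grid: every (C, gamma) pair would be scored by 10-fold CV
param_grid = list(product([1.0, 10.0, 100.0], [0.01, 0.1, 1.0]))

print("fold sizes:", [len(test) for _, test in k_fold_indices(25, 10)])
print("grid size:", len(param_grid))
```

The ε-insensitive loss is what distinguishes SVR from ordinary least squares: with ε = 0.01, prediction errors smaller than 0.01 (in the units of the response) do not contribute to the training loss at all.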
Figure 7 shows the observed vs. predicted time series at the PS2 inlet and Meningie using the RBF kernel, with the corresponding correlation plots. At the PS2 inlet, the RMSE and R² were 0.03 and 0.99 in model training and 0.11 and 0.92 in cross-validation, respectively. At Meningie, the RMSE and R² were 0.03 and 0.99 in training and 0.05 and 0.99 in cross-validation. These statistics suggest that the developed SVR models can adequately predict monochloramine concentrations at the studied locations.
3.6. Discussion and Future Work
This study suggests that both the water quality network model and the data analytics model can adequately predict the observed monochloramine concentrations at the studied locations, although the data analytics model performed better. The network model's performance could be improved by defining the initial water quality conditions throughout the network more accurately. Note that the network model is the more practical option for most WDS, as it requires only a few water quality inputs, such as decay rates and initial concentrations, to simulate the monochloramine profile throughout the network. In contrast, the data analytics model requires many water quality variables, which makes it impractical for WDS without real-time monitoring of a range of water quality parameters.
Monochloramine decay is a complex process that depends on several chemical and microbiological factors. The bulk decay rate determined in the laboratory mainly comprises monochloramine auto-decomposition, decay due to NOM, and microbiological reactions. For the same initial monochloramine concentration, each of these components varies with water quality, including pH, temperature, and the concentrations of NOM and microbial cells, so the overall bulk decay rate also varies. Moreover, variability in flow rate and in the mixing conditions in tanks can alter the water chemistry and affect the overall monochloramine decay rate. Therefore, the bulk decay parameter was also calibrated, allowing a small tolerance above and below the laboratory value. The calibrated parameters are expected to average out the effect of temperature on the bulk and wall decay rates, and they represent average values for each path.
The literature suggests that monochloramine is more stable at higher pH. During the bulk decay experiment, pH decreased, possibly because of monochloramine decomposition reactions. In a real WDS, the products of bulk and wall reactions may also lower the pH, which can substantially accelerate monochloramine decay. In the studied WDS, however, pH was re-adjusted at an intermediate point (PS2) to maximise monochloramine stability, so the monochloramine decay rate was assumed to be approximately constant. For a WDS with a significant pH reduction between the source point and the point of interest, calibration performance may be poor because of the variable decay rate; in such a case, calibrating with a shorter data period may improve performance.
Pipe materials can significantly affect chloramine decay; hence, within a specific WDS, the wall decay coefficient can vary between parts of the distribution system. In a review, Hossain et al. [8] suggested that cement pipes can affect chloramine decay because the principal ingredients of cement are aluminosilicates, which can react with chloramine. Cement pipes can also leach lime over time, which may change the pH of the water and affect chloramine stability. Similarly, polyvinyl chloride (PVC) pipes can leach organic compounds, including plasticisers, that affect chloramine decay. Westbrook and DiGiano [6] reported chloraminated wall decay coefficients of 0.67 m d−1 in cast iron (CI) pipe and 0.026 m d−1 in ductile iron (DI) pipe. Similarly, Liu [41] determined a chloraminated wall decay coefficient of 0.046 m d−1 in areas of the distribution system dominated by CI pipes, falling to 0.0160 m d−1 where the pipes were mostly PVC. In our study, the estimated wall decay coefficient varied from 0.15 m d−1 to 0.69 m d−1 depending on pipe category. The dominant pipe category in the distribution system was asbestos cement (AC), with PVC and DI pipes also present in some areas. The wall decay coefficients quantified through parameter optimisation are therefore comparable to those reported in previous studies.
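Assume the calibrated wall decay coefficients in Section 3.4 are expressed per hour. Converting their magnitudes to a daily basis then shows how they relate to the m d−1 values quoted here: 0.029 × 24 ≈ 0.70 and 0.006 × 24 ≈ 0.14. The small gap to the quoted lower bound of 0.15 m d−1 may stem from rounding of the reported coefficient; that is an assumption, not something the text states. The function name is illustrative.

```python
HOURS_PER_DAY = 24.0


def per_hour_to_per_day(k_wall_m_per_h):
    """Convert a wall decay coefficient from m/h to m/d."""
    return k_wall_m_per_h * HOURS_PER_DAY


# Magnitudes of the calibrated wall coefficients in Section 3.4, assumed to be in m/h
for path, kw in {"path 1": 0.029, "path 2": 0.006}.items():
    print(f"{path}: {kw} m/h = {per_hour_to_per_day(kw):.3f} m/d")
```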
Wall decay is highly site-specific and depends on several factors, including pipe diameter, material, and age [8,42]. The wall decay coefficient estimated in the current study reflects the decay associated with the dominant pipe category. For better accuracy, a flow path could be divided into several zones, each assigned its own wall decay coefficient, to reduce errors caused by differences in pipe material or age. Future studies should consider this aspect.
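The zone-based approach suggested above can be sketched as follows. Each zone gets its own wall coefficient, and its contribution is converted to a first-order rate with the expression EPANET uses for first-order wall reactions, 2 k_w k_f / (r (k_w + k_f)). In that expression, r is the pipe radius and k_f the mass-transfer coefficient. EPANET derives k_f from the flow conditions; here it is simply assumed, and the `Zone` class and all zone properties are hypothetical. The expression also shows why the same k_w has a larger effect in small-diameter pipes.

```python
import math
from dataclasses import dataclass


@dataclass
class Zone:
    """A hypothetical stretch of pipe with uniform properties."""
    length_m: float
    diameter_m: float
    velocity_m_per_h: float
    k_wall_m_per_h: float   # wall decay coefficient, positive number
    k_mass_m_per_h: float   # mass-transfer coefficient, assumed known here


def wall_rate_per_h(zone):
    """First-order wall rate constant 2*kw*kf / (r*(kw + kf)), in 1/h."""
    r = zone.diameter_m / 2.0
    kw, kf = zone.k_wall_m_per_h, zone.k_mass_m_per_h
    return 2.0 * kw * kf / (r * (kw + kf))


def residual_along_path(c_in, zones, k_bulk_per_h):
    """Follow a parcel of water through zones in series (plug flow)."""
    c = c_in
    for zone in zones:
        travel_time_h = zone.length_m / zone.velocity_m_per_h
        c *= math.exp(-(k_bulk_per_h + wall_rate_per_h(zone)) * travel_time_h)
    return c


# Hypothetical path: a large-diameter AC main followed by a smaller PVC main
path = [
    Zone(length_m=10_000, diameter_m=0.7, velocity_m_per_h=1_800,
         k_wall_m_per_h=0.02, k_mass_m_per_h=0.05),
    Zone(length_m=5_000, diameter_m=0.2, velocity_m_per_h=1_800,
         k_wall_m_per_h=0.005, k_mass_m_per_h=0.08),
]
print(f"residual at end of path: {residual_along_path(3.7, path, 0.001):.2f} mg/L")
```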
In a distribution system, a significant amount of monochloramine decay occurs in service reservoirs [43], where the mixing condition largely determines the decay dynamics. This study assumed complete and instantaneous mixing, although actual mixing can vary with factors including flow rate, seasonal variation in water use, and customer demand. Further research is needed to better understand the mixing pattern and the resulting decay characteristics in reservoirs; other available mixing models, such as two-compartment mixing and FIFO and LIFO plug flow mixing, could be explored. In addition, the current study used the CMAES global search algorithm to calibrate the water quality network model parameters; calibration performance under other global optimisation algorithms should be explored further.
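The complete and instantaneous mixing assumption can be illustrated with a constant-volume mass balance, dC/dt = (Q/V)(C_in − C) − kC. Here Q is the throughflow, V the tank volume, and k a first-order decay rate. This is a simplified sketch: EPANET's complete-mix model also tracks filling and draining, which is omitted here, and the reservoir size, flow, decay rate and function names are hypothetical. The numerical result can be checked against the analytical steady state C_in/(1 + kτ), with τ = V/Q.

```python
def simulate_complete_mix(c_in, c0, q_m3_per_h, volume_m3, k_per_h, hours, dt_h=0.1):
    """Explicit-Euler integration of a completely mixed, constant-volume tank:
    dC/dt = (Q / V) * (C_in - C) - k * C."""
    c = c0
    for _ in range(int(round(hours / dt_h))):
        c += dt_h * (q_m3_per_h / volume_m3 * (c_in - c) - k_per_h * c)
    return c


def complete_mix_steady_state(c_in, q_m3_per_h, volume_m3, k_per_h):
    """Analytical steady state C_in / (1 + k * tau), with tau = V / Q."""
    tau_h = volume_m3 / q_m3_per_h
    return c_in / (1.0 + k_per_h * tau_h)


# Hypothetical reservoir: 5,000 m3 turned over at 100 m3/h (tau = 50 h)
c_ss = complete_mix_steady_state(3.0, 100.0, 5_000.0, 0.005)
c_sim = simulate_complete_mix(3.0, 3.0, 100.0, 5_000.0, 0.005, hours=1_000)
print(f"steady state {c_ss:.3f} mg/L, simulated after 1000 h {c_sim:.3f} mg/L")
```

The steady-state expression makes the role of residence time explicit: doubling τ at the same decay rate lowers the outlet residual, which is why reservoir turnover matters as much as the decay coefficient itself.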
A data-driven model can be used without developing a hydraulic model of the WDS [22,25]. In contrast, the data analytics model in the current study uses some variables, such as water age, that were obtained through hydraulic model simulation. Using only significant parameters reduces model size by removing redundant information and reduces the noise introduced into the model [22]. In a chlorinated system, Gibbs et al. [22] identified upstream chlorine concentration, flow, and temperature as the significant parameters for an ANN model. Similarly, using a data-driven approach, Peters et al. [25] found that reservoir total chlorine and temperature captured 90% of the variability in downstream total chlorine, while turbidity, pH, and the chlorine to ammonia ratio had relatively smaller effects. The significant parameters found in this study are upstream monochloramine concentration, pH, DOC, and water age, while flow, turbidity, and
have relatively little effect on the downstream monochloramine concentration. At the PS2 inlet, these significant upstream parameters captured 93% of the variability in the downstream monochloramine concentration during model training and 88% during cross-validation.
While the current study used the SVR method to develop the data analytics model, performance may differ under other ML algorithms, which further studies may investigate. Performance could also be improved by including more water quality variables. Because of data unavailability, the current study did not include temperature in the data analytics model. As temperature is one of the crucial factors controlling monochloramine decay, incorporating it may better explain the relationships between the variables and improve the model's predictability. In addition, microbiological activity can significantly affect monochloramine decay, so further research is needed to identify and incorporate the relevant microbiological parameters.
The calibrated water quality network model is best used to predict monochloramine concentrations for a different period when the water quality is relatively stable and similar to that of the calibration period. If the water quality changes significantly, its chemistry may change, altering the bulk and wall decay kinetics; the model then needs to be re-calibrated with observed data to obtain a better prediction. The procedure adopted in this study showed good chloramine predictability and can therefore be applied to other distribution systems. The developed model itself, however, is site-specific and valid only for the case-study system. Because hydraulic and water quality characteristics differ between systems, the hydraulic, water quality, and data analytics models would need to be re-developed and/or re-calibrated for another system.
4. Conclusions
Managing disinfectant residuals in a distribution system is important to ensure that water is microbiologically safe to consume. From the entry point where disinfectant is applied to customer taps, the monochloramine residual gradually decays as water passes through the network. From a water quality modelling viewpoint, monochloramine decay can be modelled using bulk and wall decay kinetics. The bulk decay coefficient for a specific distribution system can be quantified through controlled laboratory experiments, whereas wall decay is difficult to quantify because it depends on several factors, including pipe material, pipe age, and the density of biofilms and corrosion products. In this paper, we quantified the wall decay coefficient using a parameter optimisation technique. For this purpose, a hydraulic model of the studied distribution system was constructed and adequately calibrated to reproduce the observed data at the selected locations. The water quality model was then formulated by assigning bulk and wall decay. The bulk decay coefficient obtained from laboratory experiments was assigned to the model, while the initial value and range of the wall decay coefficient were taken from the literature. CMAES, a global search algorithm, was used to optimise the water quality model parameters. The whole calibration process was run in a Linux-based HPC environment using parallel processing, with several software tools linked together to run the calibration steps in sequence.
Using the calibrated wall decay parameters, the monochloramine residual concentrations simulated by the water quality model were compared with observed data at two locations in the Tailem Bend drinking water distribution system in South Australia. At the first location (the PS2 inlet), the goodness-of-fit between the observed and simulated monochloramine residual concentrations was R² = 0.66, while at the second location (Meningie), it was R² = 0.94. The calibrated water quality network model was validated using a different data period and was found to reproduce the observed data reasonably well.
To improve the predictions at these locations, data analytics models were developed using machine learning algorithms. Various hydraulic and water quality data, such as flow rate, water age, pH, DOC, and monochloramine concentrations at the source point and the point of interest, were used to build the data analytics models. Using the SVR algorithm with 10-fold cross-validation, model performance was compared under different kernel functions, and the RBF kernel was found to map the data most accurately. At the PS2 inlet, the SVR performance was R² = 0.99 in model training and R² = 0.92 in cross-validation, while at Meningie it was R² = 0.99 in both training and cross-validation. At both locations, the SVR models reproduced the observed data with a high level of accuracy. The key findings are that (i) the estimated wall decay coefficient for the TBK system varies from 0.15 m d−1 to 0.69 m d−1, depending on the pipe diameter; (ii) the significant parameters in the data analytics model are upstream monochloramine concentration, pH, water age, and DOC concentration; (iii) the RBF is the most appropriate of the tested kernel functions for analysing these data with the SVR method; and (iv) the upstream flow, turbidity and concentrations have little effect on downstream monochloramine. Finally, at the studied locations, both the water quality network model and the data analytics model adequately predict the observed data. However, the data analytics model showed better predictability and is therefore recommended where a range of water quality data is available.