Spectrophotometric Online Detection of Drinking Water Disinfectant: A Machine Learning Approach

The spectral fingerprint of drinking water from a water treatment plant (WTP) is characterised by a number of light-absorbing constituents, including organic matter, nitrate, disinfectant, and particles (turbidity). Detection of the disinfectant (monochloramine) can be better achieved by separating its spectrum from the combined spectra. This paper has two major focuses: (i) the separation of the monochloramine spectra from the combined spectra and (ii) assessment of a machine learning algorithm for real-time detection of monochloramine. A support vector regression (SVR) model was developed using multi-wavelength ultraviolet-visible (UV-Vis) absorbance spectra and online amperometric monochloramine residual measurements. The performance of the SVR model was evaluated using four different kernel functions. Results show that (i) particles (turbidity) in water have a significant effect on UV-Vis spectral measurement, and improved modelling accuracy is achieved by using particle-compensated spectra; (ii) modelling performance is further improved by compensating the spectra for natural organic matter (NOM) and nitrate (NO3); and (iii) the choice of kernel function greatly affects SVR performance, with the radial basis function (RBF) being the highest-performing kernel. The outcomes of this research suggest that disinfectant residual (monochloramine) can be measured in real time using the SVR algorithm with a precision of ±0.1 mg L−1.


Introduction
Conventional drinking water treatment processes consist of several stages to ensure treated water is safe for human consumption. In many countries, the final stage of treatment is the addition of a disinfectant to inactivate microorganisms in the water, to guard against recontamination, and to prevent the growth of biofilms [1]. Typically, chlorine and chloramines are the most widely used drinking water disinfectants [2,3]. In regional areas where disinfected water must travel several hundred kilometres to customers, chloramines are ideal due to their greater stability compared with chlorine [1,2]. Chloramines have three different chemical forms: monochloramine (NH2Cl), dichloramine (NHCl2), and trichloramine (NCl3) [2-4]. Dichloramine and trichloramine have not been proven to be suitable disinfectants because they are less stable than monochloramine. Other light-absorbing constituents in the water matrix are reported to absorb in the same wavelength range as monochloramine, so subtracting their absorbing contribution from the spectra will improve the measurement accuracy of monochloramine. Additionally, due to light scattering by suspended particles, turbidity in water causes a non-linear lifting of the spectrum, thereby reducing the measurement accuracy [27]. To minimise this effect, various particle compensation techniques, such as multiplicative scatter correction (MSC), theoretical models, and chemical and machine learning methods, have been developed [27-29]. In this study, a combination of particle, organic, and nitrate compensation was assessed.
The online spectrophotometric method of monochloramine detection is comparatively new, with little research completed in this area. Previous studies focused on applying standard chemometric methods to particle-compensated spectra to relate spectral features to the monochloramine concentration. This study attempted to isolate the monochloramine spectra first by applying an additional spectral compensation for organics and nitrate. Hence, the objectives of the study were: (i) the development of a spectral compensation to isolate the monochloramine spectra, and (ii) the linking of the isolated spectra to amperometric monochloramine residual data using a machine learning algorithm.
To date, to the authors' best knowledge, no such method for online spectrophotometric measurement of monochloramine residual has been developed. This research showed that regular field monitoring data of organic and nitrate levels can be used to compensate the UV-Vis spectra for online detection of monochloramine. Improved modelling accuracy using such spectral compensation is the focus of this paper. The schematic of the proposed method is shown in Figure 1.
The remainder of this paper is organised as follows: Section 2 describes the study area, the relevant literature, and the methodology adopted in the research. Section 3 presents the results of the study. In Section 4, the machine learning modelling performances using different spectral compensations with different kernel functions are compared; some limitations of the research and future lines of work are also discussed there. Section 5 concludes the paper.

Study Area
The Tailem Bend drinking water distribution system is one of the major drinking water distribution systems operating in regional South Australia (Figure 2a). It is located at the Tailem Bend township, approximately 80 km southeast of Adelaide. The WTP collects water from the River Murray and operates a conventional treatment process (coagulation → flocculation → sedimentation → filtration), with disinfection by UV irradiation and chloramination (Figure 2b). The treated water is then pumped into its distribution network, consisting of an approximately 143-km-long pipeline and several hundred kilometres of branch mains. Water quality at the WTP and at multiple locations in the distribution network is monitored using various online devices.

The treated water has a varying level of turbidity, ranging between 0.04 and 0.12 nephelometric turbidity units (NTU), with a mean of 0.08 NTU and standard deviation of 0.02 NTU. The monochloramine concentration during the study period ranged between 3.0 and 5.5 mg L−1, with a mean of 4.3 mg L−1 and standard deviation of 0.2 mg L−1. Similarly, pH ranged between 7.8 and 9.3 with a mean of 8.7, and dissolved organic carbon (DOC) ranged between 1.5 and 2.6 mg L−1 with a mean of 2.0 mg L−1. The standard deviations of the pH and DOC values were 0.3 and 0.3 mg L−1, respectively.
At the WTP, an amperometric online chlorine analyser (Depolox 5, Wallace & Tiernan, Evoqua, Pittsburgh, PA, USA) is used to monitor the monochloramine residual; it is located after the disinfection and fluoride addition processes (Figure 2b). The installed UV-Vis spectrophotometer and the online chlorine analyser are located close to each other to minimise discrepancies in hydraulic residence time (HRT) between the samples.

UV-Vis Spectrophotometric Device
The instrument used in this study was an online spectrophotometer probe from s::can Messtechnik GmbH, Austria, which works on the principle of UV-Vis spectrometry. A significant advantage of spectrophotometric detection is that, unlike many other online analysers, it does not require any chemical reagent to operate. The main component of the device is a stainless-steel body housing the UV-Vis spectrophotometer, which can be used either by placing it directly into the water sample or by attaching a sampling cell to the probe's light path. Spectral data are obtained by a double-beam arrangement with a xenon light source and a 256-pixel detector, measuring absorbance at wavelengths from 200 to 750 nm over a 100-mm pathlength. In this study, the absorbance spectrum, or fingerprint, was measured every two minutes, with the data stored on a computer connected to the probe.
At the WTP, the sampling cell of the spectrophotometer was fed from two different sample points: one was treated water prior to chloramination, which is termed as pre-chloraminated water; and the other was treated water after disinfection, which is termed as post-chloraminated water. Switching between the sources was controlled using an electronically controlled valve, and the duration of each source feeding to the sampling cell was set to 10 min.

Particle Interference on UV-Vis Spectrum and Compensation
Turbidity-causing particles in water, including silt, clay, organic and inorganic matter, and microscopic organisms, may obstruct the transmittance of light and cause it to scatter, thereby adding interference across the whole spectrum. The target compounds exist as dissolved species, so removal of the particle absorbance is necessary to reduce this interference. The standard procedure for measuring UV-Vis absorbance is to filter the sample through a 0.45-µm filter, which retains the majority of these particles, so that the corresponding spectrum is free from particle interference. For online spectrophotometric detection, physical filtering cannot easily be done, as it is a slow process and cannot consistently deliver the required flow to the device. Therefore, the spectrum obtained from the unfiltered sample includes light-scattering effects that need to be corrected to obtain the absorbance of the dissolved compounds in the water matrix. This process is known as turbidity or particle compensation. Several particle compensation techniques exist in the literature [27-30]. The software supplied with the spectrophotometer has a built-in function to perform this operation for different water types (i.e., drinking water, wastewater, etc.).
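Although this study relied on the vendor's built-in compensation, the general idea behind scatter-correction techniques such as MSC can be sketched in a few lines. The following Python sketch is illustrative only (the probe's actual algorithm is proprietary): each raw spectrum is regressed against the mean spectrum, and the fitted additive and multiplicative scatter components are removed.

```python
import numpy as np

def msc(spectra, reference=None):
    """Multiplicative scatter correction: regress each spectrum on a
    reference (mean) spectrum and remove additive/multiplicative scatter."""
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra)
    for i, s in enumerate(spectra):
        b, a = np.polyfit(ref, s, 1)   # fit s ≈ a + b * ref
        corrected[i] = (s - a) / b     # undo offset (a) and scaling (b)
    return corrected

# Two synthetic spectra differing only by scatter offset and scale
base = np.linspace(0.0, 1.0, 50)
spectra = np.vstack([0.1 + 1.5 * base, 0.2 + 0.8 * base])
corrected = msc(spectra)
```

After correction, spectra that differ only in additive offset and multiplicative scaling collapse onto the common reference shape.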

Support Vector Regression
Support vector machines (SVMs) are a popular machine learning algorithm introduced by Vapnik and other researchers [31-33]. The concept originated from statistical learning theory for solving a constrained quadratic problem in which the convex objective function for optimisation combines a loss function with a regularisation term [34]. The two most common applications of SVMs are support vector classification (SVC) and support vector regression (SVR). For classification problems, the objective is to find the optimal separating hyperplane that maximises the margin of the training data. A hyperplane can be defined as a boundary that separates data classes: in an n-dimensional Euclidean space, the hyperplane is a subset of dimension n−1 that divides the space into two disconnected parts. Data points that are closest to the hyperplane are called support vectors (SVs). Figure 3a shows an optimal hyperplane with a maximised margin between two classes of data. Although the SVC algorithm was originally based on binary classification, it can be extended to multi-class classification problems by combining a series of binary classifiers [35].
However, in many cases, the data may not be linearly separable. Under such a condition, SVC uses a kernel trick (discussed in detail below) to map the data into a high-dimensional space where linear separation is possible [34,36-38]. There are theorems that guarantee the existence of such kernel functions under certain conditions [32,33,39]. This is shown in Figure 3b, where two classes of data, A and B, are linearly inseparable in the two-dimensional input space, but after transformation to a three-dimensional feature space, separation becomes possible.
In SVM for regression, the objective is to fit a model that predicts a continuous quantity. The data points are expected to be distributed closely around the regression line, with an epsilon (ε) range defined on both sides of the hyperplane within which the regression function is considered insensitive (Figure 3c): errors smaller than ε do not matter, while errors greater than ε are penalised. The theory behind the SVR algorithm can be found in many studies [31-34,36,38,39]; based on this literature, a description is provided below.
Consider a dataset {(x_1, y_1), (x_2, y_2), ..., (x_l, y_l)} ⊂ X × ℝ. SVR tries to fit a function f(x) for all the training data that has at most ε deviation from the actually obtained targets y_i while at the same time keeping it as flat as possible (Figure 3c). In the case of linear functions, the equation can be written as:

$$ f(x) = \langle \omega, x \rangle + b, \quad \omega \in X,\ b \in \mathbb{R} \tag{1} $$

where the symbol ⟨·,·⟩ in Equation (1) represents the dot product in X and the parameter ω represents a weight vector that controls the flatness of the function. The smaller the value of ω, the flatter the function; this can be achieved by minimising the Euclidean norm ‖ω‖². Therefore, it can be considered a convex optimisation problem by requiring:

$$ \min \ \tfrac{1}{2}\|\omega\|^2 \quad \text{subject to} \quad \begin{cases} y_i - \langle \omega, x_i \rangle - b \le \varepsilon \\ \langle \omega, x_i \rangle + b - y_i \le \varepsilon \end{cases} \tag{2} $$

The solution of Equation (2) is feasible in cases where the function f actually exists and approximates all pairs (x_i, y_i) with ε precision. Otherwise, slack variables ξ_i, ξ_i* are introduced to allow some error and cope with otherwise infeasible constraints of the optimisation problem. Thus, the optimisation can be rewritten as:

$$ \min \ \tfrac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{l}(\xi_i + \xi_i^*) \quad \text{subject to} \quad \begin{cases} y_i - \langle \omega, x_i \rangle - b \le \varepsilon + \xi_i \\ \langle \omega, x_i \rangle + b - y_i \le \varepsilon + \xi_i^* \\ \xi_i, \xi_i^* \ge 0 \end{cases} \tag{3} $$

The constant C > 0 determines the trade-off between the flatness of the function f and the amount by which deviations larger than ε are tolerated. The optimisation problem in Equation (3) can be solved by constructing a dual problem, where the aim is to maximise the objective function in terms of the dual variables under the derived constraints on those variables. The first step is to construct a Lagrange function by adding the constraints to the objective function:

$$ L = \tfrac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{l}(\xi_i + \xi_i^*) - \sum_{i=1}^{l}(\eta_i \xi_i + \eta_i^* \xi_i^*) - \sum_{i=1}^{l}\alpha_i(\varepsilon + \xi_i - y_i + \langle \omega, x_i \rangle + b) - \sum_{i=1}^{l}\alpha_i^*(\varepsilon + \xi_i^* + y_i - \langle \omega, x_i \rangle - b) \tag{4} $$

The dual variables in Equation (4) need to fulfil the conditions α_i, α_i*, η_i, η_i* ≥ 0.
It follows from the saddle point condition that the partial derivatives of L with respect to the primal variables ω, b, ξ_i, ξ_i* have to vanish at the optimum:

$$ \partial_b L = \sum_{i=1}^{l}(\alpha_i^* - \alpha_i) = 0 \tag{5} $$

$$ \partial_\omega L = \omega - \sum_{i=1}^{l}(\alpha_i - \alpha_i^*)\, x_i = 0 \tag{6} $$

$$ \partial_{\xi_i^{(*)}} L = C - \alpha_i^{(*)} - \eta_i^{(*)} = 0 \tag{7} $$

Substituting Equations (5) to (7) into Equation (4) yields the following dual optimisation problem:

$$ \max \ -\tfrac{1}{2}\sum_{i,j=1}^{l}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\langle x_i, x_j \rangle - \varepsilon\sum_{i=1}^{l}(\alpha_i + \alpha_i^*) + \sum_{i=1}^{l} y_i(\alpha_i - \alpha_i^*) \quad \text{subject to} \quad \sum_{i=1}^{l}(\alpha_i - \alpha_i^*) = 0,\ \alpha_i, \alpha_i^* \in [0, C] \tag{8} $$

The dual variables η_i, η_i* do not appear in Equation (8) because they have been eliminated through Equation (7). Equation (6) can be rewritten as:

$$ \omega = \sum_{i=1}^{l}(\alpha_i - \alpha_i^*)\, x_i \tag{9} $$

Therefore:

$$ f(x) = \sum_{i=1}^{l}(\alpha_i - \alpha_i^*)\langle x_i, x \rangle + b \tag{10} $$

This is the support vector (SV) expansion of the function f. Equation (10) indicates that ω in Equation (1) can be represented by a linear combination of the training patterns x_i, while b can be computed by applying the Karush-Kuhn-Tucker (KKT) conditions [40], which state that the product between the dual variables and the constraints has to vanish at the optimal solution. This means:

$$ \alpha_i(\varepsilon + \xi_i - y_i + \langle \omega, x_i \rangle + b) = 0, \qquad \alpha_i^*(\varepsilon + \xi_i^* + y_i - \langle \omega, x_i \rangle - b) = 0 \tag{11} $$

and:

$$ (C - \alpha_i)\,\xi_i = 0, \qquad (C - \alpha_i^*)\,\xi_i^* = 0 \tag{12} $$

Based on these conditions, some conclusions can be drawn. Firstly, only samples (x_i, y_i) with corresponding α_i^(*) = C lie outside the ε-insensitive tube around the function f. Secondly, α_i and α_i* cannot both be simultaneously non-zero, as that would require non-zero slacks in both directions; therefore α_i α_i* = 0. Finally, α_i^(*) ∈ (0, C) implies ξ_i^(*) = 0, and the second factor in Equation (11) has to vanish. Thus, b can be computed as:

$$ b = y_i - \langle \omega, x_i \rangle - \varepsilon \ \text{ for } \alpha_i \in (0, C), \qquad b = y_i - \langle \omega, x_i \rangle + \varepsilon \ \text{ for } \alpha_i^* \in (0, C) \tag{13} $$

The SVR algorithm can be extended to non-linear functions by mapping the data X into a feature space F through a transformation φ : X → F and then applying the standard SVR algorithm [34,39]. Thus, for a non-linear case, the optimisation problem becomes one of finding the flattest function in the feature space instead of the input space.
A kernel function can be defined as the dot product in the feature space: it can be shown that for certain mappings φ there exist kernel functions k(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩ [32,33,39]. The functions k(x_i, x_j) have to satisfy Mercer's condition [34,39]. Since solving the dual problem in the SV algorithm depends only on the values of the dot product, a kernel function can be used instead of computing φ explicitly. Therefore, the algorithm can be rewritten as:

$$ \max \ -\tfrac{1}{2}\sum_{i,j=1}^{l}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\, k(x_i, x_j) - \varepsilon\sum_{i=1}^{l}(\alpha_i + \alpha_i^*) + \sum_{i=1}^{l} y_i(\alpha_i - \alpha_i^*) \tag{14} $$

The function f can now be expressed as:

$$ f(x) = \sum_{i=1}^{l}(\alpha_i - \alpha_i^*)\, k(x_i, x) + b \tag{15} $$

The SVR modelling accuracy can be improved with the right choice of kernel function, as different kernel functions have different mapping capabilities. The four kernel functions given in Equations (16) to (19) are the most commonly used in the SVR algorithm [22,35-39,41]:

$$ \text{Linear: } k(x_i, x_j) = x_i^T x_j \tag{16} $$

$$ \text{Polynomial: } k(x_i, x_j) = (\gamma\, x_i^T x_j + r)^d \tag{17} $$

$$ \text{RBF: } k(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2) \tag{18} $$

$$ \text{Sigmoid: } k(x_i, x_j) = \tanh(\gamma\, x_i^T x_j + r) \tag{19} $$

where x_i^T is the transpose of x_i, r is a constant term, d is the polynomial order, and γ is a kernel parameter that controls the spread of the data while transforming to higher dimensions.
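As a concrete reference, the four kernels can be written out directly. The NumPy sketch below evaluates each of the kernel forms of Equations (16) to (19) for a pair of feature vectors; the default γ, r, and d values are illustrative, not those used in the study.

```python
import numpy as np

def linear_kernel(xi, xj):
    # Equation (16): k(xi, xj) = xi^T xj
    return xi @ xj

def polynomial_kernel(xi, xj, gamma=1.0, r=1.0, d=3):
    # Equation (17): k(xi, xj) = (gamma * xi^T xj + r)^d
    return (gamma * (xi @ xj) + r) ** d

def rbf_kernel(xi, xj, gamma=0.5):
    # Equation (18): k(xi, xj) = exp(-gamma * ||xi - xj||^2)
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

def sigmoid_kernel(xi, xj, gamma=0.5, r=0.0):
    # Equation (19): k(xi, xj) = tanh(gamma * xi^T xj + r)
    return np.tanh(gamma * (xi @ xj) + r)
```

In a full SVR implementation these functions would populate the kernel matrix k(x_i, x_j) appearing in Equation (14).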

Methodology
The systematic procedure adopted in this research is presented in Figure 4. Light absorbance data from the spectrophotometer and monochloramine concentration data from the amperometric analyser were collected during the period from December 2018 to March 2019 and processed for further analysis. The validation of the amperometric analyser data is given in Appendix A, Figure A1. Missing values in the data were estimated by linear interpolation. Outliers were checked using the modified z-score method [42-44] and removed from the data. This method is expressed by the following equation:

$$ M_i = \frac{0.6745\,(x_i - \tilde{x})}{\mathrm{MAD}} $$

where M_i is the modified z-score, and x_i and x̃ are the ith ordinate and the median of a feature vector, respectively. The median absolute deviation (MAD) is given by:

$$ \mathrm{MAD} = \mathrm{median}\big(|x_i - \tilde{x}|\big) $$

The modified z-score method is more robust than the standard z-score method. This is because the standard z-score is calculated from the arithmetic mean and standard deviation, so it can be significantly affected by a few extreme values, or even by a single one. This does not happen with the modified z-score, as it uses the median instead of the mean. According to many researchers, including Iglewicz and Hoaglin [44], a modified z-score greater than 3.5 can be considered an outlier.
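The outlier screening described above can be reproduced in a few lines. A minimal NumPy sketch follows; the sample values are made up for illustration and are not data from the study.

```python
import numpy as np

def modified_z_scores(x):
    """Modified z-score of Iglewicz and Hoaglin; |score| > 3.5 flags an outlier."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))   # median absolute deviation
    return 0.6745 * (x - med) / mad

data = np.array([4.1, 4.2, 4.3, 4.2, 4.1, 9.9])  # illustrative residuals, mg/L
scores = modified_z_scores(data)
cleaned = data[np.abs(scores) <= 3.5]            # drop flagged outliers
```

Here the 9.9 mg/L reading is far above the 3.5 threshold and is removed, while the remaining values survive.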
To properly align the spectral signal with the monochloramine data, the MATLAB (The MathWorks Inc., Natick, MA, USA) interactive file brushing tool was used. Firstly, the correlation of the absorbance at various wavelengths to the monochloramine concentration was assessed. The Pearson correlation was calculated using the following formula:

$$ r = \frac{\sum_{i=1}^{n}(R_i - \bar{R})(P_i - \bar{P})}{\sqrt{\sum_{i=1}^{n}(R_i - \bar{R})^2 \sum_{i=1}^{n}(P_i - \bar{P})^2}} $$

where R_i and P_i are the ith data points from the spectra and monochloramine concentration, respectively, R̄ and P̄ are their means, and n is the total number of data points.
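The wavelength-screening step amounts to computing the Pearson correlation column by column across the spectral matrix. A sketch with synthetic data is given below; the 256-wavelength layout mirrors the instrument, but column 45 is an arbitrary stand-in for the informative wavelength, not the study's 245 nm channel.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_wavelengths = 200, 256
# Hypothetical absorbance matrix (samples x wavelengths)
absorbance = rng.random((n_samples, n_wavelengths))
# Hypothetical residual driven by the absorbance at column 45 plus noise
monochloramine = absorbance[:, 45] * 2.0 + rng.normal(0, 0.05, n_samples)

def pearson_r(a, b):
    """Sample Pearson correlation coefficient of two vectors."""
    a, b = a - a.mean(), b - b.mean()
    return (a @ b) / np.sqrt((a @ a) * (b @ b))

r_per_wavelength = np.array([pearson_r(absorbance[:, j], monochloramine)
                             for j in range(n_wavelengths)])
best = int(np.argmax(r_per_wavelength))  # index of the representative wavelength
```

The column with the highest correlation is taken as the representative wavelength, just as 245 nm was selected in the study.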
The wavelength corresponding to the maximum correlation, 245 nm, was taken as the representative wavelength. Spectral analysis of monochloramine solutions at different concentrations using a benchtop laboratory spectrophotometer also indicated a peak at 245 nm. The absorbance at this wavelength and the monochloramine concentration data were plotted on a single graph in MATLAB using appropriate scale settings. A portion of the whole time series is shown in Figure 5, where some gaps were identified due to plant shutdowns. For each segment, cross-correlation was used to determine the appropriate alignment. This served two purposes: (i) identifying any hydraulic residence time (HRT) difference between the data sources, and (ii) identifying any clock time difference between them. The alignment corresponding to the maximum cross-correlation was considered the appropriate alignment, and the whole time series was aligned segment by segment.
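Segment-wise alignment by maximum cross-correlation can be sketched as follows. The series is synthetic, and the 7-sample offset stands in for a combined HRT and clock difference; all names and values are illustrative.

```python
import numpy as np

def best_lag(reference, signal, max_lag=30):
    """Return the delay (in samples) of `signal` relative to `reference`
    that maximises the Pearson cross-correlation."""
    def corr_at(lag):
        if lag >= 0:
            a, b = reference[:len(reference) - lag], signal[lag:]
        else:
            a, b = reference[-lag:], signal[:len(signal) + lag]
        a = a - a.mean()
        b = b - b.mean()
        return (a @ b) / np.sqrt((a @ a) * (b @ b))
    return max(range(-max_lag, max_lag + 1), key=corr_at)

t = np.linspace(0, 20, 500)
series = np.sin(t) + 0.1 * np.random.default_rng(1).normal(size=500)
shifted = np.roll(series, 7)       # emulate a 7-sample HRT/clock offset
lag = best_lag(series, shifted)    # recovers the 7-sample delay
```

Shifting one series by the recovered lag aligns the two records before modelling, which is what the segment-by-segment procedure accomplishes.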
It was found that the historic monochloramine data from the amperometric analyser contained a considerable number of numerical values that were repeated several times, owing to the protocol settings of the data historian software. To overcome this issue, random sampling of the monochloramine data was done in such a way that each numerical value could appear no more than once, ensuring unique values for model training when employing the machine learning algorithm. R code was used to perform distinct random sampling several million times. For each random sample, the goodness-of-fit between the monochloramine data and the spectral time series was assessed, and the numerical seed that provided the maximum match was taken as the appropriate seed for the random sampling. The resulting data were used in the model.
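The de-duplication step can be sketched as below (the study used R; this NumPy version is an illustrative equivalent). `np.unique` collapses the historian's repeated readings, and in the study's procedure the seed would then be varied and each resulting sample scored against the spectral series.

```python
import numpy as np

def sample_distinct(values, seed):
    """Randomly order one copy of each numerical value, so no value repeats."""
    rng = np.random.default_rng(seed)
    unique_vals = np.unique(values)      # collapse repeated historian readings
    return rng.permutation(unique_vals)  # random order, each value once

# Illustrative analyser readings containing historian repeats (mg/L)
readings = np.array([4.3, 4.3, 4.2, 4.5, 4.2, 4.3, 4.4])
distinct = sample_distinct(readings, seed=42)
```

Looping `seed` over a large range and keeping the sample with the best goodness-of-fit reproduces the seed-search idea described above.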
Particle compensation is a vital step when analysing any light-absorbing spectral data. In this study, particle compensation was carried out using the offline spectral data processing tool that accompanies the spectrophotometer, which offers particle compensation for various water types (drinking water, wastewater, river water, etc.); the drinking water category was selected here.
NOM is the dominant light-absorbing component in water and can interfere with the monochloramine spectra [19,26]. Additionally, nitrate (expressed as NO3-N) absorbs UV light [26]. A compensation for organics and nitrate was therefore applied to separate the monochloramine spectra from the recorded post-chloramination spectra. The objective was to determine whether separating the monochloramine spectra and training the SVR model on them improved the accuracy. The detailed procedure for separating the monochloramine spectra is shown in Figure 6.
As shown in Figure 6, the spectral fingerprints of both pre-chloraminated and post-chloraminated water at the WTP were obtained by a single spectrophotometer probe that measures a range of water quality parameters. The spectrophotometric module was calibrated to match the DOC and NO3-N measurements with lab-measured values. Figure 7 shows the calibration of the DOC and NO3-N parameters, where the trends indicate a good level of agreement between these data.
It was assumed that the spectral configuration of pre-chloraminated water was mainly governed by DOC and NO3-N. Therefore, using these parameters, a polynomial regression model was developed for each absorbing wavelength using R code; a fourth-order polynomial function was used to model the spectra. During the chloramination process at the WTP, oxidation reactions may occur while mixing ammonia and chlorine into the water, potentially causing the DOC and NO3-N concentrations to change. These effects were assumed to be minor because the sampling location for the spectrophotometer was immediately after chloramination. It was therefore assumed that the DOC and NO3-N concentrations in the post-chloraminated water would produce, through the regression model, spectra similar to those of the pre-chloraminated water. Direct subtraction of the pre-chloraminated spectra was not considered accurate because both pre-chloraminated and post-chloraminated water were monitored using a single spectrophotometer, so the two spectral measurements had different timestamps.
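The per-wavelength compensation model can be sketched as follows. The absorbance model below is entirely synthetic: the coefficients 0.12 and 0.05, the noise level, and the absence of DOC-NO3-N interaction terms are assumptions for illustration, not values from the study. A fourth-order polynomial in DOC and NO3-N is fitted by least squares, and its prediction is subtracted from the measured spectrum at that wavelength.

```python
import numpy as np

rng = np.random.default_rng(2)
doc = rng.uniform(1.5, 2.6, 120)   # DOC, mg/L (range from the study period)
no3 = rng.uniform(0.1, 0.5, 120)   # illustrative NO3-N levels, mg/L
# Hypothetical absorbance at a single wavelength driven by DOC and NO3-N
absorbance = 0.12 * doc + 0.05 * no3 + rng.normal(0, 0.002, 120)

# Fourth-order polynomial design matrix in DOC and NO3-N for this wavelength
X = np.column_stack([doc ** p for p in range(5)] + [no3 ** p for p in range(1, 5)])
coef, *_ = np.linalg.lstsq(X, absorbance, rcond=None)

predicted = X @ coef                        # DOC/NO3-N contribution at this wavelength
residual_spectrum = absorbance - predicted  # what remains after compensation
```

Repeating this fit for every wavelength and subtracting the predicted organic and nitrate contribution from the post-chloraminated spectra leaves, approximately, the monochloramine spectrum.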
Machine learning modelling accuracy can be impacted by multi-collinearity if a high correlation exists between feature variables [45]. Light absorbance at a specific wavelength is highly correlated with that at neighbouring wavelengths, with the correlation gradually decreasing for distant wavelengths. Therefore, to avoid redundancy in the model training, principal components were extracted and used in modelling rather than using the absorbance values directly. Moreover, the use of principal components helps the machine learning algorithm perform at its best, as principal component analysis (PCA) significantly reduces the data size. Factors producing eigenvalues greater than 0.01 were found to explain 99.9% of the variance of the data, and these factors were therefore used as feature variables to build the model. The SVR method in The Unscrambler X (CAMO Software, Oslo, Norway) was used to build the model, and its performance was evaluated under four different kernel functions: (i) linear, (ii) polynomial, (iii) RBF, and (iv) sigmoid. Among the SVR parameters, ε controls the width of the insensitive zone around the hyperplane. A comparatively larger value of ε means fewer support vectors are selected in the modelling, resulting in flatter estimates by the model. According to Mattera and Haykin [46], an ε value that causes the number of support vectors to be approximately 50% of the data length can be considered a good choice. In this study, ε was set to 0.01, which resulted in approximately 50% of the data points being selected as support vectors.
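The feature-extraction step (retaining only components with eigenvalue above 0.01) can be sketched with a plain eigendecomposition of the covariance matrix. The spectra below are synthetic, and the induced correlation is a crude stand-in for real neighbouring-wavelength correlation.

```python
import numpy as np

rng = np.random.default_rng(5)
spectra = rng.normal(size=(300, 40))  # placeholder spectra (samples x wavelengths)
# Induce inter-wavelength correlation via a shared component
spectra[:, 1:] = 0.9 * spectra[:, :1] + 0.3 * spectra[:, 1:]

centred = spectra - spectra.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(centred, rowvar=False))
order = np.argsort(eigvals)[::-1]            # sort by variance explained
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

keep = eigvals > 0.01                        # eigenvalue threshold from the study
scores = centred @ eigvecs[:, keep]          # PC scores used as SVR features
```

The retained `scores` matrix replaces the raw absorbance values as the SVR input, removing the multi-collinearity among neighbouring wavelengths.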
C is an SVR learner parameter that represents the penalty for misfitting a data point. Comparatively smaller C values mean that some misfitting of data points will be tolerated by the learner. In contrast, a larger value of C means the learner is heavily penalised for misfitted data points. Apart from C, the parameter γ also needs to be optimised. A low value of γ produces a very broad decision region, whereas a high value creates islands of decision boundaries around data points. The value of γ can be estimated as γ = 1/(2σ²), where σ represents the standard deviation of the Gaussian noise level [34]. Both the C and γ values were obtained using the built-in grid search method in Unscrambler X, while a third-degree polynomial function was used when modelling with the polynomial kernel. Among the various methods used for model validation in machine learning, hold-out validation and k-fold cross-validation are the most widely used. In the first case, the data must be split into a training set and a testing set. However, dividing the original data can cause information loss, thereby increasing the error induced by bias. Therefore, to minimise this error, a 10-fold cross-validation procedure was adopted. As a general rule, 5-fold or 10-fold cross-validation has been empirically shown to ensure that the error estimate suffers neither high bias nor high variance. Goodness-of-fit between the reference and model-predicted values was evaluated by the coefficient of determination (R-square) and the root mean square error (RMSE) [36,38,47], given by

R² = 1 − Σᵢ (Rᵢ − Pᵢ)² / Σᵢ (Rᵢ − R̄)²

RMSE = √( (1/n) Σᵢ (Rᵢ − Pᵢ)² )

where Rᵢ and Pᵢ are the reference and predicted values of the monochloramine concentration, respectively, n is the total number of data points, and R̄ is the mean of the reference values. The value of R-square ranges from 0 to 1, where 1 means a perfect fit and 0 means no fit at all. RMSE has no fixed range, but a well-fitted model will have a low RMSE value.
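The study used Unscrambler X's built-in grid search; an approximation of the same workflow in Python with scikit-learn (hypothetical grids and synthetic data) is sketched below: grid search over C and γ for the RBF kernel with ε fixed at 0.01, followed by 10-fold cross-validated R² and RMSE.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV, cross_val_predict
from sklearn.metrics import r2_score, mean_squared_error

# Hypothetical feature matrix (e.g. PCA scores) and monochloramine reference values.
rng = np.random.default_rng(2)
X = rng.standard_normal((120, 4))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.05 * rng.standard_normal(120)

# Grid search over C and gamma for the RBF kernel; epsilon fixed at 0.01 as in the study.
grid = GridSearchCV(
    SVR(kernel="rbf", epsilon=0.01),
    param_grid={"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]},
    cv=10,
)
grid.fit(X, y)

# 10-fold cross-validated goodness-of-fit.
pred = cross_val_predict(grid.best_estimator_, X, y, cv=10)
r2 = r2_score(y, pred)
rmse = np.sqrt(mean_squared_error(y, pred))
```

The candidate C and γ grids here are illustrative; in practice they would be chosen around the γ = 1/(2σ²) estimate described above.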
Data normalisation is an integral part of machine learning. To minimise bias, all feature variables in the data were normalised before the SVR analysis. The purpose is to bring their values onto a common scale so that model training becomes less sensitive to the scale of the features, as regularisation behaves differently for different scalings. Properly scaled feature variables also help ensure convergence of the SVR algorithm. In this study, the data were scaled to the range −1 to +1 using

x = (xᵢ − µ) / max

where x is the normalised value, xᵢ is the ith ordinate of a feature vector, µ is the mean value, and "max" represents the maximum value.
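A minimal sketch of this scaling in Python follows. Note an assumption: the text does not state precisely which maximum is used, so here "max" is taken as the maximum absolute value of the mean-centred vector, which guarantees the result lies in [−1, +1]; the function name is hypothetical.

```python
import numpy as np

def scale_to_unit_range(v):
    """Mean-centre a feature vector and scale it into [-1, +1].

    Assumes "max" in the paper's formula is the maximum absolute
    deviation from the mean (an interpretation, not confirmed by the text).
    """
    centred = v - v.mean()
    max_abs = np.abs(centred).max()
    return centred / max_abs if max_abs > 0 else centred

feature = np.array([0.12, 0.45, 0.30, 0.51, 0.08])
scaled = scale_to_unit_range(feature)
```

After scaling, every feature vector has zero mean and extremes at ±1, so no single wavelength dominates the SVR regularisation.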

Monochloramine Peak Absorbance Wavelength Detection and Particle Compensation
The spectral fingerprint of monochloramine, and hence its peak absorbance wavelength, was determined in ultrapure water from a Milli-Q water purification system (Millipore, Molsheim, France) using a benchtop laboratory spectrophotometer. The resulting spectra between 210 and 330 nm for various monochloramine levels are presented in Figure 8a, indicating that absorbance increases with concentration and that the peak absorbance appears at about 245 nm for all concentrations. The remaining portion of the spectra is comparatively flat. For all monochloramine solutions, the pH was kept constant at approximately 8.5 to avoid spectral shifting and to match the operational pH practised at the Tailem Bend WTP. In Figure 8a, it can be seen that the starting absorbance of some spectra with a low concentration of monochloramine is comparatively higher than that of spectra with a high concentration. This is due to a relatively high amount of dichloramine formed during the preparation of the monochloramine solution. Unlike Milli-Q water, treated water at the WTP contains several light-absorbing substances with peaks at different wavelengths. Therefore, derivative spectra were derived from the online spectrophotometer data to identify where the major peak appears. Figure 8b shows the first derivative of the spectra within the 225-300 nm region, which indicates a sudden slope change, marked by the red circle in the figure. This is caused by the monochloramine spectrum, with a peak at about 245 nm. As can be seen in Figure 8b, a minor peak in the derivative spectra appears between 260 and 280 nm. According to Roccaro et al. [48], the derivative spectra at 272 nm may relate to chlorinated disinfection by-products and precursors. The remaining region of the derivative spectra is comparatively flat.
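The derivative-based peak location can be sketched as follows (Python/NumPy, with a purely synthetic spectrum: a Gaussian monochloramine-like band at 245 nm on a decaying background; all values illustrative, not the study's data):

```python
import numpy as np

# Synthetic spectrum: monochloramine-like Gaussian peak at 245 nm
# superimposed on a decaying background absorbance.
wavelengths = np.arange(210, 331)                       # nm
spectrum = (0.5 * np.exp(-((wavelengths - 245) ** 2) / (2 * 12 ** 2))
            + 0.2 * np.exp(-(wavelengths - 210) / 40))

# First-derivative spectrum: the sign change of the slope around the
# band marks the peak location, as illustrated in Figure 8b.
deriv = np.gradient(spectrum, wavelengths)
peak_nm = wavelengths[np.argmax(spectrum)]
```

On real WTP spectra, the overlapping NOM and nitrate bands make the raw maximum unreliable, which is why the derivative's slope change is the more robust indicator.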
Furthermore, a Pearson's correlation analysis was performed between the spectral absorbance at various wavelengths and the monochloramine data from the amperometric analyser. The results indicated a significant correlation between the two data sets at the 0.01 level, with the maximum correlation occurring at about 245 nm, with a correlation coefficient of 0.54. Therefore, this wavelength was used to align the two data sets. Figure 8c shows the uncompensated spectra obtained from the UV-Vis spectrophotometer, while Figure 8d shows the corresponding particle compensated spectra obtained by processing the uncompensated spectra with the spectrophotometer's built-in particle compensation tool. For better viewing, wavelengths between 220 and 330 nm are displayed in the figure, while the full-wavelength spectra are provided in Appendix A, Figure A2. It is evident from these figures that the uncompensated spectra show relatively higher absorbance than the particle compensated spectra, as the light absorbance by particles is removed through particle compensation. The difference between the two is the compensation for the light-scattering effect. The accuracy of the particle compensated spectra was verified by assessing the correlation of the absorbance at various wavelengths with the monochloramine data from the amperometric analyser. The analysis indicated that after correcting the spectra through particle compensation, the correlation coefficient improved from 0.54 to 0.62.
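The per-wavelength correlation screening can be sketched as below (Python/NumPy with hypothetical aligned data; the wavelengths, noise levels, and effect sizes are invented for illustration only):

```python
import numpy as np

# Hypothetical aligned data: absorbance at a few wavelengths and the
# amperometric monochloramine readings for the same timestamps.
rng = np.random.default_rng(3)
mono = rng.uniform(3.0, 5.5, size=80)                # mg/L, range seen in the study
noise_level = {230: 0.5, 245: 0.2, 260: 0.6, 280: 0.8}
absorbance = {w: 0.3 * mono + s * rng.standard_normal(80)
              for w, s in noise_level.items()}

# Pearson correlation of each wavelength's absorbance with the analyser data.
corr = {w: np.corrcoef(a, mono)[0, 1] for w, a in absorbance.items()}
best_wavelength = max(corr, key=corr.get)
```

The wavelength with the strongest correlation (245 nm in the study) then serves to align the spectral and amperometric time series.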

Spectral Compensation for Organic and Nitrate
A comparison of typical pre-chloraminated and post-chloraminated water spectra recorded at the WTP with particle compensation within the 220-330 nm range is given in Figure 9a, while the full UV-Vis range of wavelengths is available in Appendix A, Figures A2 and A3. In the figure, it is evident that after adding monochloramine, the light absorbance of the water increases between 220 and 280 nm, while the remaining regions of the spectra overlap. Figure 9b shows the post-chloramination spectra and the estimated pre-chloramination spectra in the same plot, which clearly indicates that the absorbance of the estimated spectra is comparatively lower within 220 to 280 nm. Moreover, the remaining regions of the two spectra overlap, closely resembling Figure 9a. Figure 9c shows the accuracy of the polynomial model for each spectral wavelength, measured in terms of the coefficient of determination. The R-square values indicate that the DOC and NO3-N correlations to spectral wavelengths between 220 and 400 nm are maximised with low variability, while for the remaining wavelengths, the correlation is irregular. Therefore, the wavelengths within 220 to 400 nm contribute most to estimating the spectral configuration of pre-chloraminated water.
The RMSE values in Figure 9d indicate that the RMSE is comparatively high at the starting wavelengths and gradually decreases towards longer wavelengths. This is due to the relatively high absorbance at the starting wavelengths, as molar absorptivity increases with decreasing wavelength, resulting in comparatively high residual error in the model fitting. From 400 nm onwards, the RMSE values are close to zero because the absorbance in this region is very low; hence, the residuals in the model fitting are very low compared with the 220-400 nm region. The numeric data for Figure 9c,d are available in Appendix B, Table A1. As can be seen in Figure 9a, after the addition of monochloramine disinfectant, spectral changes occurred between 220 and 280 nm. The polynomial regression model performance in that region, in terms of R-square using the DOC and NO3-N data, varies from 0.92 to 0.99 (Table A1, Appendix B). R-square values close to 1 indicate that organic matter and nitrate are the major absorbing species in that part of the spectrum, while any other species have only a minor effect on the spectral configuration in that range. The method is therefore well suited to typical drinking water. The polynomial regression model performance can be further improved by adding other water quality parameters (if available) as predictor variables; the method remains the same except that the number of predictor variables is increased to obtain a better fit.
The DOC and NO3-N compensated spectra are presented in Figure 9e and closely resemble the typical monochloramine spectra presented in Figure 8a. Only wavelengths from 220 to 330 nm are shown in the figure, as the remaining region of the spectra is comparatively flat (full-wavelength spectra are given in Appendix A, Figure A4). The peak absorbance appeared at about 245 nm, which is characteristic of a typical monochloramine spectrum. Some portion of the spectra, starting from 280 nm, shows negative absorbance, which is attributed to estimation error by the polynomial model and the corresponding arithmetic subtraction. A baseline correction was therefore applied using a linear offset method while developing the SVR model with these spectral data.
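The compensation step can be sketched as follows (Python/NumPy, synthetic spectra; one simple reading of the "linear offset method" is assumed here, namely shifting the spectrum so the flat region beyond 280 nm averages to zero, since the text does not specify the exact correction):

```python
import numpy as np

# Synthetic post-chloramination spectrum: monochloramine-like band at
# 245 nm plus a flat organic/nitrate contribution (illustrative values).
wavelengths = np.arange(220, 331)
post = 0.4 * np.exp(-((wavelengths - 245) ** 2) / (2 * 12 ** 2)) + 0.05
estimated_pre = np.full_like(wavelengths, 0.07, dtype=float)  # regression estimate

# Subtract the estimated pre-chloramination spectrum; slight model
# over-estimation leaves the flat region (>= 280 nm) below zero,
# mirroring the negative absorbance described in the text.
compensated = post - estimated_pre

# Linear-offset baseline correction (assumed form): shift so the
# flat region sits at zero absorbance.
offset = compensated[wavelengths >= 280].mean()
corrected = compensated - offset
```

After the offset correction, the 245 nm monochloramine band remains while the flat region returns to the baseline.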

SVR Model Fitting
Using both particle compensated and uncompensated spectra, the SVR model was developed. The ε value was set to 0.01, which means data points that fall within this margin are treated as insensitive. The model training accuracy for the uncompensated spectra is presented in Figure 10a, while that for the particle compensated spectra is presented in Figure 10b. The term "reference" on the x-axis means the observed monochloramine concentration data from the amperometric analyser. It can be seen in Figure 10a that, for uncompensated spectra, the best agreement between the two data sets was achieved using the RBF kernel, with an R-square value of 0.915 and RMSE of 0.102. In contrast, the other kernels did not indicate a reasonable level of agreement between the reference and predicted values. Figure 10b shows a good level of model training performance for the particle compensated spectra using the polynomial and RBF kernels, with R-square values of 0.999 and 0.957, respectively. The RMSE values with the polynomial and RBF kernels are 0.010 and 0.074, respectively, indicating very low error in the model training. For the linear and sigmoid kernels, the data points in the graph are more sparsely fitted, with comparatively higher RMSE and lower R-square values than the polynomial and RBF kernels. Performance during cross-validation was comparatively weaker than during the model training phase in both cases. For uncompensated spectra, the RBF kernel showed relatively better performance and lower error than the other kernels, with an R-square value of 0.688 and RMSE of 0.194. In contrast, particle compensated spectra showed better performance for all kernels, with the highest accuracy obtained by the RBF kernel, achieving an R-square value of 0.732 and RMSE of 0.180.
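The four-kernel comparison itself can be sketched as below (Python/scikit-learn on synthetic data; the study ran this comparison in Unscrambler X, so this is only an illustrative analogue):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import r2_score, mean_squared_error

# Hypothetical features (e.g. PCA scores) and monochloramine reference values.
rng = np.random.default_rng(4)
X = rng.standard_normal((150, 4))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.05 * rng.standard_normal(150)

# Evaluate the four kernels with 10-fold cross-validation,
# epsilon = 0.01 and a third-degree polynomial, as in the study.
results = {}
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    model = SVR(kernel=kernel, epsilon=0.01, degree=3)
    pred = cross_val_predict(model, X, y, cv=10)
    results[kernel] = (r2_score(y, pred),
                       np.sqrt(mean_squared_error(y, pred)))
```

Each entry of `results` holds the cross-validated (R-square, RMSE) pair for one kernel, the same two metrics compared across kernels in Figures 10 and 11.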

Comparison of Model Performance
The SVR model training performance using particle, organic, and nitrate compensated spectra combined is presented in Figure 10c. Here, the polynomial kernel shows a near-perfect fit in model training, with an R-square of 0.999 and RMSE of 0.010, while the RBF kernel shows a comparatively lower performance, with an R-square of 0.967 and RMSE of 0.064. The linear and sigmoid kernels did not show a similarly good performance in the model training phase. In the cross-validation phase, RBF has the highest performance, with an R-square of 0.760 and RMSE of 0.176, while the polynomial kernel has the second-best performance, with an R-square of 0.725 and RMSE of 0.184. Analysis of the standard deviation indicates that the level of precision of the model was ±0.1 mg L−1. Figure 11 compares the SVR modelling performance visually with the help of a column chart for the above three cases: (i) uncompensated or original chloraminated water spectra; (ii) particle compensated spectra; and (iii) particle, organic, and nitrate compensated spectra. Numeric data for these comparisons are provided in Appendix B, Tables A2 and A3. The R-square and RMSE values in model training and cross-validation indicated that particle, organic, and nitrate compensated spectra with the RBF kernel function can best represent the monochloramine residual concentration. Although the polynomial kernel showed a better fit to the training data, its performance in cross-validation was relatively lower, with relatively higher error than the RBF kernel. Considering the cross-validation performance, RBF appeared to be the most appropriate kernel function. From the figure, it is also evident that uncompensated or original spectra cannot be satisfactorily used to determine the monochloramine residual concentration.

The above procedure can be implemented more efficiently by reducing the sample size. Most SVR algorithms require the training samples to be provided in a single batch [38]. A new model must therefore be trained every time a sample is added to or removed from the training set. Here, three months of data were used in a single batch, which is larger than necessary for relating spectral features to the monochloramine data. Once the appropriate kernel function is determined, recent observations can be used instead of the whole data set to train the model. This will significantly reduce the SVR model runtime.


Limitations of the Research
During the chloramination process, along with monochloramine, some dichloramine and trichloramine can form. Control of the chloramination process means that the formation of dichloramine and trichloramine is minimal and is assumed to have negligible interference on the monochloramine spectrum. This research only focused on spectral detection of monochloramine while dichloramine and trichloramine impacts were out of the scope of this paper.
The spectrophotometer's built-in tool was used to complete the particle compensation. However, different manufacturers use different particle compensation algorithms in their instruments. This should be explored further, as the modelling accuracy greatly depends on the particle compensation.
The WTP post-chloramination pH was relatively stable, with an average of 8.67 and a standard deviation of 0.27, so no spectral shifting was considered. However, for online monitoring where the pH of the incoming water varies significantly with time, pH compensation can be applied to correct spectral shifting.
The quality of drinking water varies from place to place. During the study period, the DOC concentration ranged between 1.7 and 2.7 mg L−1, the NO3-N concentration between 0.1 and 0.4 mg L−1, and the monochloramine concentration between 3.0 and 5.5 mg L−1. Hence, the spectral compensation and the associated SVR model work well within these ranges; beyond them, the modelling accuracy may differ.

Conclusions
Effective online spectral detection of a drinking water disinfectant (monochloramine) was proposed in this paper. The Tailem Bend drinking water treatment plant in South Australia, which currently uses an online amperometric chlorine analyser to monitor the monochloramine residual, was selected as the case study. An online UV-Vis spectrophotometer probe was installed at the WTP to gather spectral water quality information. Absorbance data at various wavelengths were treated in several stages to ensure quality, and PCA was used to extract features from these data. In developing the machine learning model, these spectral features were used as the predictor (independent) variables, while the amperometric analyser data were used as the response (dependent) variable.
The SVR algorithm with four different kernel functions, (i) linear, (ii) polynomial, (iii) RBF, and (iv) sigmoid, was considered to determine the best-fitting model. The R-square and RMSE in model training and cross-validation indicated that RBF has better accuracy than the other kernels in determining the monochloramine concentration using both compensated and uncompensated spectra. Specifically, particle compensated spectra showed better model fitting and lower error than uncompensated spectra. Additionally, compensation for organic matter (DOC) and nitrate (NO3-N) was shown to improve the modelling performance. Finally, the following conclusions can be drawn:
• Machine learning with UV-Vis spectrometry can be used for online detection of the monochloramine residual;
• The choice of kernel function has a high impact on modelling performance; in particular, the RBF kernel has better accuracy for non-linear mapping of spectral data; and
• Particle compensation and the newly introduced organic and nitrate compensation improve the modelling accuracy.

Appendix A

Figure A1. Comparison of online monochloramine analyser data with lab data (green and red dotted lines indicate the lower and upper limits of the 95% confidence interval).

Figure A2. Post chloramination spectra at the WTP: (a) uncompensated spectra and (b) particle compensated spectra (absorbance is relatively high in the uncompensated spectra due to particle interference; after particle compensation, the absorbance is significantly reduced).