Performance Evaluation for a Contamination Detection Method Using Multiple Water Quality Sensors in an Early Warning System

In this study, a method that uses multivariate water quality data series to detect contamination events is discussed and evaluated. Eight water quality sensors (pH, turbidity, conductivity, temperature, oxidation reduction potential, UV-254, nitrate and phosphate) are used, and the most commonly used herbicide, glyphosate, is selected as the test contaminant. Variations of all parameters are recorded in real time at different concentrations. The results of the experiment and analysis show that, with suitable optimization, the proposed method can detect glyphosate contamination less than 5 min after the contaminant is introduced, using responses from online water quality sensors. The average true positive rate is 95.5%. The study also discusses the impact of the number of sensors on detection performance. The results show that if the number of sensors is reduced from 8 to 5, the true positive rate is still good. This indicates that the method is flexible and can be applied with a smaller number of sensors to reduce monitoring costs.


Introduction
Water systems are vulnerable to contamination accidents and bioterrorism attacks because they are relatively unprotected, accessible, and often isolated [1]. The past decades have witnessed a mounting number of contamination incidents in China. Unlike conventional pollution, accidental contamination emergencies involve uncertainty and urgency and demand a rapid response. Therefore, how to detect a potential contamination incident and identify a specific contaminant in a water source has raised concerns all over the world, especially after the events of 11 September 2001 in the US.
One approach for avoiding or mitigating the impact of contamination is to establish an Early Warning System (EWS). An EWS should provide a fast and accurate means of distinguishing between normal variations and contamination events [2]. Ideally, it should be inexpensive, low maintenance, easy to integrate into network operations and reliable, with few false positives and false negatives [3].
A key part of an EWS is the detection module, which uses online sensors to evaluate water quality and detect the presence of contamination. Generally, there are two types of online water quality sensors. The first type comprises non-compound-specific or conventional water quality sensors, which are normally used for routine water quality parameters, including pH, chlorine, total organic carbon (TOC), oxidation reduction potential (ORP), conductivity and temperature. The second type comprises compound-specific or advanced sensors, which are capable of confirmative detection of a specific component at low concentrations [4][5][6][7][8][9].
Although compound-specific sensors are capable of confirmative detection of contaminants at low concentrations, their long analysis time and high cost can be disadvantages during a contamination accident. In recent years, conventional water quality sensors have played a growing role.
As summarized by McKenna et al. [10], two types of approaches to developing and testing event detection using water quality signals have been examined. First, laboratory and test-loop evaluations of sensors and associated event detection algorithms provide direct measurements of the chemical changes in background water quality caused by specific contaminants [11][12][13][14]. For example, Hall et al. [11] reported a sensor response experiment for nine types of contaminants and found that more than one sensor responded to each tested contaminant. Building on this observation, researchers have attempted to develop contaminant detection methods using responses from multiple sensors. Yang et al. [12] explored a real-time event adaptive detection, identification and warning (READiw) methodology in a drinking water pipe. The suggested adaptive transformation of sensor measurements reduced background noise and enhanced contaminant signals. In the method employed by Yang et al., the relative values of free and total chlorine concentrations, pH and ORP are used for contaminant classification, which allowed contaminant detection and further classification based on chlorine kinetics. Kroll [13] reported the Hach Homeland Security Technologies (HST) approach, which uses multiple sensors for event detection and contaminant identification. In the Hach HST approach, signals from five separate orthogonal measurements of water quality (pH, conductivity, turbidity, chlorine residual, TOC) are processed from a five-parameter measure into a single scalar trigger signal. This deviation signal is then compared to a preset threshold level; if the signal exceeds the threshold, the trigger is activated [13]. Although Kroll's method uses responses from multiple sensors, it does not explore the internal relationships among them.
The second approach to event detection is based on signal processing and data-driven techniques [10][15][16][17][18][19][20]. For example, Hart et al. [15] reported a linear prediction filter (LPF). The LPF method predicts water quality at a future time step and evaluates the residual between the predicted and observed values. Klise and McKenna [16] developed an algorithm that classifies the current measurement as normal or anomalous by calculating the multivariate Euclidean distance (MED).
The MED approach measures the distance between the sampled water quality and the previously measured samples contained in the history window. Allgeier et al. [17] and Raciti et al. [18] used artificial neural networks (ANN) and support vector machines (SVM), trained by supervised learning, to classify water quality data into normal and anomalous classes. Perelman et al. [19] and Arad et al. [20] reported a general framework that integrates a data-driven estimation model with sequential probability updating to detect quality faults in water distribution systems using multivariate water quality time series. A common feature of these methods is that they rely solely on data processing; the physical characteristics of the sensor signals responding to contaminants are not considered. For online water quality sensors, however, fluctuations can be caused either by equipment noise or by the presence of a contaminant.
Liu et al. [21] proposed a method for real-time contamination detection in source water using multiple conventional water quality sensors. Eight sensors were used in their case study. In this paper, we aim to extend this research by determining how the number of sensors influences detection performance and by identifying the optimal combination of deployed sensors. The test data come from contaminant dosing experiments in a laboratory.

Pilot-Scale Contaminant Injection and Monitoring System
The pilot-scale system used in this study is a recirculating system simulator in the School of Environment Laboratory at Tsinghua University, Beijing, China. A process flow schematic of the pilot-scale system used for baseline establishment and single-pass contaminant tests is shown in Figure 1. The water tank is approximately 85 cm high with a diameter of 70 cm and has a total capacity of 300 L. The tank is linked to the Guardian Blue early warning system [13] via a peristaltic pump operating at 0.5 L/min. The Guardian Blue early warning system, comprising the Guardian Blue event monitor, agent library, water panel, TOC analyzer and automatic sampler, is a system developed by Hach that detects and classifies a wide variety of threat contaminants and issues alerts for them. In this study, the system was used only as an online monitoring system. The system was operated in recirculation mode for baseline establishment. In this mode, 300 L of source water flows through the eight sensors and back to the tank. The entire volume of water in the loop is replaced every 72 h if no contaminant test is conducted. Generally, establishing the baseline takes 4-6 h before any contaminant experiments can be carried out. When operating in single-pass contaminant mode, the target contaminant is injected via another peristaltic pump into the pipe connecting the tank and the sensors, at a rate of 2-20 mL/min depending on the required concentration. The contaminated water flows through the sensors directly into a dedicated waste liquid bucket, avoiding pollution of the water in the tank.
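The link between injection rate and the concentration reaching the sensors follows from a simple mass balance. The sketch below is only an illustration, assuming complete mixing and contaminant-free source water; the stock concentrations actually used in the experiments are not reported here, and the function names are hypothetical.

```python
def stock_concentration(target_mg_per_l, pump_l_per_min, injection_ml_per_min):
    """Stock concentration (mg/L) needed for the blended stream to reach the target.

    Assumes complete mixing and contaminant-free source water, so that
    C_target * (Q + q) = C_stock * q, with Q the main flow and q the injection flow.
    """
    q = injection_ml_per_min / 1000.0  # convert mL/min to L/min
    return target_mg_per_l * (pump_l_per_min + q) / q


def turnover_minutes(volume_l, pump_l_per_min):
    """Minutes needed for the pump to move one full tank volume past the sensors."""
    return volume_l / pump_l_per_min
```

For example, under these assumptions, reaching 4.0 mg/L in the 0.5 L/min stream with a 20 mL/min injection would require a stock of about 104 mg/L, and one full pass of the 300 L tank through the sensors in recirculation mode takes about 600 min (10 h).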

Sensors Investigated
An online water quality monitoring system developed by Hach was used in this study. The system measures the following eight parameters simultaneously and continuously: temperature, pH, turbidity, conductivity, oxidation reduction potential (ORP), UV-254, nitrate and phosphate. Table 1 lists the parameters and details of their associated sensors.

Contaminants Investigated
The contaminant was chosen according to statistical reports on water pollution incidents in urban water supply systems in China over the past 20 years. Glyphosate, a broad-spectrum systemic herbicide used to kill weeds, especially annual broadleaf weeds and grasses, was selected as the target contaminant in this study. Specific quantities of the contaminant were injected into the system simulator.

Experimental Procedure
Sensors were calibrated in accordance with the manufacturer's recommendations and verified with a calibration check standard. Before contaminants were introduced, the experimental system was kept running to establish a baseline. Sensor data were collected continuously and archived electronically to establish stable baseline conditions and to record sensor responses to injected contaminants. Data from the ORP, nitrate, temperature, pH, conductivity, turbidity and UV-254 sensors were recorded every 1 min during the test period, while the phosphate sensor was recorded every 5 min. After the baseline was established, a specific concentration of contaminant was injected. Each contaminant injection took 20-40 min to reach a stabilized reading. The sensors were then supplied with uncontaminated raw water, and the responses returned to the baseline. The same contaminant was injected at a different concentration after the sensor responses had returned to the baseline following the previous test [22].

Detection Method
In this study, it is assumed that multiple water quality sensors can respond to a contaminant simultaneously. The method detects contamination by exploring the correlation between the responses of multiple water quality sensors [21]. This relationship is evaluated using the correlation coefficient r, which is calculated by:

$$ r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} $$

in which x and y refer to two separate water quality sensors, x, y ∈ {pH, ORP, UV-254, …}; $\bar{x}$ and $\bar{y}$ stand for the mathematical expectations (means) of the two sensors' readings in the window; i runs from 1 to n and indexes the observations in the window; $x_i$ and $y_i$ are the measured (raw) values of the corresponding sensors at time i; and n is the number of data points, or window size. The window size is the number of past observations used to calculate the correlation coefficient. For each sensor, a new observation enters the sliding window at every time step t and the oldest observation exits (i.e., first in, first out).
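A minimal NumPy sketch of the windowed correlation coefficient is given below. It assumes the readings are stored as a (time × sensor) array on a common 1-min grid (how the 5-min phosphate readings are aligned is not specified here); `window_correlations` and `rolling_correlations` are hypothetical helpers, not code from [21].

```python
import numpy as np


def window_correlations(data, window):
    """Pearson r_xy for every sensor pair over the newest `window` rows.

    data: array-like of shape (T, m); rows are 1-min time steps, columns are sensors.
    Returns an (m, m) matrix r with r[x, y] = r_xy.
    """
    recent = np.asarray(data, dtype=float)[-window:]  # the n newest observations
    dev = recent - recent.mean(axis=0)                # x_i - x_bar for every sensor
    cross = dev.T @ dev                                # sums of (x_i - x_bar)(y_i - y_bar)
    scale = np.sqrt(np.diag(cross))                    # sqrt of sums of squared deviations
    with np.errstate(divide="ignore", invalid="ignore"):
        r = cross / np.outer(scale, scale)
    # A sensor that is perfectly flat in the window gives 0/0; treat it as uncorrelated.
    return np.nan_to_num(r)


def rolling_correlations(data, window):
    """Yield (t, r) at every step t once the first-in-first-out window is full."""
    data = np.asarray(data, dtype=float)
    for t in range(window - 1, len(data)):
        yield t, window_correlations(data[: t + 1], window)
```

Each call uses only the newest `window` rows, which mirrors the first-in-first-out behaviour described above; the result matches `np.corrcoef` on the same window for non-constant sensors.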
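The value of $r_{xy}$ lies between −1 and 1. In this study, a correlation indicator $C_{xy}$ is calculated using

$$ C_{xy} = \begin{cases} 1, & |r_{xy}| \ge \mathrm{threshold}_{\mathrm{indicator}} \\ 0, & |r_{xy}| < \mathrm{threshold}_{\mathrm{indicator}} \end{cases} $$

A contamination alarm will be triggered if

$$ \sum_{x<y} C_{xy} \ge \mathrm{threshold}_{\mathrm{alarm}} $$

The values of threshold_indicator and threshold_alarm can be determined by optimization analysis using data from experiments and real events.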
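The indicator and alarm rule can be sketched as follows. This sketch assumes that the indicator compares the magnitude |r_xy| with the threshold (so that opposite-direction responses, such as pH falling while ORP rises, still count) and that each unordered sensor pair is counted once; both are assumptions, and the function names are hypothetical.

```python
import numpy as np


def correlation_indicators(r, threshold_indicator):
    """C_xy = 1 if |r_xy| >= threshold_indicator, else 0, for each pair x < y (assumed rule)."""
    r = np.asarray(r, dtype=float)
    upper = np.triu_indices(r.shape[0], k=1)  # each unordered pair once, diagonal excluded
    return (np.abs(r[upper]) >= threshold_indicator).astype(int)


def alarm(r, threshold_indicator, threshold_alarm):
    """True when the number of strongly correlated sensor pairs reaches threshold_alarm."""
    return int(correlation_indicators(r, threshold_indicator).sum()) >= threshold_alarm
```

With eight sensors there are 28 pairs, so under this counting a threshold_alarm of 7 would require a quarter of all pairs to be strongly correlated at the same time step.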
The performance of the detection method is measured by detection time (DT), true positive rate (TPR), false positive rate (FPR) and false negative rate (FNR). DT is defined as the time difference between the occurrence of a contamination event and its detection:

$$ DT = T_1 - T_0 $$

where $T_0$ is the time when the contamination event occurs and $T_1$ is the time when the contamination event is detected. A smaller DT means the detection method is more effective and can detect contamination within a shorter time frame.
TPR, FPR and FNR can be calculated by [20]:

$$ TPR = \frac{TP}{TP + FN}, \qquad FPR = \frac{FP}{FP + TN}, \qquad FNR = \frac{FN}{TP + FN} $$

where TP (true positive) is the detection of an actual event (alarm on); FP (false positive) refers to a routine operation incorrectly classified as a contamination event (alarm on); TN (true negative) refers to a routine operation correctly classified as such (alarm off); and FN (false negative) means that an actual event is not detected (alarm off). $TPR_T$ denotes the true positive rate after time T. A greater TPR means the method is more capable of detecting a real event, while a smaller FPR means the method is less likely to classify a routine operation as an event.
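These definitions can be checked with a small sketch that counts TP, FP, TN and FN from per-step alarm states and ground-truth labels. The function name and its inputs are hypothetical.

```python
def rates(alarms, truth):
    """Return (TPR, FPR, FNR) from per-step alarm states and ground-truth labels.

    alarms[i] is True when the alarm is on at step i; truth[i] is True when a
    contamination event is actually present at step i.
    """
    pairs = list(zip(alarms, truth))
    tp = sum(1 for a, e in pairs if a and e)          # event, alarm on
    fn = sum(1 for a, e in pairs if not a and e)      # event, alarm off
    fp = sum(1 for a, e in pairs if a and not e)      # routine operation, alarm on
    tn = sum(1 for a, e in pairs if not a and not e)  # routine operation, alarm off
    tpr = tp / (tp + fn) if tp + fn else float("nan")
    fpr = fp / (fp + tn) if fp + tn else float("nan")
    fnr = fn / (tp + fn) if tp + fn else float("nan")
    return tpr, fpr, fnr
```

Because TPR and FNR share the denominator TP + FN, FNR = 1 − TPR whenever at least one event step is present.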
In this study, the calculation is based on a 1-min step: a contaminant injection lasting t minutes is treated as t contamination events. Within the contamination period, TPR is used to evaluate how many of these events the method detects. $TPR_T$ is the true positive rate within the period from T to the end of the injection. Unless otherwise indicated, $TPR_1$ is used as the evaluation indicator in this study.
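A sketch of $TPR_T$ and DT on a 1-min alarm series follows. It assumes `onset` is the index of the first contaminated minute and `end` the index just past the last one; reading "after time T" as "from minute onset + T onward" is an interpretation, and the helper names are hypothetical.

```python
def tpr_after(alarms, onset, end, T):
    """TPR_T: share of contaminated minutes from onset + T to end (exclusive) with the alarm on.

    With a 1-min step, every contaminated minute counts as one event.
    """
    steps = alarms[onset + T:end]
    return sum(bool(a) for a in steps) / len(steps) if steps else float("nan")


def detection_time(alarms, onset, step_minutes=1):
    """DT = T1 - T0: delay from onset to the first alarm, or None if never detected."""
    for t in range(onset, len(alarms)):
        if alarms[t]:
            return (t - onset) * step_minutes
    return None
```

For a 10-min injection starting at minute 60 whose alarm first sounds at minute 61 and then drops out once, DT is 1 min and $TPR_1$ is 8/9 under this reading.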

Parameter Optimization and Validation
Liu et al. [21] showed that this method was able to detect a contamination event within a short time period and that its parameters had a significant impact on detection performance. Likewise, in our study, whether an event is detected depends on assigning suitable values to three parameters. Therefore, an optimization method was used to obtain the best combination of threshold_indicator, threshold_alarm and window size. The Non-Dominated Sorting Genetic Algorithm II (NSGA-II) is a multi-objective optimization algorithm [23] that has been used in recent studies [24][25][26]. It is computationally fast and has been shown to provide better coverage and maintain a better spread of solutions than other multi-objective algorithms.
The NSGA-II procedure can be summarized in the following steps: (1) randomly generate the initial population; (2) evaluate the performance of each chromosome and perform a fast non-dominated sort; (3) produce offspring through binary tournament selection, crossover and mutation; (4) form the intermediate population and perform a fast non-dominated sort; (5) keep reproducing new populations until a preset criterion is met. In this study, minimizing FNR and minimizing FPR were used as the two fitness functions to evaluate the population of each generation. Values for the NSGA-II operators were determined based on the literature [27] and are listed in Table 2.
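The fast non-dominated sort used in steps (2) and (4) can be sketched as below, following the general NSGA-II scheme of Deb et al. with both objectives (FNR, FPR) minimized. Crowding distance, tournament selection, crossover and mutation are omitted, and this is not the implementation used in this study.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(objectives):
    """Split solutions into Pareto fronts.

    objectives: list of tuples such as (FNR, FPR), one per chromosome.
    Returns a list of fronts, each a list of solution indices; fronts[0] is the Pareto front.
    """
    n = len(objectives)
    dominated = [[] for _ in range(n)]  # indices of solutions that p dominates
    count = [0] * n                     # number of solutions that dominate p
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(objectives[p], objectives[q]):
                dominated[p].append(q)
            elif dominates(objectives[q], objectives[p]):
                count[p] += 1
        if count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        nxt = []
        for p in fronts[i]:
            for q in dominated[p]:
                count[q] -= 1
                if count[q] == 0:  # all dominators already assigned to earlier fronts
                    nxt.append(q)
        i += 1
        fronts.append(nxt)
    return fronts[:-1]  # drop the trailing empty front
```

In a full NSGA-II loop, the first front of the final generation is the set of candidate (threshold_indicator, threshold_alarm, window size) solutions from which one is chosen.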

Table 2. Values of GA operators adopted.

Parameter name      Value
Population size     250
Generation number   200
Crossover rate      0.85
Mutation rate       0.05

Correlative Responses
In this study, sensor response experiments were conducted for glyphosate at different concentrations. The results are shown in Figure 2a,b. In the experiment, glyphosate solutions with concentrations of 0.8 mg/L, 2.0 mg/L and 4.0 mg/L were added in sequence, and the sequence was run twice. The data series obtained from the first and second runs were recorded as "glyphosate-A" and "glyphosate-B", respectively. The "glyphosate-A" series is indicated by solid green bars at the top of Figure 2. As shown in Figure 2, ORP and phosphate increase, while pH and nitrate decrease. These responses are mainly due to the introduction of the glyphosate solution, which is slightly acidic and has some oxidizing ability. Because the solution diluted the source water, nitrate shows a weak decrease. The conductivity and UV-254 sensors may have responded, but if so, the responses are hidden by fluctuations in the source water. Temperature shows a clear increasing trend when the axes are enlarged, as shown in Figure 2a. Turbidity increased after both injections, which may appear to be a delayed reaction to the contaminant. However, these turbidity peaks were not caused by the contaminant; they were an immediate result of a change in pump speed. Uncontaminated raw water is supplied to clean the pipe and system, and the pump speed is increased to return quickly to baseline. The turbidity sensor is very sensitive to changes in water velocity, so it fluctuated as shown in Figure 2a. Unexpected spikes appear in some sensor responses after the third injection, mainly due to a sudden malfunction of the system.
For different contaminant concentrations, the sensor responses show correlative relationships, especially for pH, nitrate, ORP and phosphate. This suggests that the correlated response is caused by the introduction of the contaminant and implies that this phenomenon can be used to detect the presence of contamination. The magnitudes of the sensor responses were related to contaminant concentration.

Parameter Optimization and Validation
The original glyphosate-A and glyphosate-B data series were each separated and regrouped into three new series, giving six new data series in total. Each new series contained 80-100 values. The first 60 values of each series were data for the no-contamination scenario, and the remaining values were for the contamination scenario at one of the three concentrations. The three new data series created from glyphosate-A (A1, A2, A3) were used as calibration datasets, and the three created from glyphosate-B (B1, B2, B3) were used as validation datasets. The optimization results are shown in Figure 3. Although there were several solutions in the last generation, only one solution had the best performance, as indicated by the arrow. The values of threshold_indicator, threshold_alarm and window size were 0.6912, 7 and 16, respectively. The TPR and FPR were 93.7% and 6.8%, respectively, which indicates that the proposed method can detect contamination events caused by glyphosate solution with high accuracy.
The validation datasets were obtained from the glyphosate-B data series, which contained three experimental data series with the same concentrations as glyphosate-A. The optimal parameter values obtained from calibration were used. Table 3 shows the validation results: the TPRs for 0.8 mg/L, 2.0 mg/L and 4.0 mg/L were 95.5%, 100% and 100%, respectively. The FPR was 0% for all concentrations, revealing excellent detection performance for glyphosate solution. A $TPR_1$ of 100% means that the event was detected from one minute after the contamination occurred, which is very important in real events.

Sensors Selection
As can be seen from Figure 2, not all the sensors responded obviously when the contaminant was added. For example, the results from the A1 dataset show that the turbidity and conductivity sensors both had weak responses. This raises an interesting question: would performance be affected if some sensors were removed? The correlation coefficients for each pair of sensors in the contamination scenario at the 85th minute were calculated and are listed in Table 4. The coefficients between turbidity and the other sensors at the 85th minute were smaller than the preset threshold, which suggests that removing the turbidity sensor would not affect the TPR at this time. To confirm this, the turbidity and conductivity sensors were removed. The parameters obtained from calibration were used, and the resulting TPRs are shown in Figure 4. The FPRs for glyphosate-A and glyphosate-B were 4.6% and 0%, respectively, indicating that the FPR did not change much and even improved. However, the TPR was greatly reduced, especially at the lowest concentration. Figure 5 shows the sum of correlation indicators at each time step after the contaminant was added (from the 61st to the 85th minute in the A1 dataset) for two cases: eight sensors and six sensors. For the first ten minutes, the sum of correlation indicators for eight sensors was higher than that for six sensors. This indicates that the turbidity and conductivity sensors did respond to the glyphosate injection, although this was not clear from the original data series in Figure 2.
Figure 5 also reveals that the turbidity and conductivity sensors contributed little to event detection after the concentration stabilized. Another optimization was conducted to find the optimal parameters when using six sensors. The three new data series created from glyphosate-A were again used as calibration datasets, and the three created from glyphosate-B as validation datasets. The optimal values of threshold_indicator, threshold_alarm and window size were 0.6111, 7 and 21, respectively. The TPR and FPR results are shown in Figure 4, Figure 6 and Table 5. The average TPR and FPR for the six data series were 88.3% and 3.8%, respectively, which is lower than, but close to, the performance with eight sensors. The detection method therefore appeared to adapt well as the number of monitoring sensors changed.
To confirm this, 93 (1 + 8 + 28 + 56) sensor combinations, containing five to eight sensors, were created and optimized. For each combination, there were several optimal solutions with different FNR and FPR; the solution closest to the origin (FNR = 0, FPR = 0) was chosen as the best. The best solution points for each combination are shown in Figure 7, where each point represents the result of one sensor combination. The red point corresponds to the eight-sensor combination, while the green, blue and yellow points correspond to seven-sensor, six-sensor and five-sensor combinations, respectively. The points in the red rectangle represent combinations for which both FNR and FPR were below 5%: one eight-sensor combination, two seven-sensor combinations and two six-sensor combinations. Although no five-sensor combination falls inside the rectangle, the closest five-sensor point still showed reasonable performance, with an FNR of 5.4% and an FPR of 2.6%. It is worth noting that, as expected, the detection method adapted well as the number of monitoring sensors decreased from eight to five. Table 6 lists the sensors (parameters) used in the different optimal solutions; the removed sensors were mainly UV-254, conductivity or temperature. Figure 2 shows that these sensors did not change visibly in response to contamination events, whereas the responses of the pH, ORP, nitrate and phosphate sensors were very strong. Figure 7 also shows that the points representing five-sensor combinations were generally farther from the origin of the graph (i.e., the optimal point) than those representing six-sensor combinations, and likewise for six-sensor points compared with seven-sensor points. More sensors help to improve detection performance but also increase costs. However, as mentioned previously, appropriate sensor selection yields good performance whether five, six, seven or eight sensors are used. In other words, using fewer sensors where cost reduction is important can still provide good detection performance.
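The 93 candidate deployments and the distance-based choice of a single solution can be sketched as follows. The helper names are hypothetical, the (FNR, FPR) pairs in the test are made-up placeholders rather than results from this study, and running NSGA-II for each combination is omitted.

```python
from itertools import combinations
from math import hypot

SENSORS = ["pH", "turbidity", "conductivity", "temperature",
           "ORP", "UV-254", "nitrate", "phosphate"]


def sensor_combinations(sensors, min_size=5):
    """All subsets of `sensors` with at least `min_size` members, largest first."""
    return [c for k in range(len(sensors), min_size - 1, -1)
            for c in combinations(sensors, k)]


def closest_to_origin(solutions):
    """Pick the (FNR, FPR) solution with the smallest Euclidean distance to (0, 0)."""
    return min(solutions, key=lambda s: hypot(s[0], s[1]))
```

Counting the subsets of sizes 8, 7, 6 and 5 gives 1 + 8 + 28 + 56 = 93, matching the number of combinations optimized above.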

Conclusions
In this study, a method using the correlative relationship between the responses of multiple sensors was optimized. The results of the experiment and analysis showed that, with suitable optimization, the method could detect glyphosate contamination less than 5 min after the contaminant was introduced, using responses from online water quality sensors. The average $TPR_1$ was 95.5%.
The study discussed the impact of the number of sensors on detection performance. The results showed that if the number of sensors was reduced from eight to five, the TPR was still good. This indicates that the method is flexible and can be applied with a smaller number of sensors to reduce monitoring costs.

Figure 1. A process flow schematic of the pilot-scale system.

Figure 3. The optimization results for glyphosate-A series with eight sensors.

Figure 5. The sum of correlation indicators at each time step for the eight-sensor and six-sensor cases.

Figure 6. The optimization results for glyphosate-A series with six sensors.

Figure 7. The results for different sensor deployments.

Table 1. Detailed information on the parameters and sensors.

Table 3. The TPR and FPR performances of glyphosate-A and glyphosate-B series (eight sensors).

Table 5. The TPR and FPR performances of glyphosate-A and glyphosate-B series (six sensors).

Table 6. Sensors used in each optimal solution.