A Novel Hybrid Strategy for Detecting COD in Surface Water

Abstract: Predicting chemical oxygen demand (COD) from the ultraviolet-visible (UV-Vis) absorption spectrum is a common method. Many researchers use the absorbance at characteristic wavelengths to establish COD prediction models. However, selecting the characteristic wavelengths is difficult. In this paper, the extreme values of the absorption spectrum change rate are proposed as a new characteristic parameter for determining the characteristic wavelengths. On this basis, a novel hybrid strategy for detecting COD in surface water is proposed. First, the first derivative method is combined with the permutation entropy method (FDPE) to determine the characteristic wavelengths, and partial least squares (PLS) is then used to establish a COD prediction model. Experimental results demonstrated that the linear correlation coefficient ($R^2$) of FDPE_PLS was above 0.99 without turbidity interference. Second, a dual-wavelength method (DWM) is proposed to determine the turbidity values. The DWM uses the slope of the line connecting the absorbance values at 400 nm and 600 nm to predict the turbidity values. Compared with the single-wavelength method, the DWM improves the measurement accuracy of turbidity. Finally, a new turbidity compensation method is proposed to compensate for the turbidity interference in the first derivative spectrum. After compensation, FDPE_PLS predicted COD concentrations accurately, with an $R^2$ of 0.99.


Introduction
Surface water is an important source of water, providing most of our basic water needs. However, surface water pollution issues have been very serious in recent years. Surface water pollution, to a great extent, damages the ecological environment and directly affects people's health [1]. The chemical oxygen demand (COD) in surface water is an important indicator of the degree of surface water pollution, which can reflect the level of oxygen-consuming organic pollutants in surface water. The determination of COD is particularly important in the analysis of water pollution [2].
Currently, the determination of COD in China mainly adopts the national standard chemical method [3]. Although the measurements of the national standard chemical method are accurate, the process is cumbersome and generally requires heating, reaction, and other steps. Additional reagents are also required, which causes secondary pollution. This method also requires long sample transfer and reaction times, which makes it inconvenient for field use [4]. Ultraviolet-visible (UV-Vis) absorption spectroscopy is a physical method for detecting COD concentration; its process is simple and causes no secondary pollution, and it can measure COD in real time [5]. Online detection of COD concentration reveals the pollution status of surface water in real time, which is of great significance for protecting the surface water environment.
When the concentration of COD in water is detected by UV-Vis absorption spectrometry, the absorbance at 254 nm is usually used to obtain the COD measurements, which is called the single-wavelength method [6]. The single-wavelength method is simple, but its stability is poor because it is easily disturbed, and its measurement range is limited. Wang proposed a new method that selects different calibration wavelengths based on the COD value to expand the measurement range. However, this method is essentially a single-wavelength method applied over different measurement ranges, and it also needs a turbidity compensation algorithm to remove turbidity interference [7]. The accuracy of the turbidity compensation algorithm inevitably affects the detection accuracy of COD, so the compensation algorithm itself must be highly precise. To improve the accuracy of COD detection, the multi-wavelength method was proposed, for which selecting appropriate characteristic wavelengths is important. Competitive adaptive reweighted sampling (CARS), random frog, and the genetic algorithm have been used to determine the characteristic wavelengths of COD in aquaculture water for the multi-wavelength method [8]. These wavelength selection methods improve the accuracy of COD measurement in complex water samples, but the distribution of selected wavelengths may change each time the program runs [9-11]. Therefore, these methods suffer from problems of reliability and stability. Turbidity compensation is necessary to eliminate the influence of turbidity in COD detection. The absorbance at 546 nm in the UV-Vis absorption spectrum has been used for turbidity compensation [12]. Although this approach is simple, the accuracy of COD detection after compensation is not satisfactory. Hu et al. used the fourth derivative method to remove turbidity interference [13]. The fourth derivative method has the advantage of having no baseline, but it reduces the signal strength of COD, which lowers the signal-to-noise ratio and ultimately affects the accuracy of COD detection.
In this paper, a novel hybrid strategy for detecting COD in surface water is proposed. A new characteristic parameter, the extreme value of the absorption spectrum change rate, is used to determine the characteristic wavelengths. A combined measurement method based on the first derivative method, the permutation entropy method, and the partial least squares method (FDPE_PLS) is proposed. The first derivative method extracts the spectral change rate from the original spectral data, the permutation entropy method locates the extremes of the spectral change rate, and the partial least squares method establishes the COD prediction model. To improve the accuracy of turbidity compensation, a dual-wavelength method (DWM) and a new turbidity compensation method are proposed. Experimental results demonstrate that the hybrid strategy improves the detection accuracy of COD under turbidity interference and gives stable results. This paper is organized as follows: Section 2 describes the materials used in this paper, the FDPE_PLS algorithm, the DWM, and the new turbidity compensation method. Section 3 presents the experimental methodology and the results. Finally, Section 4 draws conclusions and indicates directions for future research.

Materials and Methods
This section introduces the materials and methods used in this paper. First, the preparation of samples and the measurement of the UV-Vis absorption spectra are introduced. The samples are divided into a calibration set and a validation set by sample set partitioning based on joint x-y distance (SPXY). FDPE_PLS is then proposed to detect COD, and DWM is proposed to measure turbidity. Finally, a new turbidity compensation method is proposed.

Samples Preparation and the Measurement of UV-Vis Absorbance Spectra
According to the environmental quality standards for surface water in China (GB3838-2002), the water quality of surface water sources of centralized drinking water is divided into 5 categories, with COD limit values of 15, 15, 20, 30, and 40 mg/L, respectively. In this paper, COD and turbidity chemical standard solutions were used to prepare water samples. The COD stock solution (100 mg/L) was prepared by dissolving 0.02125 g of potassium hydrogen phthalate in deionized water and diluting to 250 mL. COD standard solutions of 1 to 32 mg/L in steps of 1 mg/L (32 solutions in total) were prepared by appropriate dilution. In addition, the NTU (nephelometric turbidity unit) standard (ISO 7027-1984) was used to determine the water turbidity. Turbidity solutions of 1, 3, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 80, and 100 NTU were prepared by diluting a 400 NTU turbidity solution with deionized water. All samples were scanned on a SHIMADZU UV-3600 UV-Vis spectrophotometer interfaced to a microcomputer, with deionized water used as a blank in the reference cell. The resolution of this spectrophotometer was 0.1 nm. The experimental setup is shown in Figure 1, and the optical path length of the quartz cells was 10 mm. Each sample was measured 15 times, and the average was taken as the spectral measurement of the sample. The absorption spectra of the COD samples and the turbidity samples are shown in Figures 2 and 3, respectively.
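The preparation arithmetic can be checked with a short calculation. The sketch below assumes a theoretical oxygen demand of about 1.176 mg of O2 per mg of potassium hydrogen phthalate, a commonly quoted value that is not stated in this paper; the function names and the 100 mL working volume are illustrative only.

```python
# Assumption: theoretical oxygen demand of potassium hydrogen phthalate (KHP),
# in mg O2 per mg KHP. Commonly quoted value, not taken from this paper.
KHP_THOD_MG_O2_PER_MG = 1.176


def stock_cod_mg_per_l(khp_mass_g, volume_ml):
    """COD (mg/L) of a stock solution made from a given mass of KHP."""
    oxygen_mg = khp_mass_g * 1000.0 * KHP_THOD_MG_O2_PER_MG
    return oxygen_mg / (volume_ml / 1000.0)


def stock_volume_ml(target_cod, final_volume_ml, stock_cod):
    """Stock volume needed for a dilution, from C1 * V1 = C2 * V2."""
    return target_cod * final_volume_ml / stock_cod


# 0.02125 g of KHP diluted to 250 mL, as described in the text.
stock = stock_cod_mg_per_l(0.02125, 250.0)
print(f"stock COD is about {stock:.2f} mg/L")

# Hypothetical working volume: a 20 mg/L standard made up to 100 mL.
print(stock_volume_ml(20.0, 100.0, 100.0), "mL of 100 mg/L stock")
```

Under this assumption the stock solution comes out within rounding of the nominal 100 mg/L, and each standard follows from a simple proportional dilution.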

Lambert Beer's Law
When light passes through a solution, part of it is absorbed by the light-absorbing substances in the solution, and the energy of the light radiation is reduced. The longer the optical path through the solution and the higher the concentration of light-absorbing substances, the more light is absorbed and the less light passes through the solution [12].
The absorbance is defined as:

$$A(\lambda) = \lg\frac{1}{T} = \lg\frac{I_0}{I_T} = K(\lambda)\,C\,L \qquad (1)$$

where $\lambda$ is the wavelength of the incident light, $A(\lambda)$ is the absorbance, $T$ is the transmittance, $I_T$ is the transmitted light intensity, $I_0$ is the incident light intensity when the concentration is 0, $L$ is the optical path, $K(\lambda)$ is the absorption coefficient, and $C$ is the concentration of the tested sample. For a specific flow-through cell, the absorption optical path $L$ remains unchanged. For a specific measured wavelength and a specific water sample, the absorption coefficient $K(\lambda)$ is constant. In this case $A(\lambda)$ is proportional to the sample concentration, so the COD concentration in water samples can be measured by detecting the absorbance of organic compounds.
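As a minimal numerical illustration of this relationship, the sketch below computes the absorbance from the incident and transmitted intensities and inverts the proportionality for the concentration. The intensity values and the absorption coefficient `k` are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math


def absorbance(i0, i_t):
    """Absorbance A = lg(I0 / I_T) from incident and transmitted intensity."""
    if i0 <= 0 or i_t <= 0:
        raise ValueError("intensities must be positive")
    return math.log10(i0 / i_t)


def concentration(a, k, path_length):
    """Invert A = K * C * L for the concentration C at one wavelength."""
    return a / (k * path_length)


# Hypothetical reading: 10 % of the incident light is transmitted.
a = absorbance(1000.0, 100.0)

# Hypothetical absorption coefficient; a 10 mm cell expressed as 1.0 cm.
c = concentration(a, k=0.05, path_length=1.0)
print(a, c)
```

Because `k` and `path_length` are fixed for a given cell and wavelength, doubling the absorbance doubles the inferred concentration.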

Division of Water Samples
A suitable method for selecting the calibration set can enhance the predictive ability of the established models. Sample set partitioning based on joint x-y distance (SPXY) divides the samples systematically by considering differences in both the spectra and the component concentrations of the samples, which effectively improves the prediction performance of the model [14].
In this paper, a total of 32 water samples were divided into two sets by the SPXY algorithm. Twenty-four samples were assigned to the calibration set for modeling, and the other 8 samples were placed in the validation set for testing. The statistical characteristics of the samples are shown in Table 1.
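The SPXY idea can be sketched in plain Python as follows. This is the commonly described form of the algorithm (a Kennard-Stone selection on a joint distance that sums the spectral and concentration distances, each normalised by its maximum), not necessarily the exact implementation used here; the function names and the toy data are hypothetical.

```python
import math


def _pairwise(rows):
    """Euclidean distance matrix between rows (lists of numbers)."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = math.dist(rows[i], rows[j])
    return d


def spxy_split(spectra, y, n_calibration):
    """Return (calibration indices, validation indices)."""
    n = len(spectra)
    dx = _pairwise(spectra)
    dy = _pairwise([[v] for v in y])
    max_dx = max(max(row) for row in dx)
    max_dy = max(max(row) for row in dy)
    # Joint distance: x and y distances, each normalised by its maximum.
    dxy = [[dx[i][j] / max_dx + dy[i][j] / max_dy for j in range(n)]
           for i in range(n)]
    # Start with the two samples that are farthest apart.
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    first, second = max(pairs, key=lambda p: dxy[p[0]][p[1]])
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < n_calibration:
        # Kennard-Stone step: take the sample whose nearest selected
        # neighbour is farthest away.
        best = max(remaining, key=lambda i: min(dxy[i][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return sorted(selected), remaining


# Toy data: 8 two-point "spectra" that scale with concentration.
toy_y = list(range(1, 9))
toy_spectra = [[c * 0.01, c * 0.02] for c in toy_y]
cal, val = spxy_split(toy_spectra, toy_y, 6)
print(cal, val)
```

With 32 samples and `n_calibration=24`, the same call would produce a 24/8 split of the kind described above.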

Problem Formulation
PLS combines the advantages of principal component analysis, canonical correlation analysis, and linear regression analysis, and is a widely used modeling method in spectral analysis. In order to study the statistical relationship between the independent variables $X = \{x_1, \cdots, x_p\}$ and the dependent variables $Y = \{y_1, \cdots, y_p\}$, n sample points are observed to form the data tables X and Y. PLS extracts components x and y from X and Y, respectively. When extracting x and y, the following two requirements should be satisfied: (1) x and y should carry as much of the variation information in their respective data tables as possible; (2) the correlation between x and y should reach its maximum.
These two requirements mean that x and y should represent the data tables X and Y as well as possible, and that the independent-variable component x should have the strongest explanatory power for the dependent-variable component y. The collected spectra (Figure 2) show that the original spectra carry comprehensive information about sample concentration. Therefore, the sampled spectral data can be used to predict the COD concentration of the solution. Researchers usually use absorbances at characteristic wavelengths as independent variables to predict COD concentration, but selecting the characteristic wavelengths is difficult. To solve this problem, permutation entropy (PE) is used to determine the characteristic wavelengths, from which the extreme points are obtained. The extreme points are used as the independent variables of PLS to predict COD concentration.
PE can process the original spectral data and find the extreme points of the original spectrum. Figure 2 shows that there are three extreme points $\gamma_i$ ($i = 1, 2, 3$) in the spectral data at different concentrations. However, when PLS uses only $\gamma_i$ ($i = 1, 2, 3$) to predict COD concentration, the accuracy is not satisfactory because there are too few independent variables.
PE can find more extremum points for each concentration by processing the first derivative data of the original spectrum. The first derivative spectrum of the original spectrum is shown in Figure 4. It can be seen from Figure 4 that there are five extreme points α i (i = 1, 2, 3, 4, 5) in the first derivative data of different concentrations, which represent the extreme values of the absorption spectrum change rate under different concentrations. λ i (i = 1, 2, 3, 4, 5) is defined as the wavelength of α i (i = 1, 2, 3, 4, 5). PE can accurately search λ i (i = 1, 2, 3, 4, 5) which can be used to get α i (i = 1, 2, 3, 4, 5). PLS uses α i (i = 1, 2, 3, 4, 5) to predict COD concentration, which improves the prediction accuracy. The comparison of the results of predicting COD by γ i (i = 1, 2, 3) and α i (i = 1, 2, 3, 4, 5) will be introduced in Section 3.1.3. In this paper, the extreme values of the absorption spectrum change rate were proposed as a characteristic parameter to establish the COD prediction model.

FDPE
The derivative method is an effective method for processing spectral data. For UV-Vis absorption spectroscopy, increasing the derivative order can reveal additional useful information in the spectral data, but too high an order lowers the intensity of the signal to be detected and makes the result very sensitive to noise [13]. Choosing a suitable derivative order is therefore very important. The first derivative loses almost none of the signal-to-noise ratio and reflects the absorption spectrum change rate. Differentiating Equation (1) with respect to $\lambda$ gives Equation (2):

$$\frac{dA(\lambda)}{d\lambda} = \frac{dK(\lambda)}{d\lambda}\,C\,L \qquad (2)$$
According to Equation (2), $dA(\lambda)/d\lambda$ is proportional to $C$ for a specific measurement wavelength and a specific water sample [15]. In this paper, the first derivative method was used to process the UV-Vis absorption spectrum. The first derivative spectrum at any concentration can be written as a sequence $x(i)$, $i = 1, 2, \cdots, N$. In order to obtain $\alpha_i$ ($i = 1, 2, 3, 4, 5$) in the first derivative spectrum, PE is used to search for the extreme points in the sequence. The calculation process is as follows [16,17].
Within a window, the sequence is reconstructed into vectors with embedding dimension $m$ and delay $t$:

$$X(k) = \left[x(k),\; x(k+t),\; \cdots,\; x(k+(m-1)t)\right]$$

In each vector $X(k)$, the elements are arranged in increasing order. The ranking result is as follows:

$$x(k+(j_1-1)t) \le x(k+(j_2-1)t) \le \cdots \le x(k+(j_m-1)t)$$

If $x(k+(j_p-1)t) = x(k+(j_q-1)t)$ and $j_p < j_q$, the two elements are ranked as follows:

$$x(k+(j_p-1)t) \le x(k+(j_q-1)t)$$

Correspondingly, each vector $X(k)$ generates a symbol sequence

$$S(l) = (j_1, j_2, \cdots, j_m)$$

where $l = 1, 2, 3, \cdots, n$ ($n \le m!$). The probabilities of the symbol sequences are $P_1, P_2, \cdots, P_n$. Then the PE of $x(i)$ in different windows can be calculated by Equation (8):

$$H = -\sum_{l=1}^{n} P_l \ln P_l \qquad (8)$$
$H$ represents the entropy of the first derivative spectrum in a window. When $H = 0$, the first derivative spectrum is regular, that is, it keeps rising or keeps falling throughout the window. When $H \neq 0$, the trend of the first derivative spectrum changes, which means that the first derivative spectrum has an extreme point. The windows where $H$ changes from zero to non-zero can be used to find $\alpha_i$ ($i = 1, 2, 3, 4, 5$) in the first derivative spectrum.
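The window search described above can be sketched in a few lines of Python. The layout of the windows is an assumption here (a window of length `L` slid one point at a time, controlled by the hypothetical `step` parameter), since the text does not specify it, and the toy signal stands in for a first derivative spectrum.

```python
import math
from collections import Counter


def permutation_entropy(window, m=5, t=1):
    """Permutation entropy of one window (embedding dimension m, delay t)."""
    n_vectors = len(window) - (m - 1) * t
    if n_vectors < 1:
        raise ValueError("window too short for the chosen m and t")
    patterns = Counter()
    for k in range(n_vectors):
        vec = [window[k + j * t] for j in range(m)]
        # Stable sort: equal values keep the smaller index first.
        patterns[tuple(sorted(range(m), key=lambda j: vec[j]))] += 1
    probs = [count / n_vectors for count in patterns.values()]
    return sum(-p * math.log(p) for p in probs)


def zero_to_nonzero_windows(signal, L=10, m=5, t=1, step=1):
    """Start indices of windows where the entropy turns from 0 to > 0.

    Assumption: windows of length L are slid along the signal by `step`.
    """
    starts = list(range(0, len(signal) - L + 1, step))
    entropies = [permutation_entropy(signal[s:s + L], m, t) for s in starts]
    return [starts[i] for i in range(1, len(starts))
            if entropies[i - 1] == 0 and entropies[i] > 0]


# Toy signal: rises to a single peak at index 20, then falls.
toy_signal = list(range(21)) + list(range(19, 0, -1))
hits = zero_to_nonzero_windows(toy_signal)
print(hits)
```

A monotonic window produces a single ordinal pattern and therefore zero entropy, so the first window that reaches past the peak is the one flagged; the extreme point lies inside that window.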

Partial Least Squares (PLS)
PLS is widely used in spectral analysis and can establish a mathematical model to predict the concentration of a sample [18]. In this paper, the $\alpha_i$ ($i = 1, 2, 3, 4, 5$) found by PE were used to predict the concentration of COD solutions by PLS. The prediction performance of the model is evaluated by several indices: the linear correlation coefficient ($R^2$), the root mean-squared error of calibration (RMSEC), and the root mean-squared error of validation (RMSEV). The performance indices are given in Equations (9)-(11).

$$R^2 = 1 - \frac{\sum_{i}(\hat{y}_i - y_i)^2}{\sum_{i}(y_i - \bar{y})^2} \qquad (9)$$

$$\mathrm{RMSEC} = \sqrt{\frac{\sum_{i=1}^{n}(\hat{y}_i - y_i)^2}{n - A - 1}} \qquad (10)$$

$$\mathrm{RMSEV} = \sqrt{\frac{\sum_{i=1}^{m}(\hat{y}_i - y_i)^2}{m}} \qquad (11)$$

where $\hat{y}_i$ is the value predicted by the calibration model, $\bar{y}$ is the mean of the measurements, $y_i$ is the measurement, $n$ is the number of calibration samples, $m$ is the number of validation samples, and $A$ is the number of regression factors [19,20].
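These indices are straightforward to compute. The sketch below uses one common convention, which is an assumption rather than something confirmed by the text: $R^2$ as one minus the ratio of residual to total sum of squares, RMSEC with `n - A - 1` degrees of freedom, and RMSEV divided by the number of validation samples. The sample values are hypothetical.

```python
import math


def _ss_res(y_true, y_pred):
    """Residual sum of squares."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred))


def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - _ss_res(y_true, y_pred) / ss_tot


def rmsec(y_true, y_pred, n_factors):
    """Calibration error; assumes n - A - 1 degrees of freedom."""
    return math.sqrt(_ss_res(y_true, y_pred) / (len(y_true) - n_factors - 1))


def rmsev(y_true, y_pred):
    """Validation error; assumes division by the number of samples m."""
    return math.sqrt(_ss_res(y_true, y_pred) / len(y_true))


# Hypothetical measured and predicted COD values (mg/L).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(r_squared(measured, predicted),
      rmsec(measured, predicted, n_factors=1),
      rmsev(measured, predicted))
```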

The Flow of FDPE_PLS
In this paper, a novel hybrid strategy based on ultraviolet-visible absorption spectroscopy was proposed to detect COD in surface water. Different processing procedures were used with or without turbidity interference. The processing procedure of the hybrid strategy without turbidity interference is FDPE_PLS. The flow chart is shown in Figure 5. The processing procedure of the hybrid strategy with turbidity interference is FDPE_PLS combined with the turbidity compensation method. This part will be introduced in Section 2.5.3.

Dual-Wavelength Method (DWM)
Turbidity absorbs visible light, and the absorbance curve changes with the turbidity value, which affects the determination of COD in water by spectrophotometry. It is therefore important to compensate for turbidity. The traditional method uses the absorbance at 546 nm to predict the turbidity value. This method is simple, but its accuracy needs to be improved. To improve the accuracy, a dual-wavelength method (DWM) is proposed in this paper to determine the turbidity values. Because turbidity absorbs in the visible band, the DWM uses the slope of the line connecting the absorbance values at 400 nm and 600 nm to predict the turbidity value. As shown in Figure 6, the slopes of these lines and the turbidity values have a good linear relationship, which can be used to calculate the turbidity values.
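The DWM calculation amounts to a slope followed by a linear calibration. The sketch below shows one way to set it up with an ordinary least-squares line; the calibration readings are hypothetical and are not the values behind Figure 6.

```python
def dwm_slope(a400, a600):
    """Slope of the line through the absorbances at 400 nm and 600 nm."""
    return (a600 - a400) / (600.0 - 400.0)


def fit_line(xs, ys):
    """Ordinary least-squares fit y = gain * x + offset."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gain = sxy / sxx
    return gain, mean_y - gain * mean_x


# Hypothetical calibration readings: (A at 400 nm, A at 600 nm, NTU).
calibration = [(0.010, 0.006, 10.0), (0.020, 0.012, 20.0), (0.040, 0.024, 40.0)]
slopes = [dwm_slope(a4, a6) for a4, a6, _ in calibration]
ntu = [t for _, _, t in calibration]
gain, offset = fit_line(slopes, ntu)


def predict_turbidity(a400, a600):
    """Turbidity (NTU) predicted from the two absorbance readings."""
    return gain * dwm_slope(a400, a600) + offset


print(predict_turbidity(0.030, 0.018))
```

Using the slope rather than a single absorbance makes the prediction insensitive to a constant offset that shifts both readings equally.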

Multiple Scatter Correction (MSC)
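MSC is a commonly used data processing method that can reduce the influence of scattering [21]. The calculation process is as follows:
(1) Calculate the average spectrum of the samples:

$$\bar{A} = \frac{1}{n}\sum_{i=1}^{n} A_i$$

(2) Perform unary linear regression of each spectrum on the average spectrum:

$$A_i = m_i \bar{A} + b_i$$

(3) Apply the multiple scatter correction:

$$A_{i(\mathrm{MSC})} = \frac{A_i - b_i}{m_i}$$

where $A_i$ is the spectrum of sample $i$, $n$ is the number of samples, and $m_i$ and $b_i$ are the slope and intercept of the regression for sample $i$.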
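The three MSC steps can be sketched as follows, using the standard textbook form of the correction in plain Python. The function name and the toy spectra are hypothetical.

```python
def msc(spectra):
    """Multiple scatter correction of a list of equal-length spectra."""
    n = len(spectra)
    width = len(spectra[0])
    # Step 1: average spectrum of all samples.
    mean = [sum(s[j] for s in spectra) / n for j in range(width)]
    mean_avg = sum(mean) / width
    sxx = sum((v - mean_avg) ** 2 for v in mean)
    corrected = []
    for s in spectra:
        # Step 2: regress the spectrum on the average spectrum.
        s_avg = sum(s) / width
        slope = sum((v - mean_avg) * (a - s_avg) for v, a in zip(mean, s)) / sxx
        intercept = s_avg - slope * mean_avg
        # Step 3: remove the additive and multiplicative effects.
        corrected.append([(a - intercept) / slope for a in s])
    return corrected


# Toy spectra: the same shape with different gains and offsets.
base = [0.1, 0.4, 0.9, 0.3]
toy = [[2.0 * b + 0.05 for b in base],
       [0.5 * b - 0.02 for b in base],
       list(base)]
fixed = msc(toy)
print(fixed[0])
```

In this toy case the spectra differ only by gain and offset, so all corrected spectra collapse onto the average spectrum.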
The specific steps of the hybrid strategy with turbidity interference are shown in Figure 7.

Results and Discussion
This section focuses on the experimental verification of the proposed method. First, an experiment without turbidity interference was designed, in which the FDPE_PLS model was used to predict COD and the results were compared with those of the PE_PLS model. Second, a turbidity compensation experiment was designed, and the turbidity compensation model was established and compared with MSC. Finally, the method proposed in this paper was tested on actual water samples and compared with other common methods.

FDPE_PLS
(1) First derivative spectroscopy
In the absence of turbidity interference, the first derivative of the original spectrum was computed first. The first derivative adds information about how quickly the original spectrum rises and falls, as shown in Figure 4. Combined with PE, described in part (2) of Section 3.1.1, it allows the characteristic values to be determined accurately with reduced noise interference. Figure 4 shows that there are four local maximum points $\alpha_1$, $\alpha_3$, $\alpha_4$, $\alpha_5$ and one local minimum point $\alpha_2$ in the first derivative spectrum, which reflect four local maxima and one local minimum of the change rate of the original spectrum. When the concentration of the solution changes, the amplitudes and wavelengths of $\alpha_i$ ($i = 1, 2, 3, 4, 5$) change accordingly.
(2) Feature wavelengths extraction by PE
PE can accurately locate the time and position of a change in a sequence. The PE algorithm requires three parameters, L, m, and t. In this paper, following reference [22] and several trials, we set L = 10, m = 5, and t = 1. After the entropy values of the first derivative spectra were calculated over the concentration range of 1 to 32 mg/L, the windows in which the entropy changes from zero to non-zero were identified, as shown in Table 2. Table 2 shows that the entropy changes from zero to non-zero five times, which means that the first derivative spectrum has five extreme points. This conclusion is consistent with Figure 4. By determining the windows where the entropy changes from zero to non-zero, $\lambda_i$ ($i = 1, 2, 3, 4, 5$) can be accurately found. Following the PE calculation in reference [22], $\lambda_i$ ($i = 1, 2, 3, 4, 5$) can be calculated by Equation (16).

Wavelength Variation Range
In order to verify that $dA/d\lambda$ and $C$ are linearly related when $\lambda$ changes within a small range, we established the fitting relationship between the COD concentrations and $\alpha_i$ ($i = 1, 2, 3, 4, 5$). The results are shown in Figure 9 and Table 4. Table 4 shows that there are good linear relationships between the COD concentrations and $\alpha_i$ ($i = 1, 2, 3, 4, 5$) when $\lambda_i$ ($i = 1, 2, 3, 4, 5$) changes within a small range. The selection of $\lambda_i$ and $\alpha_i$ ($i = 1, 2, 3, 4, 5$) is therefore effective.

(3) Comparison of PE and derivative method
The original spectrum measured by the spectrometer is not smooth, so the first derivative spectrum may contain spike noise, as shown in Figure 10b. The PE algorithm and the second derivative method were each used to find the extreme points of the first derivative spectra in Figure 10a,b. The results of the second derivative method are shown in Figure 11a,b. Figure 11a shows that, for the smoothed first derivative spectrum, the second derivative method correctly finds the five extreme points $\alpha_i$ ($i = 1, 2, 3, 4, 5$). Figure 11b shows that, when spike noise is present, the second derivative method finds the wrong number of effective extreme points: $\mu_1$ and $\mu_2$ are interference caused by noise.
With L = 10, m = 5, and t = 1, the PE algorithm was used to find the extreme points of the first derivative spectrum. The windows in which the entropy changes from zero to non-zero are shown in Table 5. According to Equation (16) and Table 5, the effective extreme points calculated by the PE algorithm are correct. The PE algorithm does not need a separate smoothing step, so it can effectively avoid the interference of spike noise and find the characteristic wavelengths.
(4) PLS model
All samples were processed by SPXY and divided into a calibration set and a validation set. The division result is shown in Table 1. The $\alpha_i$ ($i = 1, 2, 3, 4, 5$) in the calibration set were used as independent variables and the COD concentrations were used as dependent variables to establish the COD prediction model. The model is shown as Equation (17).
The fitting result of the model is shown in Figure 12, where black dots and red dots represent the fitting results of the calibration set and validation set, respectively. The performance of the fitting results is shown in Table 6.
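To make the modeling step concrete, the following is a minimal PLS1 (single response) sketch using the NIPALS algorithm in plain Python. It is an illustration under stated assumptions, not the implementation or the coefficients behind Equation (17): the function names are hypothetical, and the two-feature toy data stand in for the five extreme values used as independent variables.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit a PLS1 model with NIPALS; returns (x_mean, y_mean, components)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                # centred y
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X most covariant with the residual y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict the response for one sample by repeating the deflation."""
    x_mean, y_mean, components = model
    e = [xi - mi for xi, mi in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in components:
        t = sum(ei * wi for ei, wi in zip(e, w))
        y_hat += q * t
        e = [ei - t * li for ei, li in zip(e, load)]
    return y_hat


# Toy data with an exact linear response y = 2 * x1 + 3 * x2 + 1.
toy_X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0]]
toy_y = [2 * a + 3 * b + 1 for a, b in toy_X]
model = pls1_fit(toy_X, toy_y, n_components=2)
print(pls1_predict(model, [6.0, 4.0]))
```

With as many components as features and noise-free data, the model reproduces the underlying linear relationship; with real spectra, fewer components are normally kept to limit overfitting.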

PE_PLS
For comparison, the original spectrum was used to detect the COD concentration, with the PE algorithm applied directly to the original spectral data. PE was used to find $\gamma_i$ ($i = 1, 2, 3$) in the original spectrum, and PLS was used to establish the prediction model. According to the division into calibration and validation sets in Table 1, the $\gamma_i$ ($i = 1, 2, 3$) in the calibration set were used as independent variables, and the COD concentrations were used as dependent variables to establish the COD prediction model. The model is shown as Equation (18).
The fitting result of the model is shown in Figure 13, where black dots and red dots represent the fitting results of the calibration set and validation set, respectively. The performance of the fitting results is shown in Table 6.

Comparison of FDPE_PLS and PE_PLS
We performed two experiments without turbidity interference. The first used FDPE_PLS to establish the model from five extreme points in the first derivative spectrum. The second used PE_PLS to establish the model from three extreme points in the original spectra. The calibration and validation sets were used to verify the performance of the two COD prediction models. The performance of the two models was compared, and the result is shown in Table 6.
Table 6 shows that FDPE_PLS performs better than PE_PLS. FDPE_PLS has a lower RMSE and a lower average relative error, which means that the prediction model established from five extreme points in the first derivative spectrum has higher accuracy. FDPE_PLS was therefore chosen as the modeling method in this paper.

Effectiveness of DWM for Turbidity Measurement
To verify the accuracy of the method, 10 different water samples were collected from a city in Jilin Province, and their UV-Vis absorption spectra were measured. The dual-wavelength method proposed in this paper and the 546 nm single-wavelength method were each used to predict the turbidity values of the water samples. The prediction results are shown in Table 7. Table 7 shows that the dual-wavelength method predicts the turbidity of water samples more accurately than the single-wavelength method.

Establish Turbidity Compensation Model
To establish the turbidity compensation model, we carried out the following experiments. We added turbidity solutions to COD solutions to produce ten different mixtures, whose COD and turbidity values are listed in Table 8. The UV-Vis absorption spectra of these mixtures and of a 20 mg/L COD solution are shown in Figure 14, and their first derivative spectra are shown in Figure 15. In order to establish a turbidity compensation model for obtaining $\tau_i$ ($i = 1, 2, 3, 4, 5$), we first had to calculate $\beta_i$ ($i = 1, 2, 3, 4, 5$). The fitting results between $\beta_i$ ($i = 1, 2, 3, 4, 5$) and the turbidity values are shown in Figure 16. The turbidity values can be calculated by DWM. It is clear that $\beta_1$ caused by turbidity can be represented by two linear relationships, while $\beta_2$, $\beta_3$, $\beta_4$, and $\beta_5$ can each be represented by one linear relationship. A series of equations describing the relationship between the turbidity values and these variables was developed, given as Equations (19) and (20), where x is the turbidity value. The correlation coefficients of these linear equations are shown in Table 9. Table 9 shows that $\beta_i$ ($i = 1, 2, 3, 4, 5$) and the turbidity values have good linear relationships, so $\beta_i$ can be calculated by Equations (19) and (20). Using Equation (15), the turbidity compensation of the first derivative spectrum can then be completed and $\tau_i$ ($i = 1, 2, 3, 4, 5$) obtained. The $\tau_i$ are the characteristic values used to detect COD values under turbidity interference.

The Comparison of Turbidity Compensation Models
In order to verify the effectiveness of the derivative compensation (DC) method proposed in this paper, we processed the ten mixtures listed in Table 8 to remove turbidity interference by the DC method and the MSC method, respectively. FDPE_PLS was then used to predict the COD concentrations. The comparison of the MSC method and the DC method is shown in Figure 17. The results show that DC is an effective method for removing turbidity interference.

Experiments of Actual Water Sample
We have verified the accuracy of COD measurement using FDPE_PLS with and without turbidity interference in the laboratory. In this section, the accuracy of the FDPE_PLS and DC methods proposed in this paper is verified by measuring actual water samples. Ten actual water samples collected in a city in Jilin Province were used for testing. The COD values of the actual water samples were determined by the chemical method in the laboratory, with an accuracy of ±5%. The turbidity values were determined by a ZD-2A turbidimeter with an accuracy of ±8%. The COD and turbidity values of the ten water samples are shown in Table 10. After processing by DC and calculation by FDPE_PLS, the fitting result between the actual and predicted COD values is shown in Figure 18. The $R^2$ is 0.99, and the average relative error is 0.07.

Comparison of Several Methods
In order to compare the accuracy of the method proposed in this paper with that of other methods, the ten water samples were modeled by the following methods: FDPE_PLS, MSC combined with FDPE_PLS, DC combined with FDPE_PLS, the 254 nm single-wavelength model, MSC combined with the 254 nm single-wavelength model, and the 254 nm-546 nm dual-wavelength method. The fitting results between the COD values calculated by these models and the COD measurements are shown in Figure 19. Figure 19 shows that DC combined with FDPE_PLS has the best performance, with an $R^2$ of 0.99 and an average relative error of 0.07. This result supports the effectiveness of the method proposed in this paper.

Conclusions
In this paper, a new characteristic parameter, the extreme value of the absorption spectrum change rate, is proposed for the first time to determine the characteristic wavelengths. On this basis, we proposed using FDPE to determine the characteristic wavelengths. Compared with the traditional methods, this method improved the detection accuracy and stability of the characteristic wavelengths. Secondly, a dual-wavelength method was proposed for the first time, which differs from the traditional single-wavelength method. The $R^2$ between the predicted and actual turbidity values was 0.98. Compared with the single-wavelength method, DWM improved the measurement accuracy of turbidity. Finally, a new turbidity compensation method was proposed. Traditional turbidity compensation methods process the original spectral data, whereas the new method compensates for the interference caused by turbidity in the first derivative spectrum. The experimental results showed that FDPE_PLS with turbidity compensation had good performance.
Although this paper has made some progress, some limitations remain. For example, the influence of temperature and pH was not considered, and there is no standard method for selecting the parameters of permutation entropy. These issues need to be addressed in future work.