Retrieving Water Quality Parameters from Noisy-Label Data Based on Instance Selection

As an important part of the "air–ground" integrated water quality monitoring system, water quality retrieval (inversion) from unmanned airborne hyperspectral images has attracted increasing attention. Unmanned aerial vehicles (UAVs) are small, flexible and quick to respond, and they can survey the water environment over a large area, avoiding both the difficulty of obtaining satellite data and the limitation of single-point monitoring by ground stations. When researchers use UAVs for water quality monitoring, they either take water samples back to the laboratory or measure directly with portable sensors while the drone is flying. Because of the UAV speed and route planning, the actual sampling time and the UAV overpass time cannot be guaranteed to be fully synchronized, and they may differ by a few minutes. For water quality parameters such as chromaticity (chroma), chlorophyll-a (chl-a) and chemical oxygen demand (COD), changes over a few minutes are small and negligible. Turbidity, however, especially in a flowing water body, can vary within a certain range over the same interval. This introduces noise into the measured suspended matter or turbidity labels, which degrades the regression model and the retrieval accuracy. In this study, to address the label quality problem in a flowing water body, an unmanned airborne hyperspectral water quality retrieval experiment was carried out on the Xiao River in Xi’an, China, verifying the rationality and effectiveness of label denoising for different water quality parameters. To identify noisy-label instances efficiently, we propose an instance selection scheme.
Furthermore, considering the limited number of samples and the nature of the regression task, we build a 1DCNN model combined with a self-attention mechanism (SAM), and this network achieves the best retrieval performance on turbidity and chroma data. The experimental results show that, for a flowing water body, the noisy-label instance selection method improves retrieval performance only slightly for COD, but greatly for turbidity and chroma.


Introduction
At present, unmanned airborne hyperspectral retrieval methods [1][2][3][4][5][6] are popular, because UAVs respond quickly, acquire spectral data conveniently, are not restricted to a fixed detection area, and can retrieve concentrations from non-point sources. Such methods are suitable for rapid, area-wide monitoring tasks in small water bodies.
Previous studies [1,7-10] have shown that when light enters water, it is scattered and absorbed by suspended substances and other particles, which changes the intensity of the outgoing light. When the quality of the water body changes, the proportions of light scattered and absorbed by particles in the water also change, so the measured spectral reflectance curve can be used to retrieve the water quality parameters.

1. For the label-noise problem in flowing water, an enhanced RegENN instance selection scheme is proposed to identify noisy-label instances;
2. Experiments on the retrieval of turbidity, chroma and COD are conducted to verify the necessity of noisy-label instance selection for the turbidity parameter;
3. Experimental results on the retrieval of turbidity, chroma and COD show that label noise is easily introduced into turbidity and chroma, while COD is more stable; and
4. A 1DCNN network combined with a self-attention module is proposed for regression, and it achieves the best retrieval results on turbidity and chroma data.

Water Sampling and Measurement
While the UAV was performing the flight task, we took water samples at the same time. At each sampling point we recorded the latitude and longitude, photographed the landform, and measured the water quality parameters with portable sensors. Limited by the available instruments, three water quality parameters were measured: chroma, turbidity and COD. Turbidity was measured with a Hach 2100Q (resolution 0.01 NTU), COD with an INESA COD-571 analyzer (resolution 8% FS), and chromaticity with a Hach LICO620 chroma analyzer. At every sampling point, each parameter was measured five times and the average value was recorded. Chroma ranges from 32.07 to 69.67 Hazen, turbidity from 3.33 to 12.51 NTU, and COD from 2.22 to 9.49 mg/L. Detailed statistics of the data set are shown in Table 1. We used a DJI Wind 4 UAV, which weighs 7.3 kg, has a maximum load of 24.5 kg and a maximum flight speed of 14 m/s. A Corning Micro HSI410 was adopted for collecting hyperspectral images. This is a push-broom imager with an effective detection range of 400 to 1000 nm, a spectral interval of 4 nm, and a total of 150 bands. To allow mosaicking of the image strips, the side overlap was set to 40%. Because of the curvature of the river, two flights were needed to cover the full experimental site. Considering the time interval between the two flights and the change of light conditions, a standard white board and a standard gray board (reflectance 70% and 25%, size 2 × 2 m) were placed in each of the two flight areas as references. The UAV flew at an altitude of 200 m and a speed of 10 m/s. Each flight lasted less than 10 minutes and the weather was sunny, so we assume that the lighting conditions were essentially unchanged within a single flight [30][31][32].
The flight area and route planning are shown in Figure 3.

Geometric Correction
Vibration and airflow during flight tilt and offset the images, so geometric correction is required [33]. The raw images were geometrically corrected using the latitude, longitude and inertial navigation information recorded by the hyperspectral imager.

Radiation Correction and Spectral Reflectivity
According to radiative transfer theory, the radiance of the target detected by the hyperspectral imager mounted on the drone has two parts: radiance leaving the water, and diffuse sky radiance reflected by the water surface. The reflected sky radiance carries no information about the water and must be removed. According to [3,5,34], the radiance $L$ detected by the spectrometer can be expressed as

$$L = L_w + r \, L_{sky} \tag{1}$$

where $L_w$ is the water-leaving radiance; $L_{sky}$ is the diffuse sky radiance, which carries no water information; and $r$ is the reflectivity of the air-water interface to skylight, which is influenced by factors such as solar position, observation geometry and wind speed. The value of $r$ can be set in the range $r \in [0.021, 0.05]$. In a breezy or windless environment the water surface is calm and $r$ is set to 0.022; when the wind speed is about 5 m/s, $r$ can be set to 0.025; and when the wind speed is 10 m/s, $r$ is set to 0.026-0.028. To calculate the water surface reflectivity, the total incident radiation $L_g$ must be estimated. According to [3,35], a standard gray board can be placed on the ground at the waterside; its reflectivity is typically about 10% to 30%. The total incident radiance can then be estimated by Equation (2):

$$L_g = \frac{\pi R_g}{E_d} \tag{2}$$

where $R_g$ is the digital value of the radiance measured over the gray board and $E_d$ is the reflectivity of the standard gray board.
Finally, the water-leaving reflectivity $R_w$ can be calculated by Equation (3):

$$R_w = \frac{L_w}{L_g} = \frac{L - r \, L_{sky}}{L_g} \tag{3}$$
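A minimal Python sketch of the radiometric steps follows. It assumes the standard above-water relations L = L_w + r·L_sky, L_g = π·R_g/E_d for a Lambertian gray board, and R_w = L_w/L_g. The function names are hypothetical, the step mapping from wind speed to r is a crude simplification of the guidance above (0.027 is simply the midpoint of 0.026-0.028), and all input values are illustrative.

```python
import math

def surface_reflectance_factor(wind_speed):
    """Hypothetical step mapping from wind speed (m/s) to the air-water factor r."""
    if wind_speed < 5.0:      # breezy or calm surface
        return 0.022
    if wind_speed < 10.0:     # around 5 m/s
        return 0.025
    return 0.027              # around 10 m/s (midpoint of 0.026-0.028)

def water_leaving_reflectance(L_total, L_sky, R_g, E_d, wind_speed):
    """Assumed model: L_total = L_w + r*L_sky, L_g = pi*R_g/E_d, R_w = L_w/L_g."""
    r = surface_reflectance_factor(wind_speed)
    L_w = L_total - r * L_sky            # remove skylight reflected at the surface
    L_g = math.pi * R_g / E_d            # incident light estimated from the gray board
    return L_w / L_g

# Illustrative single-band values; works element-wise on NumPy arrays as well.
print(water_leaving_reflectance(10.0, 20.0, 100.0, 0.25, wind_speed=2.0))
```

In practice each argument would be a per-band array for one pixel, with R_g taken from the gray-board pixels of the same flight.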

Spectral Curve Filtering
Because of sensor instability and weather effects, hyperspectral data often pick up noise during acquisition, which not only degrades image quality but also affects subsequent data processing [36]. To remove noise and spikes, we smoothed the spectrum of each pixel with a Savitzky-Golay filter [37].
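Per-pixel Savitzky-Golay smoothing can be sketched with SciPy's savgol_filter applied along the spectral axis only. The window length (11 bands) and polynomial order (2) are assumptions, since the text does not state them, and the synthetic cube is illustrative.

```python
import numpy as np
from scipy.signal import savgol_filter

def smooth_cube(cube, window_length=11, polyorder=2):
    """Savitzky-Golay smoothing of every pixel spectrum in a hyperspectral cube.

    cube: array whose last axis is the spectral axis, e.g. (rows, cols, 150).
    window_length and polyorder are assumed values, not taken from the text.
    """
    cube = np.asarray(cube, dtype=float)
    # Fit a local polynomial in a sliding window along the band axis only,
    # so neighbouring pixels are never mixed.
    return savgol_filter(cube, window_length=window_length,
                         polyorder=polyorder, axis=-1)

# Usage on a synthetic 2 x 3 pixel cube with 150 bands (illustrative data only).
rng = np.random.default_rng(0)
wavelengths = 400 + 4 * np.arange(150)          # assumed band centres in nm
clean = 0.02 + 0.01 * np.exp(-((wavelengths - 560) / 60.0) ** 2)
noisy = clean + rng.normal(0, 0.001, size=(2, 3, 150))
smoothed = smooth_cube(noisy)
```

A longer window removes more noise but flattens narrow absorption features, so the window should stay well below the width of the spectral features of interest.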

Noisy-Label Instance Selection
Given a data set $D = \{(x_i, y_i)\}_{i=1}^{n}$, by the definition of regression we can write

$$y_i = f(x_i) + \varepsilon_i \tag{4}$$

where $f$ is the regression mapping function and $\varepsilon_i$ is the error between the predicted value and the label value. For data with noisy labels, the label $y_i$ may not be the true value. Denoting the latent true value by $\tilde{y}_i$, the error between $\tilde{y}_i$ and the observed label $y_i$ must also be considered. Assuming that the noise error follows a normal distribution, Equation (4) can be written as

$$y_i = f(x_i) + \gamma_i + \mu_i \tag{5}$$

where $\gamma_i$ is a mean-shift parameter representing the mean difference between the label value and the latent true value, and $\mu_i \sim \mathcal{N}(0, \sigma^2)$ is random error. If $\gamma_i$ is non-zero, the sample pair $(x_i, y_i)$ may be polluted by noise. Meanwhile, to guarantee the fidelity of the data, we assume that the noisy-label instances are sparse:

$$\eta = \|\gamma\|_0 \le k \tag{6}$$

where $\eta$ is the number of noisy-label instances and $k$ is the maximum number of noisy labels allowed. The value of $k$ is usually unknown and depends on the specific dataset. For regression problems, the objective can then be expressed as

$$\min_{f,\,\gamma} \; \sum_{i=1}^{n} l\big(y_i - \gamma_i,\; f(x_i)\big) + g(\eta) \tag{7}$$

where $l(a, b)$ measures the distance between $a$ and $b$, and $g$ is a sparsity loss. Equation (7) states that, while the noisy label offsets are removed, the number of detected noisy samples should be as small as possible.
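To make the sparse mean-shift model concrete, the sketch below assumes an oracle setting in which f is known and l is the squared loss. Under those assumptions, minimising the sum of (r_i − γ_i)² over the residuals r_i = y_i − f(x_i), with at most k non-zero γ_i, has a closed-form answer: keep the k largest-magnitude residuals as γ and zero the rest (hard thresholding). In practice f is unknown, which is why a neighbour-based detector is used instead; the function name and simulated data are hypothetical.

```python
import numpy as np

def estimate_mean_shift(residuals, k):
    """Best gamma for fixed f under squared loss with at most k non-zero entries.

    Minimising sum_i (r_i - gamma_i)^2 with ||gamma||_0 <= k keeps the k
    largest-magnitude residuals r_i = y_i - f(x_i) as gamma and zeroes the rest
    (hard thresholding).
    """
    residuals = np.asarray(residuals, dtype=float)
    gamma = np.zeros_like(residuals)
    if k > 0:                                   # guard: [-0:] would select everything
        idx = np.argsort(np.abs(residuals))[-k:]
        gamma[idx] = residuals[idx]
    return gamma

# Simulate labels y_i = f(x_i) + gamma_i + mu_i with two shifted labels.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 30)
f_x = 2.0 * x + 1.0                             # stand-in for the true mapping f
gamma_true = np.zeros(30)
gamma_true[[4, 17]] = [1.5, -2.0]
y = f_x + gamma_true + rng.normal(0.0, 0.05, 30)
gamma_hat = estimate_mean_shift(y - f_x, k=2)
print(np.nonzero(gamma_hat)[0])                 # recovers indices 4 and 17
```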
To determine an appropriate value of η, the number of noisy-label instances, we introduce the RegENN algorithm, a noisy-label instance selection method for regression problems proposed by Kordos et al. [17]. The core idea of RegENN is that, in regression, similar samples should have similar labels. If the labels of the instances most similar to a sample differ greatly from its own label, that sample can be considered a noisy-label instance. The RegENN algorithm is given as pseudocode in Algorithm 1.

Algorithm 1: RegENN: Edited Nearest Neighbor for regression using a threshold
Data: training set $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$; hyperparameter α, which controls how the threshold is calculated from the standard deviation; the number of neighbors k used to train the model.
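The RegENN idea can be sketched as follows, assuming that the "Model" is a leave-one-out k-NN regressor under the Manhattan distance and that each instance's threshold is α times the (population) standard deviation of its k nearest neighbours' labels. This is a non-authoritative sketch: every instance is scored against the full training set, whereas the published algorithm may update T inside the loop, and the function name is hypothetical.

```python
import numpy as np

def regenn_noisy_mask(X, y, alpha=1.0, k=6):
    """Return a boolean mask marking instances flagged as noisy labels.

    Model: leave-one-out k-NN regressor with Manhattan distance (assumed).
    Threshold: theta_i = alpha * std of the labels of the k nearest neighbours.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise Manhattan (L1) distances between all spectra.
    D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=-1)
    np.fill_diagonal(D, np.inf)          # leave the instance itself out
    noisy = np.zeros(len(y), dtype=bool)
    for i in range(len(y)):
        nn = np.argsort(D[i])[:k]        # indices of the k nearest neighbours
        y_hat = y[nn].mean()             # k-NN prediction without instance i
        theta = alpha * y[nn].std()      # neighbour-label spread sets the threshold
        noisy[i] = abs(y[i] - y_hat) > theta
    return noisy

# Illustrative data: identical labels except one corrupted label at index 7.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = np.full(20, 5.0)
y[7] = 9.0
mask = regenn_noisy_mask(X, y, alpha=1.0, k=6)
```

A larger α widens every threshold and flags fewer instances, which is the trade-off the grid search over α is meant to settle.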
In Algorithm 1, we chose the Manhattan distance as the metric for the nearest-neighbor search. For two spectra $x_i$ and $x_j$ with $B$ bands, the Manhattan distance is

$$d(x_i, x_j) = \sum_{b=1}^{B} \left| x_{i,b} - x_{j,b} \right| \tag{8}$$

Spectral analysis showed that the reflectance curves of different water bodies are similar in shape and differ mainly in amplitude, so the spectral angle mapper (SAM) measure is not suitable. Compared with SAM-based distances, the Manhattan distance is more sensitive to differences in reflectance amplitude. To search for samples similar to the target sample, we use the k-nearest neighbor method.
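The amplitude argument can be checked with a toy example: scaling a spectrum leaves its spectral angle unchanged but changes its Manhattan distance. The 5-band spectrum below is hypothetical and only illustrates the two metrics.

```python
import numpy as np

def manhattan(a, b):
    """L1 distance: sum of absolute band-by-band reflectance differences."""
    return float(np.abs(np.asarray(a, float) - np.asarray(b, float)).sum())

def spectral_angle(a, b):
    """Spectral angle (radians); depends only on curve shape, not amplitude."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))   # clip guards rounding

base = np.array([0.02, 0.04, 0.05, 0.03, 0.01])   # hypothetical 5-band spectrum
brighter = 1.5 * base                             # same shape, 50% higher amplitude
print(manhattan(base, brighter))       # about 0.075, i.e. 0.5 * base.sum()
print(spectral_angle(base, brighter))  # essentially 0: SAM sees no difference
```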
However, in the original algorithm, determining α is itself a problem. When α is too large, too few samples are identified as outliers; when α is too small, too many samples are identified as outliers, distorting the data. Combining Equation (7) with RegENN, the water quality parameter regression problem can be written as

$$\min_{\alpha} \; l\big(y, f_I(x)\big) + t \, g\big(\eta(\alpha)\big) \tag{9}$$

where $f_I(x)$ is the model trained after removing noisy-label samples, $l(y, f_I(x))$ is the fitting error of that model, $\eta(\alpha)$ is the number of noisy samples detected for a given α, $g(\eta(\alpha))$ is a loss on the number of detected noisy samples, and $t$ is a weight parameter. Our objective is to ensure a good model fit after noisy-label instance selection while constraining the number of detected samples to preserve data fidelity. The specific forms of $l(y, f_I(x))$ and $g(\eta(\alpha))$ are discussed in Section 3.1 below. To set α, we use grid search: empirical values of α given in previous studies range over α ∈ [0.5, 5], and within this range we apply grid search with Equation (9) to determine α.

Water Quality Parameter Inversion
We use PLSR, RFR, KNN, AdaBoost and 1DCNN-based algorithms to retrieve three water quality parameters (chromaticity, turbidity and COD). Since these are machine-learning algorithms, we take all bands as input and compare the performance of the different algorithms. PLSR, RFR, KNN and AdaBoost are implemented with functions from the Python library scikit-learn.
At the same time, we build a 1DCNN model to further improve fitting performance. Because the number of samples is small and the number of spectral bands is large, we introduce a self-attention module to help the model learn better features. The self-attention mechanism (SAM) was proposed in [38]. An attention mechanism lets a network learn a weight vector that indicates the importance of different features, which improves network performance. Attention mechanisms can be divided into two categories: spatial attention [39] and channel attention [40]. For hyperspectral data, we adopt channel attention so that the model learns better features.
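The text does not specify the internals of the attention module, so the following is only a sketch of one common form of channel attention, a squeeze-and-excitation style block, written as a NumPy forward pass. The reduction ratio of 4, the layer sizes and the random weights are assumptions; in a trained network W1, b1, W2 and b2 would be learned.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(feat, W1, b1, W2, b2):
    """Squeeze-and-excitation style channel attention for a 1-D feature map.

    feat: (channels, length) output of a 1-D convolution over the spectrum.
    W1: (channels // ratio, channels) and W2: (channels, channels // ratio)
    are learned weights; here they are simply passed in.
    """
    z = feat.mean(axis=1)                 # squeeze: average each channel over the spectrum
    h = np.maximum(0.0, W1 @ z + b1)      # bottleneck layer with ReLU
    w = sigmoid(W2 @ h + b2)              # one weight in (0, 1) per channel
    return feat * w[:, None], w           # rescale every channel by its weight

# Illustrative shapes: 8 channels over 150 bands with a reduction ratio of 4 (assumed).
rng = np.random.default_rng(0)
C, L, ratio = 8, 150, 4
feat = rng.normal(size=(C, L))
W1, b1 = rng.normal(size=(C // ratio, C)), np.zeros(C // ratio)
W2, b2 = rng.normal(size=(C, C // ratio)), np.zeros(C)
out, weights = channel_attention(feat, W1, b1, W2, b2)
```

The returned weights make the block's effect inspectable: channels with weights near 1 pass through almost unchanged, while those near 0 are suppressed.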
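For the loss function, we adopt the Smooth L1 loss:

$$\mathrm{SmoothL1}(y_i, f_i) = \begin{cases} 0.5\,(y_i - f_i)^2, & |y_i - f_i| < 1 \\ |y_i - f_i| - 0.5, & \text{otherwise} \end{cases}$$

For regression, the Smooth L1 loss avoids the very large gradients that the L2 loss produces for outliers with large errors, and it also removes the non-smoothness of the L1 loss around zero by behaving quadratically in the interval [−1, 1]. The structure of the constructed 1DCNN network is shown in Table 2. The retrieval accuracy of water quality parameters is evaluated by the correlation coefficient r, the coefficient of determination (R2), the mean absolute error (MAE) and the accuracy (Acc), defined as follows:

$$r = \frac{\sum_{i=1}^{N} (y_i - \bar{y})(f_i - \bar{f})}{\sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}\,\sqrt{\sum_{i=1}^{N} (f_i - \bar{f})^2}}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - f_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$$

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |y_i - f_i|$$

$$\mathrm{Acc} = \frac{N_a}{N}$$

where $y_i$ is the label value, $\bar{y}$ is the mean of the label values, $f_i$ is the predicted value, $\bar{f}$ is the mean of the predicted values, $N$ is the number of samples, and $N_a$ is the number of samples whose percentage error (PE) $|f_i - y_i| / y_i$ is less than 20% [41].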
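The loss and the four evaluation indicators can be computed as in the sketch below, which uses the usual Smooth L1 threshold of 1 and defines PE as the absolute error divided by the label value. The function names are hypothetical, and the loss is averaged over samples here, which is one common choice rather than a detail stated in the text.

```python
import numpy as np

def smooth_l1(y, f):
    """Mean Smooth L1 loss with the usual threshold of 1."""
    d = np.abs(np.asarray(y, float) - np.asarray(f, float))
    # Quadratic for |d| < 1, linear (shifted by 0.5 so both pieces meet) otherwise.
    return float(np.mean(np.where(d < 1.0, 0.5 * d ** 2, d - 0.5)))

def evaluate(y, f, pe_limit=0.20):
    """r, R2, MAE and Acc (share of samples with percentage error < pe_limit)."""
    y, f = np.asarray(y, float), np.asarray(f, float)
    r = np.corrcoef(y, f)[0, 1]
    r2 = 1.0 - np.sum((y - f) ** 2) / np.sum((y - y.mean()) ** 2)
    mae = np.mean(np.abs(y - f))
    pe = np.abs(f - y) / np.abs(y)     # assumes labels are non-zero
    acc = np.mean(pe < pe_limit)       # N_a / N
    return {"r": float(r), "R2": float(r2), "MAE": float(mae), "Acc": float(acc)}

print(evaluate([1, 2, 3, 4], [1.5, 1.9, 3.3, 4.0]))
```

In the usage line, the first prediction has a 50% error, so Acc is 3/4 even though R2 stays high; the two indicators capture different failure modes.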

Experiments and Results
In this section, we describe and discuss the experimental results in detail. Section 3.1 describes the algorithm parameter settings used in the experiments. Section 3.2 analyzes the characteristics of the reflectance spectral curves and compares the band correlations before and after instance selection. Section 3.3 compares the retrieval results of different algorithms on the original and denoised data sets. Section 3.4 examines the selection of the threshold parameter α through experiments on the three water quality parameters with the random forest method.

Experiment Settings
When the k-nearest-neighbor search looks for similar spectra during noisy-label detection, k can be set from 4 to 8, following the empirical values proposed for RegENN. In the experiments, we set k = 6, i.e., the six nearest samples under the Manhattan distance are selected. Grid search [42] was used to determine the hyperparameter α, searching from 1 to 30 with a step of 0.1. The value of α varies with the regression model; the specific ranges are shown in Table 3. For l, we combine the MAE and R2 losses as $l = 1 - R^2 + \mathrm{MAE}$. We also define $g(x) = x^2$, so that when too many noisy instances are selected the loss grows rapidly, which limits the range of $\eta(\alpha)$. The loss weight t is set in the range t ∈ [0.05, 0.1].
The parameters of the water quality retrieval algorithms are set as follows. PLSR uses the five most correlated bands. n_estimators and max_depth are two common parameters of tree-ensemble methods such as RFR and AdaBoost. For random forest regression, n_estimators is set to 100 and max_depth to 10. For AdaBoost, n_estimators is set to 50. The number of neighbors in KNN is set to six. For our 1DCNN network, the number of training epochs is set to 800 and the learning rate to 0.001.
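The α search can be sketched as a plain loop over the stated grid (1 to 30, step 0.1), scoring each α by (1 − R² + MAE) + t·η². The callables detect_noisy and fit_and_score are hypothetical placeholders: in use, detect_noisy could wrap a RegENN-style detector with k = 6, and fit_and_score could train a regressor on the kept instances and score it on held-out data, which is an assumption since the text does not say which split the loss is evaluated on.

```python
import numpy as np

def select_alpha(detect_noisy, fit_and_score, alphas, t=0.05):
    """Grid search for alpha minimising (1 - R2 + MAE) + t * eta(alpha)**2.

    detect_noisy(alpha) -> boolean mask of instances flagged as noisy (hypothetical).
    fit_and_score(keep_mask) -> (R2, MAE) of a model trained on the kept
    instances and scored on held-out data (hypothetical; the split is assumed).
    Returns (best_alpha, best_loss, eta_at_best_alpha).
    """
    best = None
    for alpha in alphas:
        mask = np.asarray(detect_noisy(alpha), dtype=bool)
        eta = int(mask.sum())                    # number of removed instances
        r2, mae = fit_and_score(~mask)
        loss = (1.0 - r2 + mae) + t * eta ** 2   # l = 1 - R2 + MAE, g(x) = x^2
        if best is None or loss < best[1]:       # ties keep the first alpha
            best = (float(alpha), loss, eta)
    return best

# Search grid from the text: alpha from 1 to 30 in steps of 0.1.
alphas = np.round(np.arange(1.0, 30.0 + 1e-9, 0.1), 1)
```

The quadratic penalty makes each additional removed sample cost more than the last, so a small gain in R² cannot justify discarding many instances.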

Spectral Characteristic Analysis
Water quality parameters can be retrieved with band-math methods or with machine-learning methods. For band-math methods, it is important to find the most correlated bands from which to build the formula. The correlation coefficient is a statistical indicator of the relationship between variables, and a higher correlation makes it easier to find suitable spectral bands. Figure 4 shows the band correlation curves for the three water quality parameters. As shown in the figure, turbidity is strongly correlated with the blue band (400-500 nm) and the near-infrared band (720-850 nm), which is consistent with prior knowledge [1,8,10]. Chromaticity is strongly correlated with the visible bands in the range 400-800 nm. For COD, as expected, no band shows a significant correlation over the whole spectrum: COD has no obvious optical signature in the reflectance spectrum, and the underlying relationship needs further exploration [10,43]. After instance selection, the band correlation of turbidity is significantly improved, while those of chromaticity and COD are not. These spectral correlation results indicate a more pronounced label-noise effect on the turbidity data, which supports, to some extent, our hypothesis that turbidity is more prone to label noise because of its characteristics.
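A band correlation curve like the ones in Figure 4 can be computed with a per-band Pearson correlation, sketched below; the same ranking can supply the five most correlated bands used as PLSR inputs. X is assumed to be a (samples, bands) reflectance matrix and y the measured parameter; the function names are hypothetical.

```python
import numpy as np

def band_correlation(X, y):
    """Pearson r between every band (column of X) and the measured parameter y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)           # centre each band
    yc = y - y.mean()                 # centre the labels
    return (Xc * yc[:, None]).sum(axis=0) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())

def most_correlated_bands(X, y, n=5):
    """Indices of the n bands with the largest |r|, e.g. as PLSR inputs."""
    return np.argsort(-np.abs(band_correlation(X, y)))[:n]
```

With 150 bands at a 4 nm interval starting at 400 nm, band index b corresponds to roughly 400 + 4b nm (an assumption about band centres), so the curve can be plotted against wavelength. Recomputing it after removing the flagged instances gives the before/after comparison described above.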

Water Quality Parameter Retrieving Results
The previous section analyzed the band correlation of the reflectance spectrum, which only shows the improvement in the linear relationship of the data. In this section, we retrieve the three water quality parameters using several popular machine-learning regression methods. In the experiments, the samples were split into 28 for training and 8 for testing. The results are shown in Tables 4-6. Table 4 shows the turbidity retrieval results. Our 1DCNN network achieves the best results: trained on the original and the denoised data sets, it achieves test R2 of 0.873 and 0.904 and test MAE of 0.114 and 0.084, respectively. After the training set is denoised, all methods except PLSR show clear improvements in test R2 and MAE. KNN shows the largest improvement, with test R2 rising from 0.335 to 0.634. However, because KNN fits poorly, its training R2 only increases from 0.321 to 0.461, which is still not a good fit. Apart from KNN, RFR achieves the largest improvement after denoising: its test R2 improves from 0.73 to 0.844 and its test MAE from 0.208 to 0.144. The parameter n is the number of noisy-label instances detected when the model achieves its best performance. For turbidity, n ranges from five to eight, depending mainly on the regression model used. Table 5 shows the results of different methods for chroma retrieval. The 1DCNN method achieves the best test performance: on the original dataset, its test R2 and MAE are 0.834 and 0.096.
After the training set is denoised, the test R2 and MAE of 1DCNN are 0.877 and 0.093, respectively. For most methods, denoising the training set gives some improvement in training R2 and MAE, but test R2 and MAE do not improve significantly and may even decline slightly. For 1DCNN, in contrast, test R2 improves from 0.834 to 0.877. KNN improves on the training set, but its test performance decreases. The number of detected noisy instances ranges from one to three, significantly smaller than n for the turbidity data. We assume that the chroma data contain less label noise, so removing the detected outliers noticeably improves training performance. However, because these outliers are hard-to-learn samples rather than noisy ones, removing them leaves the model with an incomplete mapping, so test R2 and MAE improve little or even decline. Table 6 shows the results of different methods for COD retrieval. Although 1DCNN achieves a very high training R2 for COD, its test R2 is worse than that of the other machine-learning methods, whether or not the training set is denoised. We suppose this is because the neural network cannot learn a correct feature representation: the relationship between COD and the reflectance spectrum is less obvious than for turbidity and chromaticity, and the limited amount of data prevents the network from learning an effective representation. AdaBoost achieves the best performance in the COD retrieval experiment. On the original dataset, its test R2 and MAE are 0.574 and 0.089, respectively; after the training set is denoised, they reach 0.662 and 0.078.
Overall, the models do not perform as well on the COD test set as on chromaticity and turbidity, since COD has no particularly distinct feature in the reflectance spectrum. Note that because KNN fits poorly and cannot reach acceptable performance even on the training set, no further KNN experiments were carried out. In the COD retrieval experiment, AdaBoost and 1DCNN show clearly improved test R2 after denoising. However, after instance selection, the training fit of the other methods decreases: for RFR and PLSR, training R2 drops from 0.847 to 0.817 and from 0.634 to 0.603, respectively. Since these models do not fit the training data well, their test results are correspondingly poor. In addition, the number of detected noisy instances ranges from one to two, fewer than for the turbidity data.
According to Tables 4-6, when the number of selected noisy instances is between five and eight, the turbidity retrieval models improve markedly, which shows that label noise is easily introduced into turbidity data for a flowing water body. When two to three instances are selected, the chroma retrieval models improve somewhat, but only slightly, indicating that less label noise is introduced into the chromaticity data. After instance selection on COD, however, performance does not improve significantly and even drops. These results suggest that turbidity changes faster than COD in a flowing water body, which agrees with our practical experience during measurement. For turbidity and chromaticity, our method yields the expected performance improvement.

Parameter Experiment
The value of α directly determines the number of labels judged to be noisy, and thus affects the model results. To study its influence on model performance, we use the RFR method for parameter experiments on all three water quality parameters, again with 28 training samples and 8 test samples. The results are shown in Figure 5. The horizontal axis is the value of α, the primary vertical axis shows evaluation indices such as test R2 and MAE, and the secondary vertical axis shows the number of detected noisy instances n. As shown in Figure 5a,b, for turbidity, n gradually increases as α decreases. As n goes from two to five, test performance improves; as n increases further, performance starts to drop. We believe that when n is less than five, the model gradually learns the correct mapping as abnormal outliers are eliminated, whereas when n is greater than five, too many samples are removed to guarantee data fidelity, and performance degrades. Figure 5c,d shows the test R2 and MAE, respectively, for chroma. When n is less than three, performance improves somewhat; when n is greater than three, it gradually degrades. We conclude that the noisy-label contamination of chroma is less severe than that of turbidity, and that removing more outliers prevents the model from learning a complete mapping, reducing performance.
Figure 5e,f shows the test R2 and MAE, respectively, for COD. When n is less than four, performance decreases slowly, and when n is greater than four, test performance degrades severely. We suppose this is because the outliers in the COD data carry important feature information rather than noise, so performance drops as these informative outliers are gradually removed.

Discussion
In this section, we discuss three issues encountered in the experiment: invalid pixels, the suitable time for UAV imaging, and the limitations of UAVs with respect to meteorological conditions and practical operation.
In Section 2.1.1, we stated that 40 sampling points were arranged; however, four sampling points were removed because of tree shadows. The invalid pixels are shown in Figure 6. Notably, the shadow did not cover these sites during water sampling; by the time the UAV flew over, the shadow had moved and covered them. Sampling sites should therefore not be too close to trees or tall buildings. Furthermore, in a previous trial, sunlight reflection also affected data quality, so a suitable imaging time must be chosen to avoid it. Flying the UAV at noon is not recommended, especially on sunny days, when sunlight may be nearly perpendicular to the water surface. Images obtained at noon are shown in Figure 7: the intense light over-exposes the image, pixel values saturate at the maximum, and useful information is lost. Severe meteorological conditions such as strong winds and precipitation can even prevent the UAV from taking off. Battery capacity is another issue. The UAV in the experiment, a DJI Wind 4, has a theoretical flight time of about 25 minutes and a maximum speed of 14 m/s. In practice, however, ascending to the specified altitude and landing consume about 30 percent of the battery, so the actual imaging time is limited. This limitation hinders the use of UAV-based methods over wider detection areas.
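The battery constraint can be made concrete with a back-of-the-envelope estimate using the figures given in the text (about 25 minutes of theoretical flight time, about 30% used for ascent and landing, and the 10 m/s survey speed). The function is hypothetical and deliberately rough: it ignores turns, wind and battery reserve, so real values will be lower.

```python
def usable_imaging(flight_time_min=25.0, overhead_fraction=0.30, speed_m_s=10.0):
    """Rough per-battery imaging budget for the flights described in the text.

    Returns (imaging minutes, along-track kilometres). Ignores turns, wind and
    battery reserve, so real values will be lower.
    """
    imaging_min = flight_time_min * (1.0 - overhead_fraction)   # 30% spent climbing/landing
    track_km = imaging_min * 60.0 * speed_m_s / 1000.0          # distance at cruise speed
    return imaging_min, track_km

imaging_min, track_km = usable_imaging()
print(f"{imaging_min:.1f} min of imaging, about {track_km:.1f} km of flight line")
```

With the defaults this gives 17.5 minutes and about 10.5 km of flight line per battery, which bounds how much river can be covered in one sortie.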

Conclusions
This study addresses a problem that arises when existing UAV methods are used to retrieve water quality in a flowing water body: parameters such as turbidity and suspended solids change quickly, and because it is difficult to synchronize water sampling with the UAV overpass, label noise is easily introduced and degrades model performance. To address this problem, this paper proposes an enhanced RegENN instance selection scheme that detects noisy-label instances more accurately and performs better on turbidity and chromaticity data, because these two parameters change more quickly in a flowing water body. To further improve feature extraction and fitting, we combine a 1DCNN with a self-attention mechanism to build a deep neural network, which achieves the best performance in the chromaticity and turbidity retrieval experiments, with test R2 of 0.904 on turbidity and 0.877 on chromaticity. We also analyze in detail the value of the parameter α, which determines the number of samples judged to be label noise. The experimental results show that when UAV water quality retrieval is performed on a flowing water body, label-noise processing is needed for parameters such as turbidity and chromaticity, and our enhanced RegENN method proves effective on these two parameters.
In future work, we will conduct experiments on different water bodies to further verify the applicability of this method, and we will collect more water quality parameters to further analyze the measurement stability of other parameters.