Retrieval of Water Quality from UAV-Borne Hyperspectral Imagery: A Comparative Study of Machine Learning Algorithms

: The rapidly increasing world population and human activities accelerate the crisis of the limited freshwater resources. Water quality must be monitored for the sustainability of freshwater resources. Unmanned aerial vehicle (UAV)-borne hyperspectral data can capture ﬁne features of water bodies, which have been widely used for monitoring water quality. In this study, nine machine learning algorithms are systematically evaluated for the inversion of water quality parameters including chlorophyll-a (Chl-a) and suspended solids (SS) with UAV-borne hyperspectral data. In comparing the experimental results of the machine learning model on the water quality parameters, we can observe that the prediction performance of the Catboost regression (CBR) model is the best. However, the prediction performances of the Multi-layer Perceptron regression (MLPR) and Elastic net (EN) models are very unsatisfactory, indicating that the MLPR and EN models are not suitable for the inversion of water quality parameters. In addition, the water quality distribution map is generated, which can be used to identify polluted areas of water bodies.


Introduction
Inland water is the most significant freshwater resource for humans. It has various functions, including water storage, irrigation, and power generation. Inland water bodies are usually close to human settlements. They are easily subject to the combined pressures caused by intensive human activities and environmental changes. Pollution, such as agricultural activities, aquaculture, and industrial discharge, will lead to the accumulation of nutrients in the water and will cause eutrophication [1]. The eutrophication will further lead to the occurrence of algal blooms [2]. These destroy the aquatic ecological structure and consume a large amount of dissolved oxygen, leading to hypoxia and cause the death of aquatic animals and plants. Therefore, for the sustainable development of water resources, water quality must be monitored by statutes on a chemical, physical, and biological basis, according to the Environmental Quality Standards for Surface Water (GB3838-2000) in China [3]. Water quality parameters are the most commonly used evaluation measurements to characterize the water quality of inland water bodies.
Chlorophyll-a (Chl-a) concentration and Suspended solids (SS) concentration are the most common water quality parameters. Chl-a is a typical optical active parameter widely existing in algae and cyanobacteria, as well as in other aquatic plants. The concentration of study are (1) to compare and analyze the prediction performance of different machine learning models, including CBR, Adaboost regression (ABR), extreme boost regression (XGBR), random forest (RF), extremely randomized trees (ERT), support vector regression (SVR), MLPR, and EN, using the evaluation index including R 2 , RMSE, and MAE; (2) to use typical water quality parameters, including Chl-a and SS, to evaluate the machine learning model; and (3) to verify the potential of machine learning models combined with UAV-borne hyperspectral data in water quality mapping.

Study Area
The study area was Beigong Reservoir (114 • 21 14.15" E, 30 • 35 5.52" N) with a catchment area of 6.8 km 2 and a storage capacity of 12.2 × 10 6 m 3 , which is the primary supply source for the tributary of the Liu River, namely the Pearl River basin of the Xijiang river system. Beigong Reservoir is located in the Beigong village southwest of Liuzhou, Guangxi Zhuang Autonomous Region, China. Additionally, Liuzhou is an essential industrial city in Guangxi and is key to the "three wastes" emissions of exhaust gas, wastewater, and waste residue. Moreover, the Liuzhou Municipal Government attaches great importance to the local water pollution problem. Beigong Reservoir is a medium-sized reservoir, utilized for irrigation, flood control, and power generation. It is a famous tourist attraction with a beautiful scenery surrounded by mountains that integrates entertainment, tourism, and life functions. Hence, the Beigong Reservoir is of great significance for the residents and policymakers.

Data Collection
Liuzhou's summers are long, hot, sultry, humid, and cloudy, and the winters are short, cold, and mostly sunny. During the year, the temperature usually varies between 6 • C and 33 • C, and is rarely below 2 • C or above 36 • C. Therefore, the best time to conduct a water quality sampling survey is from the last ten-day period of September to October. Therefore, field investigations were uniformly conducted in Beigong Reservoir from 9 to 10 September 2018. On the basis of the field data collection regulations, 33 sample points at Beigong Reservoir were collected for Chl-a and SS inversion. The field sampling data were analyzed in the laboratory. The statistical information of the water sampling data is shown in Table 1. Detailed information about the Beigong Reservoir is presented in Figure  1, in which the water quality sampling points, ground water surface spectrum point, the UAV-flight routes, and the obtained UAV-borne flight strips are shown.  The ground water surface spectral reflectance data will be discussed next. The ground water surface spectrum was collected by the ASD FieldSpec 3 field-portable spectrometer with a wavelength range of 350-2500 nm. The spectrometer was provided by the China University of Geosciences (Wuhan, China). The measurement of the ground water surface spectrum was based on the "above-water surface method" [20]. The reference board with a reflectivity of nearly 1 was used for radiometric calibration. In windless weather, the water surface was flat. The water surface spectrum, sky spectrum, and synchronously the spectral data of the reference board were collected. The reference board was utilized to perform the calibration of the water surface spectrum and to obtain the water-leaving reflectance data. The ground water surface spectral reflectance collection was repeated three times in situ and the average value of the three collected data was used as the final reflectance data. The total radiance received by the spectrometer can be expressed as: where L_sw is the total radiance received by the spectrometer. L w is the water-leaving radiance. L sky is the diffuse radiance of the sky. r is the air-water interface reflectance rate. When the water surface is flat, r can be set as 0.022; when the wind speed is about 5 m/s, r can be set as 0.025; and when the wind speed is about 10 m/s, r can be set as 0.026-0.028. L w can be expressed as follows [21]: where E d (0 + ) is the incident total irradiance. L p is the 100% converted value of the reference board. R rs is the water-leaving reflectance. The UAV-borne hyperspectral data will be discussed next. We adopted the six-rotor DJ M600 Pro UAV as the airborne platform and the sensor installed on it was the Headwall NANO-Hyperspec manufactured by Headwall Photonics Lnc. The spectral resolution was 6.0 nm [22,23]. The resampling interval was set to 2.2 nm, which is the sensor parameter. At the UAV flight process, the field of view was 16 • and the flying height of the UAV was 400 m with a real-time wind speed of 5.2 m/s. According to the area of the reservoir, 10 routes of flight have been designed, where the along-track overlap was 80% and the side overlap was 60% [24]. With 270 spectral bands in the range of 400-1000 nm, the spatial resolution of the hyperspectral imagery was 0.173 m/pixel. Due to the low flight altitude, atmospheric influence can be ignored [25]. The UAV-borne hyperspectral image preprocessing was conducted, including water body extraction, sensor calibration, geometric correction, and in situ radiation correction. Firstly, the normalized difference water index (NDWI) was used to extract the water information in the UAV-borne image [26][27][28]. Secondly, we performed geometric correction for the image. The NANO-Hyperspec hyperspectral imaging spectrometer has a global positioning system and an inertial measurement unit (GPS/IMU) navigation system that can contribute to geometrically correcting the image by recording the position and attitude information of the spectrometer. Thirdly, regarding the calibration of the sensors, the signal output by the sensor unit was converted into the actual radiation intensity value. Finally, by constructing the linear relationship between the pixel spectrum of the UAV hyperspectral image and the ground water spectrum, the in situ radiation calibration was performed [29]. The water quality sampling data, the ground water surface spectral reflectance data, and UAV-borne hyperspectral data were all collected at the same time. ABR is a typical boosting algorithm introduced by Freund [30]. ABR trains the weak learners and then integrates the trained weak learners to obtain a final model [31]. ABR assigns different weights to each sample according to the prediction error rate of the learner, then adjusts the weight of the sample, and finally accumulates and weights the prediction results of all learners to generate a predicted value.

Gradient Boost Regression tree (GBRT)
GBRT is a machine learning algorithm based on ensemble decision trees [32], which is the regression form of gradient boost decision trees (GBDT). The GBRT model first builds a regression tree with equal weights based on the original data. It evaluates the prediction results by minimizing the square error. The smaller the mean square error, the lower the weight of the decision tree. The GBRT model uses the negative gradient of the loss function in the current model to approximate the residual between the current model's predicted value and the observed value, so that the model optimizes the weight of the regression tree along the direction of the negative gradient of the loss function. In each round of the training process, the model reduces the loss function and accelerates the convergence to reach the local optimal solution or the global optimal solution. Through continuous iteration, the predicted values of all the regression trees are combined to obtain the final prediction result.

Extreme Gradient Boosting Regression (XGBR)
XGBR is an improved decision tree algorithm based on the GBDT algorithm [33]. The core of the algorithm is to continuously add and train new decision trees to fit the residuals of the previous iteration [34] and the prediction values of all the decision trees are accumulated to obtain the final prediction result. XGBR improves prediction performance by reducing model bias. Compared with the traditional GBDT algorithm, XGBR modifies the objective function of the GBDT algorithm. The formula of the loss function is defined as follows: where g i and h i are the first and second partial derivatives of the loss function.ŷ i is the predicted value of the model, while y i is the observed value. f t (x i ) represents the score of the i-th sample in the t-th decision tree, Ω( f t ) is the regular term of the model, l represents the number of trees, γ represents the complexity of the leaves, T represents the number of the leaves,λ represents the scaler factor, and w n represents the weight of the n-th leaf node in the tree. After removing the constant term, the final objective function can be expressed as: where G 2 n H n +λ represents the contribution of each leaf node to the current model loss function. XGBR uses a greedy algorithm to traverse all split leaf nodes in the model. When the gain of the target after the split is less than the set threshold, we can ignore the split.
H R +λ represents the right subtree score, and H L +H R +λ represents the score when it is not divided.

Catboost Regression (CBR)
CBR is a new decision tree based on a gradient boosting frame [35] and uses oblivious decision trees as a based learner. Oblivious trees use the same criteria for splitting at each level of the tree. Each leaf index can be encoded as a binary vector whose length is equal to the depth of the tree, which helps to avoid overfitting and speed up the prediction of the model. CBR differs from the traditional GBDT algorithm in three aspects: (1) CBR can efficiently process categorical features. The categorical feature refers to a category or label. Unlike other numerical characteristics, the numerical variables of categorical characteristics cannot be compared with other numerical variables. Therefore, they are also called nonordered features. Discrete numbers are also categorical features. In the traditional GBDT algorithm, when the structure and distribution of the training data set and the test data set are different, the conditional offset problem will appear. Furthermore, CBR uses the improved greedy target statistics method to add prior distribution items to reduce the influence of noise and low-frequency data on the data distribution. For regression, prior items can take the mean of the data set label. (2) When CBR constructs a new split node for the current tree, it uses a greedy method to consider all combinations which combine different types of features into new features and dynamically transform the new composite categorical features into numeric features. (3) CBR replaces the gradient estimation method in the traditional algorithm by ordered boosting, which helps to overcome the prediction shift caused by gradient bias.

Random Forest (RF)
RF uses the bootstrap method to randomly select n samples from the original data to construct a decision tree. Each sample has M attributes. In the node split of the decision tree, m attributes are randomly selected from the M attributes using the information gain method, where in the attribute with the largest gain is selected as the best split attribute of the node. Then, the prediction results of multiple decision trees are averaged to obtain the final prediction result [36].

Extremely Randomized Trees (ERT)
The structure of the ERT [37] is similar to the RF. The difference between the ERT and RF is that the ERT uses all the samples to construct a decision tree in the training process. For node splitting, the RF algorithm selects the best attribute split, while the ERT randomly selects the attribute split [38], which results in the size of the generated decision tree being larger than that generated by the RF model. Therefore, the variance of the ERT model is reduced compared to the RF model.

Support Vector Machine (SVM)
SVR is a kernel-based algorithm that improves the model's generalization ability by seeking the minimum structured risk and realizing the experience risk minimization. SVR can obtain good prediction results with a small sample size [39].

Multi-Layer Perceptron Regression (MLPR)
MLPR is the most commonly used artificial neural network model, which is composed of three types of layers: an input layer, an output layer, and one or more hidden layers with activation functions [40]. It uses a subset of the training set to adjust the weight and biases on each node of layers. MLPR takes input data, multiplies them with weights, and then inputs them into the activation function to produce final results. MLPR can obtain non-linear relationships and real-time learning. However, MLPR requires many hyperparameters to be adjusted, which is time-consuming.

Elastic Net (EN)
EN is a mixture of the Lasso regression (LR) and the Ridge regression (RR) [41], and the optimization objective function of elastic net regression is defined as follows: where · 2 represents the L2 norm and · 1 represents the L1 norm. The EN regression penalty function uses the convex combination of the L1 norm and L2 norm, which is equally the convex combination of the RR penalty function and LR penalty function. Therefore, the EN has the advantages of both RR and LR, which not only achieves the purpose of variable selection but also improves the stability of the model. It automatically selects variables and retains important features, as well as eliminates irrelevant features.

Model Evaluation
In this study, we evaluate the models' performance via three indicators, including the coefficient of determination (R 2 ), root mean square errors (RMSE) and mean absolute error (MAE). These indicator metrics can be calculated as follows: where y i is the observed value, y i is the average of the observed value, and N is the number of valid samples used for the evaluation.ŷ i is the predicted value. The value of R 2 ranges from 0 to 1. An R 2 score of 1 indicates perfect precision, while a score of 0 indicates that the model has the worst prediction performance. The value range of RMSE is (0,+∞). If the dispersion of the predicted value of the model is high, the RMSE will be enlarged. MAE is the mean of the absolute value of the error between the predicted value and the observed value. The value range of MAE is (0,+∞). Model with high R 2 , low RMSE, and low MAE is deemed as a suitable model for quantitative inversion. The data set was divided into training and validation sets using random split sampling. In total, 80% of the inputting data was used for training the model and 20% of the inputting data was used to assess the prediction accuracy of the model. In this study, all the above model operations were based on the anaconda platform and the modeling of water quality parameters with the ABR, GBRT, RF, ERT, SVM, MLPR, and EN algorithms was implemented with the scikit-learn 0.23.2 machine learning library on the anaconda platform. The XGBR and CBR algorithms were implemented by the xgboost and catboost libraries, respectively.

Spectral Analysis
The spectral signature of Chl-a is characterized by strong absorption in the blue (443 nm) and red wavelengths (near 675 nm), and by high reflectance in the green (550-555 nm) and red-edge spectrum regions (685-710 nm) [42][43][44][45]. Existing studies have shown that the most suitable spectral range for monitoring suspended solids in water is 700-800nm [46]. Therefore, the 181 bands with 400 to 800 nm were used for the determination of water constituents in this study. The band ratio method can eliminate background noise and rough water surface interference, and can enhance the fine spectrum characteristics that are beneficial for water quality parameter estimation [47]. This study used the band ratio method to preprocess the original spectrum. The exhaustive method was used to calculate the ratio of the bands. To identify and select the major wavelengths for the estimation of water quality parameters' concentration, we conducted the Pearson correlation analysis.
The spectral curves of the 33 samples in the preprocessed UAV-borne hyperspectral remote sensing imagery are shown in Figure 2. The correlation coefficient between the raw spectral data and the Chl-a concentration value was negative, ranging from 0 to −0.7952. The 84 spectral absolute correlation coefficients were greater than 0.7, which were mainly concentrated in the 400-590 nm wavelength range. The correlation coefficient between the raw spectral data and the SS concentration values ranged from 0.1755 to 0.7685. The 46 spectral correlation coefficients were greater than 0.7, which were mainly concentrated in the 700~800 nm wavelength range. This study calculated the ratio of 181 bands of the original spectra by using the exhaustive method and obtained 32580 characteristic variables. The correlation coeffi-cients between the characteristic variables and the Chl-a concentration values ranged from −0.8196 to 0.8243. There were 12 ratio variable correlation coefficients greater than 0.81. These variables' spectra were mainly concentrated in the 400~480 nm wavelength range. The band ratio preprocessing method improves the absorption valley characteristics of the original Chl-a spectral. The correlation of processed spectra is significantly improved compared to that of the original spectra. The correlation coefficients between the characteristic variables and the SS concentration values ranged from −0.7653 to 0.7823. There were 43 ratio variable correlation coefficients greater than 0.76. These variables' spectra were mainly concentrated in the 592~606 nm wavelength range. When compared to the raw spectra, the correlation of the processed spectra is considerably enhanced.

Hyperparameters for the Machine Learning Algorithms
The model performance is influenced by its hyperparameters when estimating the concentration of water quality parameters. Tuning the hyperparameters is a critical step before the quantitative inversion. The key adjusting hyperparameters and their optimal parameter values for each model are shown in Tables 2 and 3.  The number and types of hyperparameters in each model are different. Selecting key parameters of the model can reduce the training time of the model and improve the prediction efficiency of the model. Since CBR, ABR, GBRT, XGBR, RF, and ERT are all tree-based models, the number of trees seriously affects the performance of the model. Too many trees will cause over-fitting of the model and an insufficient number of trees will cause underfitting. The L 2 regularization can control the complexity of the model and reduce the generalization error. The GBRT model optimizes the subsample, which controls the random sampling ratio of each tree. If the subsample value is set too small, the result will be underfitting, thus the subsample value range was between 0.5 and 1. For the ERT model, the tree's depth is tuned. The tree's depth determines how the model learns the characteristics of individual samples. The more individual sample features are learned, the worse the generalization ability of the model. In the SVR model, the default radial basis function was selected as the kernel function, tuning the parameters of C and gamma, which control the trade-off between the slack variable penalty and the marginal width. For the MLPR model, we tuned the parameters of the hidden layer size, which include the number of hidden layers and the number of perceptrons contained in each hidden layer.

Retrieval Results for Chl-a
The retrieval results of each model for Chl-a are shown in Table 4. We can observe that the CBR model had the best prediction performance (R 2 = 0.96, MAE = 0.53 mg/m 3 , RMSE = 0.96 mg/m 3 ). The prediction accuracy for the CBR validation data set (R 2 = 0.96, MAE = 0.53 mg/m 3 , RMSE = 0.96 mg/ m 3 ) was lower than the prediction accuracy for its training data set (R 2 = 1.00, MAE = 0.09 mg/ m 3 , RMSE = 0.12 mg/m 3 ). For the XGBR validation data set, the prediction accuracy for the XGBR model (R 2  . The prediction performance of SVR, MLPR, and EN was too poor for the inversion of the concentration of Chl-a. Since these models, namely SVR, MLPR, and EN, have low R 2 with high MAE and RMSE, it is determined that they are not suitable for Chl-a inversion. Scatter plots of the observed and predicted values of the nine machine learning algorithms are presented in Figure 3. The predicted and observed values of the models were evenly distributed on both sides of the regression line, indicating that the models' prediction accuracies are excellent. The difference between the predicted values and the observed values represents the level of the model's prediction deviation, indicating that the model may be overfitting or underfitting. From Figure 3, we can observe that the predicted values of the tree-based ensemble models, including CBR, ABR, XGBR, GBRT, RF, and ERT, were close to the regression line. Among them, the CBR, XGBR, and GBRT models have the highest prediction accuracy. The difference between the observed and the predicted values of the SVR, MLPR, and EN models is large, indicating that these models' prediction accuracies are extremely poor.

Retrieval Results for SS
The retrieval results of each model for SS are shown in Table 5. We can find that the CBR model had the best prediction performance (R 2   Scatter plots of the observed and predicted values of CBR, ABR, XGBR, GBRT, RF, ERT, SVR, MLPR, and EN are presented in Figure 4. The scatter plots show the relationship between the predicted value of each model and the observed value. A good prediction result will be evenly distributed on both sides of the regression line. We can observe that the tree-based models' (CBR, ABR, XGBR, GBRT, RF, ERT) predicted values and observed values were evenly distributed on both sides of the regression line, indicating that the predicted value of the model is very close to the observed value. The predicted value of the SVR model and the observed value were also evenly distributed on both sides of the regression line, and the accuracy of the SVR model was lower than that of the treebased model. Similarly, we can observe that there was a significant difference between the predicted value and the observed value of the MLPR and EN models, indicating that the prediction errors of the MLPR and EN models are relatively large.

Discussion
For a further discussion, the distribution map of Chl-a obtained by CBR model for the Beigong Reservoir hyperspectral imagery is shown in Figure 5. According to the statistics of the inversion results, the maximum value of the inversion result was 14.17 mg/m 3 the minimum value was 3.54 mg/m 3 . The observed value ranged from 2.62 mg/m 3 to 14.2 mg/m 3 . The inversion map reveals the spatial distribution of Chl-a in the Beigong Reservoir. The concentration of Chl-a was relatively high in the west part of Beigong Reservoir and mainly concentrated along the shore.
The distribution map of SS obtained by CBR model for the Beigong Reservoir hyperspectral imagery is shown in Figure 6. According to the statistics of the inversion results, the minimum value was 2.71 mg/L and the maximum value of the inversion result was 15.04 mg/L. The observed value ranged from 2 mg/L to 18 mg/L. The inversion map shows that the SS concentration in the southwest part of the Beigong Reservoir was significantly high compared to the whole of the reservoir and there was fragmented erythema near the reservoir's border, which may have been produced by transitory human activity.

Conclusions
The main purpose of this study is to compare the performance of various machine learning algorithms in predicting water quality parameters using UAV-borne hyperspectral data. Through this study, the main conclusions are as follows: 1.
The prediction performance of different machine learning algorithms, including CBR, XGBR, GBRT, ABR, ERT, RF, SVR, MLPR, and EN, in predicting water quality were compared. The overall prediction accuracy of the tree-based models were higher than that of the other three traditional machine learning models.

2.
Two water quality parameters, including Chl-a and SS, were analyzed with different machine learning models. For the prediction of Chl-a, the R 2 values of several models ranged from 0.58 to 0.96; the RMSE ranged from 0.53 to 1.75 mg/m 3 ; and the MAE value ranged from 0.47 to 1.6 mg/m 3 . Among them, the CBR model had the highest prediction accuracy and the XGBR model had the second-highest prediction accuracy. For the prediction of SS, the R 2 values of the nine models ranged from 0.59 to 0.94; the RMSE ranged from 1.2 to 2.97 mg/L; and the MAE value ranged from 1.11 to 2.46 mg/L. The prediction accuracy of the CBR model was the highest and the prediction accuracies of the XGBR and RF models were lower than that of the CBR. Notably, the CBR model showed stable and satisfactory performance for predicting water quality parameters, including Chl-a and SS. 3.
The water quality distribution map was generated based on the UAV-borne hyperspectral data and machine learning algorithms, which can be used for large-scale and continuous inland water quality monitoring. From the water quality parameter inversion map, we observed that the pollution degree of SS in the west part of Beigong Reservoir was much higher than that in the east part. The areas with the highest Chl-a concentration mainly existed in the southern part of Beigong Reservoir and near the shore area. The management can monitor the water quality from the inversion map, improving the efficiency of water quality maintenance and saving management costs.
To conclude, this study compared and analyzed the predictive performance of nine machine learning models on different water quality parameters. In future research, we will combine multi-temporal UAV-borne hyperspectral images to analyze the dynamic change of inland water quality.