Measurement of Total Nitrogen Concentration in Surface Water Using Hyperspectral Band Observation Method

Abstract: Nitrogen overload is one of the main reasons for the deterioration of surface water quality, so monitoring nitrogen loading is vital for maintaining good surface water quality. The use of spectral reflectance to monitor nitrogen concentration in water has shown increasing potential, but it still poses some problems. Therefore, it is necessary to explore new methods for the quantitative monitoring of nitrogen concentration in surface water. In this paper, hyperspectral data from surface water in the Ebinur Lake watershed are used to select sensitive bands using spectral transformation, the spectral index, and a coupling of these two methods. A particle swarm optimization support vector machine (PSO-SVM) model, constructed on the basis of the sensitive bands, is used to quantitatively estimate the total nitrogen concentration in surface water, and its accuracy is then verified. The results show that the bands near 680, 850, and 940 nm can be used as sensitive bands for estimating the total nitrogen concentration of surface water in arid regions. Compared with the best estimation models constructed from sensitive bands selected using the spectral transformation or the spectral index alone, the best model based on the coupling of these two measures is more accurate ($R^2$ = 0.604, root mean square error (RMSE) = 1.61 mg/L, residual prediction deviation (RPD) = 2.002). This coupling method yields a robust and accurate model with strong predictive ability, and can contribute to improved quantitative estimation of water quality indexes of rivers in arid regions.


Introduction
Excessive nitrogen concentration is one of the main water pollution problems in Chinese rivers as well as in rivers worldwide [1][2][3]. Excess nitrogen feeds algae, causing them to grow faster than most river ecosystems can handle. The resulting eutrophication produces large algal blooms that can consume almost all the oxygen in a river, creating so-called "dead zones" [4]. The lack of oxygen (hypoxia) kills virtually all aquatic organisms, and the river is then regarded as a "dead river". The main reason for the increase in nitrogen concentration in inland waters is the intensification of industrial and agricultural activities, whose effluents and pollutants are discharged into waterways and lead to severe ecological and environmental problems [5]. In China, the Yangtze and Pearl River estuaries have reportedly been turned into dead zones, mostly due to nitrogen pollution of the waterways [6]. Most alarmingly, harmful algal blooms (HABs) produce dangerous toxins that directly threaten human health [7]. Hence, it is of paramount importance to control nitrogen pollution of surface waters in order to minimize these problems.
Although satellite remote sensing can support extensive spatial and temporal monitoring, it has obvious disadvantages, such as serious information redundancy, low spatial-temporal resolution, and difficulty in the quantitative inversion of water quality indexes in small rivers. Surface hyperspectral remote sensing can not only distinguish the spectral features of water bodies and extract rich information, but also offers controllable temporal resolution and high mobility, which enables high-precision quantitative inversion of water quality indicators in small rivers [8]. However, quantitative studies of water quality indexes mainly focus on light-sensitive variables, such as total suspended solids, chlorophyll-a, and turbidity, while hyperspectral data are rarely used to study water quality indexes that are insensitive to light, such as total nitrogen, pH, and dissolved phosphorus [9]. Quantitative estimation methods for the total nitrogen concentration in surface water from hyperspectral data mainly include direct methods [10][11][12], indirect methods [13][14][15], and physicochemical methods [16][17][18].
To date, quantitative estimation of the total nitrogen concentration in surface water directly from hyperspectral data is still rare [19], especially in arid regions. Due to geographical factors, the water quality indices of surface water are often quite variable. Although various analysis and modelling methods have been used in water quality index inversion and have achieved good results, they inevitably introduce new errors when applied to new regions. In arid regions, the applicability of quantitative estimation models of total nitrogen concentration in surface water has yet to be verified because of the influence of special environmental conditions. Nevertheless, total nitrogen concentration is one of the most important water quality indicators: it has a profound impact on water quality in arid regions, and its high-precision estimation remains challenging [20,21]. Therefore, it is necessary to develop a quantitative monitoring method suitable for arid regions, which is of practical significance for solving the ecological and environmental problems caused by nitrogen pollution.
Measured hyperspectral data can be used to estimate the total nitrogen concentration in water, but the volume of hyperspectral data is very large. If the original data are used directly in the calculation, a serious Hughes phenomenon arises [22], which limits the application of hyperspectral data. Sensitive band selection is a common method for effectively reducing the dimension of hyperspectral data and can significantly mitigate the Hughes phenomenon. State-of-the-art band selection methods can be divided into six categories: searching-based, clustering-based, deep-learning-based, ranking-based, sparsity-based, and hybrid-scheme-based methods [23,24]. These methods have many advantages, but also some shortcomings. For example, although the first three methods are suitable for small samples, they usually have higher computational costs; the ranking-based method is only suitable for large data sets; and the sparsity-based method requires sparsity constraints to be imposed. In contrast, spectral transformation and the spectral index can effectively enhance spectral information and can be used to select sensitive bands. In view of this, this paper first screens sensitive bands by spectral transformation and by the spectral index separately, and then attempts to couple the two in order to optimize the sensitive bands.
In this study, the three main rivers in the Ebinur Lake watershed, located in Xinjiang, China, were selected (Figure 1) because they are under the influence of large-scale agricultural and relatively concentrated industrial activities. The aims of this paper are to: (1) collect data on the hyperspectral reflectance of surface water and select sensitive bands based on spectral transformation and the spectral index, (2) attempt to couple spectral transformation and the spectral index to optimize the sensitive bands, (3) establish a PSO-SVM model based on the sensitive bands to quantitatively estimate the total nitrogen concentration of surface water, and (4) compare the inversion accuracy of the PSO-SVM models based on the three methods for selecting sensitive bands.
Water 2020, 12, x 3 of 19

Study Area
The Ebinur Lake watershed (43°38′-45°52′ N and 79°53′-85°02′ E) is located in northwest Xinjiang, China (Figure 1). The study area is about 50,621 km², comprising the Bortala River Valley, Jinghe Oasis, Wusu Oasis, Dandagai Desert, and the Mutetaer desert zone of the lower reaches of the Akeqisu-Kuitun River. Ebinur Lake lies at the lowest elevation of the watershed and is the largest saltwater lake in Xinjiang. It has the characteristics of a typical lake in the arid region of Central Asia. The area experiences a typical arid continental climate of the middle temperate zone and is characterized by drought, low rainfall, drastic temperature variations, and severe soil salinization [25].
Three rivers, the Bortala, Jinghe, and Kuitun Rivers, flow into Ebinur Lake from the south, west, and east, respectively, and constitute the main water sources of the watershed. There is extensive farmland in the Bortala River, Jinghe River, and Kuitun River sub-watersheds, and some artificial reservoirs have been constructed to the southwest of Ebinur Lake. Over the past 40 years, due to the impact of irrigation, the inflow into the lake from these three rivers has decreased significantly. This has lowered the water level of the lake by 2-3 m, reduced the lake surface to less than 500 km², and led to deterioration of the water environment of the watershed [26].

Figure 1. (c) Landscape of water quality in the upper reaches of the Jinghe River; (d) landscape of water quality in the lower reaches of the Bortala River (photographs (c,d) by Changjiang Liu; maps (a,b) produced with ArcGIS10.3.1 (http://www.esri.com/software/arcgis)).

Measured Spectral Data and Preprocessing
Surface water hyperspectral data from the Bortala, Jinghe, and Kuitun Rivers were collected between October 12 and October 20, 2017. Based on the measured spectral curves, 3 groups of abnormal curves were eliminated, and 32 groups of hyperspectral curves were retained.
A UV-VIS spectrophotometer (UV-6100, Mapada, Shanghai, China) was used to measure the total nitrogen concentration [27]. Before the test, instrument self-inspection and calibration were performed. During the test, the cuvette was kept clean and filled to 2/3-4/5 of its height. Three parallel water samples were collected at each sampling point, and the mean of their total nitrogen concentrations was taken as the measured value. The basic statistics of the test results are shown in the table in the upper right corner of Figure 2. To further describe the total nitrogen concentration of surface water in the Ebinur Lake watershed, we referred to the standard values for Class III and Class V water in the Environmental Quality Standard for Surface Water (GB3838-2002) and divided the measured data into three levels: low nitrogen concentration (0-1 mg/L), medium nitrogen concentration (1-2 mg/L), and high nitrogen concentration (2-7.06 mg/L). Of the 32 water samples, 14 had low, 8 had medium, and 10 had high nitrogen concentration. To ensure representative training and verification sets, 4 low-concentration, 2 medium-concentration, and 2 high-concentration samples were randomly selected as the verification set, and the remaining 24 samples were used as the training set.
An ASD FieldSpec 3 ground object spectrometer (Analytical Spectral Devices, Boulder, CO, USA) with a spectral range of 350-2500 nm was used to measure spectra [28]. Field measurements were made in clear, windless weather between 12:00 and 16:00 (UTC+6), facing the sun, with the sensor probe held perpendicular to the water surface. Before the test, white reference panel correction was performed, and the mean of 10 spectral curves collected at each water sampling point was taken as the measured spectral curve for that point. To eliminate the influence of noise on the spectral data, the mean filtering method was used to uniformly denoise and smooth the spectral curves [29]. The processed curves were taken as the characteristic spectral curves for each test water sample.
Nitrogen concentration is the main cause of eutrophication, and the degree of eutrophication is positively correlated with spectral reflectance [30,31], which is also substantiated by our measurements (Figure 2). Therefore, the water sample with the lowest reflectance was selected to search for the sensitive band, which ensures that the search method for the sensitive band generalizes to the other water samples.
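The smoothing and sample-partitioning steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the window width of the mean filter, the treatment of values lying exactly on a class boundary, the random seed, and all function names are hypothetical and are not taken from the study.

```python
import random

def mean_filter(values, window=5):
    """Moving-average smoothing; near the edges only the available part of the window is used.
    The window width is an assumption, not a value from the study."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

def nitrogen_level(tn):
    """Concentration classes from the text (mg/L); boundary handling is an assumption."""
    if tn <= 1.0:
        return "low"
    if tn <= 2.0:
        return "medium"
    return "high"

def stratified_split(tn_values, n_verify, seed=0):
    """Randomly draw n_verify[level] verification samples per level; the rest form the training set."""
    rng = random.Random(seed)
    by_level = {"low": [], "medium": [], "high": []}
    for idx, tn in enumerate(tn_values):
        by_level[nitrogen_level(tn)].append(idx)
    verify = []
    for level, k in n_verify.items():
        verify.extend(rng.sample(by_level[level], k))
    verify_set = set(verify)
    train = [i for i in range(len(tn_values)) if i not in verify_set]
    return train, sorted(verify)

# Synthetic concentrations with the class sizes reported in the text (14 / 8 / 10).
tn = [0.5] * 14 + [1.5] * 8 + [3.0] * 10
train_idx, verify_idx = stratified_split(tn, {"low": 4, "medium": 2, "high": 2})
print(len(train_idx), len(verify_idx))
```

Drawing the verification samples per class, rather than from the pooled data, keeps all three concentration levels represented in both sets even with only 32 samples.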

Spectral Transformation
Spectral transformation is an important means of analysing spectrally sensitive bands. An appropriate spectral transformation can effectively enhance the prediction accuracy and robustness of the model. In this paper, 14 spectral transformations are used to search for sensitive bands related to the total nitrogen concentration in surface water (Table 1).
Table 1. Spectral transformation forms.
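As a rough illustration of what such transformations look like in practice, the sketch below computes the four base forms named later in the text (square root, reciprocal, logarithm, and logarithm of the reciprocal) together with their first and second finite differences. It does not reproduce the full set of 14 forms in Table 1; the function names and the simple forward-difference scheme are assumptions made for the example.

```python
import math

def first_difference(values, wavelengths):
    """Forward finite-difference approximation of the derivative with respect to wavelength."""
    return [(values[i + 1] - values[i]) / (wavelengths[i + 1] - wavelengths[i])
            for i in range(len(values) - 1)]

def transform_spectrum(reflectance, wavelengths):
    """Return a dict of transformed spectra; reflectance values must be strictly positive."""
    base = {
        "sqrt(R)": [math.sqrt(r) for r in reflectance],
        "1/R": [1.0 / r for r in reflectance],
        "lgR": [math.log10(r) for r in reflectance],
        "lg(1/R)": [math.log10(1.0 / r) for r in reflectance],
    }
    out = dict(base)
    for name, vals in base.items():
        d1 = first_difference(vals, wavelengths)
        # Second difference taken on the first-difference series (one band shorter each time).
        d2 = first_difference(d1, wavelengths[:-1])
        out["(" + name + ")'"] = d1
        out["(" + name + ")''"] = d2
    return out

# Hypothetical three-band spectrum, for illustration only.
forms = transform_spectrum([0.01, 0.04, 0.09], [400.0, 401.0, 402.0])
print(sorted(forms))
```

Each differentiation shortens the series by one band, so derivative spectra must be matched to wavelengths accordingly before bands are compared.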

Spectral Index Construction
Spectral indices, or improved spectral indices, are commonly used to select sensitive bands for measuring a water quality index. The difference index (DI), ratio index (RI), and normalized difference index (NDI) are three remote sensing earth observation indices that are widely used and have high accuracy [32]. Based on the formulas of these three indices, the spectral difference index (SDI), spectral ratio index (SRI), and spectral normalized difference index (SNDI) were constructed to select sensitive bands from hyperspectral data (Table 2). By analysing the quantitative relationship between the sensitive bands and the measured values of the water nitrogen concentration, the optimal spectral index was selected to screen the sensitive bands, and an inversion model was established to improve the efficiency and accuracy of the total nitrogen concentration estimates.
Table 2. Constructed spectral index.
Spectral Index Calculation Formula Reference
Note: $R_i$ and $R_j$ are the original reflectivity of any two bands in the band range.
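For orientation, difference, ratio, and normalized difference indices are conventionally written in the following forms. These are the standard textbook expressions and are assumed here to correspond to SDI, SRI, and SNDI; the exact formulas and references of Table 2 may differ in detail.

$$\mathrm{SDI} = R_i - R_j, \qquad \mathrm{SRI} = \frac{R_i}{R_j}, \qquad \mathrm{SNDI} = \frac{R_i - R_j}{R_i + R_j}$$

Under these forms, SDI and SNDI change sign when the two bands are swapped, while SRI is inverted, so the order of the band pair matters when reporting a sensitive band combination.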

PSO-SVM Model
The particle swarm optimization (PSO) algorithm is a population-based optimization technique. It has received wide attention from the academic community because of its easy implementation, fast convergence, high accuracy, and applicability to small samples. In this study, the number of samples is small (32), which is the main reason for choosing this algorithm. The PSO algorithm regards every candidate solution as a particle with an initial velocity and finds an optimal solution through iteration. The algorithm is initialized with a group of random particles (random solutions), and the optimal solution is then found iteratively. In each iteration, particles update themselves by tracking two extrema: the first is the best solution found by the particle itself, called the individual extremum; the other is the best solution found so far by the whole population, called the global extremum. In each iteration, particles update their velocity and position from the individual and global extrema:

$$v_{id}^{k+1} = w\,v_{id}^{k} + c_1 r_1 \left(p_{id} - x_{id}^{k}\right) + c_2 r_2 \left(p_{gd} - x_{id}^{k}\right)$$

$$x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1}$$

where $w$ is the inertia weight; $d$ is the space dimension; $i = 1, 2, \ldots, n$ indexes the particles; $k$ is the current iteration number; $x_{id}$ and $v_{id}$ are the particle position and velocity; $p_{id}$ is the individual optimal position of the particle; $p_{gd}$ is the global optimal solution; $c_1$ and $c_2$ are acceleration factors; and $r_1$ and $r_2$ are random numbers distributed in the interval [0,1]. In practice, we set $n$ = 32, $c_1$ = 2, $c_2$ = 2, $w$ = 0.6, and the maximum number of iterations to 500, and left the other hyperparameters at their default values. The PSO algorithm can thus be applied to the optimization of a support vector machine (SVM): it is used to find the optimal SVM parameters, namely the penalty coefficient C and the kernel function parameter g, and to construct a PSO-SVM [35] estimation model.
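A minimal sketch of the particle update loop is given below, using the settings quoted in the text (32 particles, $c_1 = c_2 = 2$, $w = 0.6$, 500 iterations). Everything else is an assumption made for illustration: the velocity limit, the search bounds, the random seed, and in particular `toy_objective`, a hypothetical stand-in for the validation error of an SVM as a function of (log2 C, log2 g). In a real PSO-SVM, the objective would train and score an SVM for each candidate parameter pair.

```python
import random

def pso_minimize(objective, bounds, n_particles=32, w=0.6, c1=2.0, c2=2.0,
                 max_iter=500, seed=0):
    rng = random.Random(seed)
    dim = len(bounds)
    # Velocity limit per dimension: an assumption, the source does not state one.
    vmax = [0.2 * (hi - lo) for lo, hi in bounds]
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    v = [[rng.uniform(-vm, vm) for vm in vmax] for _ in range(n_particles)]
    pbest = [p[:] for p in x]                      # individual extrema
    pbest_val = [objective(p) for p in x]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # global extremum
    history = [gbest_val]
    for _ in range(max_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])
                           + c2 * r2 * (gbest[d] - x[i][d]))
                v[i][d] = max(-vmax[d], min(vmax[d], v[i][d]))
                lo, hi = bounds[d]
                x[i][d] = max(lo, min(hi, x[i][d] + v[i][d]))
            val = objective(x[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = x[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = x[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history

def toy_objective(params):
    """Hypothetical stand-in for SVM validation error over (log2 C, log2 g)."""
    log_c, log_g = params
    return (log_c - 3.0) ** 2 + (log_g + 1.0) ** 2

best, best_val, history = pso_minimize(toy_objective, [(-5.0, 15.0), (-15.0, 3.0)])
print(best, best_val)
```

Searching over the logarithms of C and g, as assumed here, is a common choice because useful SVM parameter values typically span several orders of magnitude.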

Evaluation Indices
To quantify the performance of the model, the coefficient of determination ($R^2$), root mean square error (RMSE), and residual prediction deviation (RPD) were used to evaluate the inversion results. Similar studies have shown that this process is reliable [36,37]. In the formulas for these indices, $y_i$ is the measured value, $Y_i$ is the predicted value, $T_i$ is the average of the measured values of all samples, $n$ is the number of water samples, and SD is the standard deviation of the samples. $R^2$ reflects the robustness of model establishment and prediction: the larger the value of $R^2$, the more robust the model and the better the fit of the estimation model. The RMSE is used to evaluate the inversion ability of the model: the smaller the RMSE, the stronger the inversion ability. RPD is a dimensionless ratio used to evaluate the prediction ability of the model. When RPD is less than 1.4, the model is not predictive. When RPD is between 1.4 and 2.0, the model can make approximate quantitative predictions. When RPD exceeds 2.0, the model has excellent quantitative prediction ability.
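The three indices can be computed along the following lines. RMSE and RPD (the ratio of the standard deviation of the measured values to the RMSE) follow their usual definitions. The form of $R^2$ used here, one minus the ratio of residual to total sum of squares, and the use of the sample (n - 1) standard deviation are assumptions; other common variants of $R^2$, such as the squared correlation coefficient, give different values, and the source does not show which one was used. All function names are hypothetical.

```python
import math

def rmse(measured, predicted):
    n = len(measured)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(measured, predicted)) / n)

def sample_sd(values):
    """Sample standard deviation (n - 1 in the denominator); an assumption."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))

def rpd(measured, predicted):
    """Residual prediction deviation: SD of the measured values divided by RMSE."""
    return sample_sd(measured) / rmse(measured, predicted)

def r_squared(measured, predicted):
    """1 - SSE/SST; an assumed definition, see the note above."""
    mean = sum(measured) / len(measured)
    sse = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    sst = sum((y - mean) ** 2 for y in measured)
    return 1.0 - sse / sst

def rpd_class(value):
    """Interpretation thresholds quoted in the text (1.4 and 2.0)."""
    if value < 1.4:
        return "not predictive"
    if value <= 2.0:
        return "approximate"
    return "good"

# Hypothetical measured and predicted concentrations (mg/L).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(rmse(measured, predicted), rpd(measured, predicted), r_squared(measured, predicted))
```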

Selection of Sensitive Bands Based on Spectral Transformation
To improve the spectral sensitivity and highlight the mixed characteristic information of the spectral curve, 14 spectral transformations (Table 1) were applied to the original spectral curve R after denoising and smoothing. The partial correlation between the reflectance and the total nitrogen concentration was analysed to find the sensitive bands. After screening, it was found that the first- and second-order differential forms of $R^{1/2}$, $1/R$, $\lg R$, and $\lg(1/R)$ were significantly correlated with the measured total nitrogen concentration of surface water (Figure 3). The black line in the figure indicates the critical value of the correlation coefficient at the significance level p = 0.01. A point above the critical value indicates that the band reflectance had a significant positive correlation with the total nitrogen concentration of the water body, and a point below the critical value indicates a significant negative correlation. For each transformation, we retained the two bands with the largest absolute correlation coefficients as the combination of sensitive bands. The sensitive bands selected in this way for estimating the total nitrogen concentration of surface water are shown in Table 3. The absolute value of the correlation coefficient varied from 0.54 to 0.8, and the spectral transformation forms with the best correlation were $(R^{1/2})'$ and $(\lg(1/R))'$.
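The band-screening step can be sketched as follows: for each band of a transformed spectrum, compute the correlation with the measured total nitrogen concentration and keep the two bands with the largest absolute coefficients. The sketch uses the ordinary Pearson coefficient as a simplifying assumption and omits the significance test at p = 0.01; the data and function names are hypothetical.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no correlation information
    return sxy / math.sqrt(sxx * syy)

def top_bands(spectra, tn, wavelengths, k=2):
    """spectra[s][b] is the transformed reflectance of sample s at band b.
    Returns the k (wavelength, r) pairs with the largest |r|."""
    scored = []
    for b, wl in enumerate(wavelengths):
        column = [row[b] for row in spectra]
        r = pearson_r(column, tn)
        scored.append((abs(r), wl, r))
    scored.sort(reverse=True)
    return [(wl, r) for _, wl, r in scored[:k]]

# Hypothetical data: 4 samples, 3 bands.
spectra = [[1.0, 5.0, 2.0], [2.0, 3.0, 1.0], [3.0, 4.0, 0.0], [4.0, 6.0, -1.0]]
tn = [1.0, 2.0, 3.0, 4.0]
print(top_bands(spectra, tn, [500, 600, 700]))
```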

Selection of the Sensitive Band Based on the Spectral Index
The spectral reflectance of the water samples was arranged in a two-dimensional matrix sequence. The abscissa and ordinate were set as the spectral reflectances $R_i$ and $R_j$, respectively, and the total nitrogen concentration in the water samples was recorded as N. Matlab-R2012a software was used to construct hyperspectral matrix coefficient maps of the spectral reflectance and total nitrogen concentration in water (Figure 4), in order to evaluate the correlation between SDI, SRI, SNDI, and the total nitrogen concentration and to optimize the combination of sensitive bands significantly related to the total nitrogen concentration [38]. Different colours in the maps show the degree of correlation between the measured total nitrogen concentration and the spectral index: the redder the colour, the stronger the correlation. This kind of two-dimensional matrix map makes it possible to accurately locate the wavelength combinations with high correlation.
It can be seen that the three hyperspectral matrix coefficient maps (SDI, SRI, and SNDI) have similar distribution characteristics and are significantly related to the total nitrogen concentration of the water bodies under study. The sensitive bands and correlation coefficients related to the total nitrogen concentration of water bodies for the different spectral indexes are shown in Table 4. The correlation coefficients based on the three spectral indices are relatively close, but the one based on SDI is the largest, with a value of 0.71.
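The matrix search described above amounts to evaluating a spectral index for every ordered pair of bands and correlating it with the measured concentrations. The sketch below does this exhaustively and returns the best pair. It is written in Python rather than Matlab, assumes the conventional difference, ratio, and normalized difference forms for SDI, SRI, and SNDI, and uses hypothetical data and function names.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def spectral_index(kind, ri, rj):
    """Conventional index forms (assumed); reflectances must be non-zero for SRI and SNDI."""
    if kind == "SDI":
        return ri - rj
    if kind == "SRI":
        return ri / rj
    if kind == "SNDI":
        return (ri - rj) / (ri + rj)
    raise ValueError("unknown index: " + kind)

def best_band_pair(spectra, tn, wavelengths, kind):
    """Return (r, wavelength_i, wavelength_j) for the pair with the largest |r|."""
    best = (0.0, None, None)
    for i in range(len(wavelengths)):
        for j in range(len(wavelengths)):
            if i == j:
                continue
            index = [spectral_index(kind, row[i], row[j]) for row in spectra]
            r = pearson_r(index, tn)
            if abs(r) > abs(best[0]):
                best = (r, wavelengths[i], wavelengths[j])
    return best

# Hypothetical reflectances: band 0 minus band 1 is proportional to tn by construction.
tn = [1.0, 2.0, 3.0, 4.0]
spectra = [[0.30, 0.20, 0.40], [0.50, 0.30, 0.10], [0.55, 0.25, 0.30], [0.75, 0.35, 0.20]]
print(best_band_pair(spectra, tn, [500, 600, 700], "SDI"))
```

For hyperspectral data with thousands of bands the full pairwise search is quadratic in the number of bands, so in practice the band range is often restricted or the computation vectorized.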

Estimation of the Total Nitrogen Concentration in Surface Water Based on Spectral Transformation and the Spectral Index
After the parameters C and g of the SVM were obtained by PSO, the spectral reflectance values of the sensitive bands of the training and validation sets were taken as input variables, and the measured total nitrogen concentrations of the corresponding samples were taken as output variables. The total nitrogen concentration of each sample was estimated with the SVM algorithm, and the estimation accuracy was then evaluated. According to Section 2.6, the larger the $R^2$ and RPD and the smaller the RMSE, the higher the model accuracy. For the training set, the accuracy indices of the models were satisfactory, with $R^2$, RMSE, and RPD values of 0.68-0.82, 0.62-0.93 mg/L, and 3.15-4.71, respectively. For the validation set, the models based on the $(R^{1/2})'$ and $(\lg(1/R))'$ spectral transformations had approximately equal accuracies for estimating the total nitrogen concentration, but as can be seen in Figure 5, the estimates under the $(\lg(1/R))'$ spectral transformation were more reasonable, with $R^2$, RMSE, and RPD values of 0.576, 1.624 mg/L, and 1.985, respectively. This provides a usable model for the estimation of total nitrogen concentration in surface water, but its robustness and predictive ability are not ideal. Among the estimation models based on the spectral index, the model constructed from the sensitive bands selected using SDI had the highest accuracy, with an $R^2$ of 0.721, an RMSE of 4.511 mg/L, and an RPD of 0.715, while the $R^2$ of the other two spectral indexes was lower than that of the spectral transformations. Therefore, it is not an ideal model for quantitative estimation of nitrogen concentration in surface water either. The accuracy of the PSO-SVM estimation models based on the optimized spectral transformation and spectral index is shown in Figure 5 and Table 5.

Estimation of the Total Nitrogen Concentration in Surface Water Based on Spectral Transformation and Spectral Index Coupling
Based on the first-order and second-order differential transformations of $\lg R$, $1/R$, $\lg(1/R)$, and $R^{1/2}$, a number of sensitive bands significantly related to the total nitrogen concentration were selected, and the PSO-SVM model was then used to estimate the total nitrogen concentration. The estimation models using the $(R^{1/2})'$ and $(\lg(1/R))'$ spectral transformations estimated the total nitrogen concentration reasonably well, but their $R^2$ values only ranged between 0.5 and 0.6, suggesting that their estimation accuracy was limited. The estimation models built using the three spectral indexes SDI, SRI, and SNDI were also weak: although the maximum $R^2$ was 0.721, the RMSE was as high as 4.511 mg/L and the RPD was only 0.715.
Here, it can clearly be seen that the estimation model under (R 1/2 ) and (lg(1/R)) spectral transformation had smaller R 2 , but better RMSE and RPD; in contrast, the model under SNDI spectral index had larger R 2 , but worse RMSE and RPD (Table 5). In order to find a way to inherit the advantages of the above two methods, the next step was to develop a more effective method to improve the accuracy of total nitrogen concentration estimation.
In view of this, the spectral reflectance under the (R^(1/2))′, (lg(1/R))′, and lg(R) transformations was arranged into a two-dimensional matrix sequence and nested in the three spectral index modules, SDI, SRI, and SNDI, using Matlab R2012a to build the hyperspectral matrix coefficient map of the coupled spectral transformation and spectral index (Figure 6). The sensitive band combinations with a high correlation with the total nitrogen concentration were selected, with the expectation of improving the inversion accuracy. The selected sensitive band combinations are shown in Table 6. Compared with the previous single forms, the correlation coefficients of the coupling forms did not change significantly, but the sensitive bands were clearly different. These sensitive bands were then input into the PSO-SVM model to test whether the prediction accuracy would improve.

Table 6. Sensitive band combinations and correlation coefficients for the combination of spectral transformation and the spectral index. Columns: Coupling Form; Sensitive Band Combination (nm); Correlation Coefficient.

After the SVM parameters C and g were obtained using PSO, the spectral reflectance at the sensitive bands of the training and validation samples was taken as the input and the measured total nitrogen concentration of the corresponding samples as the output variable. The support vector machine (SVM) was then trained and used for prediction, giving the estimated total nitrogen concentration of each sample and its estimation accuracy. After optimization, only the (lg(1/R))′-SRI and (R^(1/2))′-SRI forms gave estimation models with coefficients of determination greater than 0.6.
On the training set, the accuracies of the two models were acceptable. On the validation set, the R², RMSE, and RPD values of the (lg(1/R))′-SRI coupled estimation model were 0.613, 1.798 mg/L, and 1.793, respectively, and those of the (R^(1/2))′-SRI coupled estimation model were 0.604, 1.61 mg/L, and 2.002, respectively. Comparing the two, the (R^(1/2))′-SRI model was the more accurate: the R² values were basically the same, but its RMSE was lower and its RPD was higher (Figure 7 and Table 7).
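The parameter search mentioned above, in which PSO is used to obtain the SVM parameters C and g, can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in the study: the swarm settings (inertia, c1, c2, swarm size), the choice to search on a log10 scale, and the function `stand_in_error` are all assumptions. In practice the objective would be an error measure of the SVM (for example a cross-validated RMSE) evaluated at the candidate C and g; here a hypothetical smooth surface with a known minimum stands in for it so that the example is self-contained.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise objective(position) inside box bounds with a basic PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                 # personal best positions
    best_val = [objective(p) for p in pos]         # personal best values
    g = min(range(n_particles), key=best_val.__getitem__)
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                # Clamp the particle to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < gbest_val:
                    gbest_val, gbest_pos = val, pos[i][:]
    return gbest_pos, gbest_val


def stand_in_error(params):
    """Hypothetical error surface over (log10 C, log10 g), minimum at (1, -2)."""
    log_c, log_g = params
    return (log_c - 1.0) ** 2 + (log_g + 2.0) ** 2


best, err = pso_minimize(stand_in_error, bounds=[(-2.0, 3.0), (-4.0, 1.0)])
print("log10 C, log10 g:", [round(v, 3) for v in best], "error:", round(err, 6))
```

Replacing `stand_in_error` with a function that trains an SVM at the given parameters and returns its validation error would turn the sketch into a PSO-SVM parameter search.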
Compared with the models constructed by spectral transformation or the spectral index alone, the PSO-SVM estimation model based on the (R^(1/2))′-SRI coupled form was more robust, and its inversion and prediction ability was clearly improved. Based on these results, we concluded that the coupling form is suitable for the quantitative estimation and prediction of the total nitrogen concentration of surface water in arid areas.

Figure 6. Search for the hyperspectral sensitive band with the combination of spectral transformation and the spectral index.

Theoretical Basis
The material composition of surface water determines its spectral characteristics, and differences in the concentrations of its constituents often result in significant differences in reflectivity in particular bands [39,40], which is the theoretical basis for the inversion of water components from spectral reflectance. For surface water nitrogen concentration, both direct and indirect methods confirm the feasibility of using hyperspectral data to estimate surface water quality indicators [41,42]. Furthermore, in this paper, cluster analysis was used to make the training and validation sets more representative, which also helps to improve the generality of the estimation model. When only a small sample is available, the PSO-SVM quantitative estimation model based on sensitive bands is applicable, consistent with Nagra [43]. In addition, the representative water samples were selected to have the lowest reflectivity, ensuring estimation accuracy for high reflectivity water samples.
In the literature, the PSO-SVM estimation model is known to be suitable for small samples [44], but there is currently no clear definition of a small sample; we speculate that it is a relative concept. Although this study used only 32 samples (24 for training and 8 for validation), it achieved good results in predicting the total nitrogen concentration, which suggests that a sample size of 32 is workable for the PSO-SVM model. In addition, there were 14 low-concentration, 8 medium-concentration, and 10 high-concentration samples, so the concentrations of the selected samples had good continuity. The training and validation sets were selected randomly from these three levels in the proportion 7:4:5, which supports the rationality of the sample selection. On this basis, the results of this study are scientifically acceptable.

Selection of Sensitive Bands
The high spectral resolution and narrow bands of hyperspectral data result in a high data dimension, which makes much of the information redundant and some bands noisy and increases the need for data preprocessing [45]. It is therefore particularly important to select appropriate dimension reduction methods when quantitatively estimating water quality indicators. Sensitive band selection is one of the most widely used dimensionality reduction methods [46]; it works by highlighting the relevant main signal through a series of preprocessing steps and weakening or masking the secondary signal. In general, the sensitive bands of photosensitive (optically active) substances in surface water are easy to extract, but the object of this research was total nitrogen, which is a non-photosensitive substance. Therefore, additional measures had to be taken to find the sensitive bands related to total nitrogen. The sensitive bands selected in this paper were concentrated around 680 nm, 850 nm, and 940 nm, which is basically consistent with the studies of Schalles [47] and Yu [48].
It can be seen from Figure 2 that the spectral reflectance around these bands shows different degrees of absorption, which may be caused by optically active variables in surface water. Consistent with this, Chen [49] and Li [50] showed that the total nitrogen concentration is highly correlated with chlorophyll-a, colored dissolved substances, and total suspended solids, all of which are optically active variables. As for the method of selecting sensitive bands, establishing a model for quantitative estimation of the total nitrogen concentration in surface water by means of spectral transformation and the spectral index was found to be reliable, similar to the methods of Zhao [51]. In particular, this paper established a hyperspectral matrix coefficient map by coupling spectral transformation and the spectral index to screen sensitive bands. With this map, sensitive bands are easy to identify and the band information is rich (Figure 6). The method can be understood as a two-dimensional superposition of two sensitive bands, which better highlights the sensitive bands related to the total nitrogen concentration.
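The coupled band search described above can be sketched as follows. This is a hedged illustration in Python rather than the Matlab procedure used in the study. It assumes a central-difference first derivative and the common forms SDI = a - b, SRI = a / b, and SNDI = (a - b) / (a + b); these definitions are not restated in this section and are assumptions here. The function names and the small synthetic data set are hypothetical. Each spectrum is transformed to (R^(1/2))′, the chosen index is computed for every ordered band pair, and the pair whose index correlates most strongly with total nitrogen is returned; the full grid of correlations corresponds to the matrix coefficient map.

```python
import math


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def first_derivative(values, wavelengths):
    """Central-difference derivative; the two end bands are dropped."""
    return [(values[i + 1] - values[i - 1]) / (wavelengths[i + 1] - wavelengths[i - 1])
            for i in range(1, len(values) - 1)]


# Assumed index definitions for two band values a and b.
def sdi(a, b): return a - b
def sri(a, b): return a / b
def sndi(a, b): return (a - b) / (a + b)


def best_band_pair(spectra, wavelengths, tn, index=sri):
    """Return (|r|, wavelength_i, wavelength_j) of the best band pair."""
    transformed = [first_derivative([math.sqrt(r) for r in s], wavelengths)
                   for s in spectra]
    inner = wavelengths[1:-1]            # bands that survive the derivative
    best = (0.0, None, None)
    for i in range(len(inner)):
        for j in range(len(inner)):
            if i == j:
                continue
            try:
                values = [index(t[i], t[j]) for t in transformed]
            except ZeroDivisionError:    # skip pairs with a zero denominator
                continue
            r = abs(pearson_r(values, tn))
            if r > best[0]:
                best = (r, inner[i], inner[j])
    return best


# Synthetic reflectance spectra (one row per sample) and hypothetical TN values in mg/L.
wavelengths = [660, 670, 680, 690, 700, 710]
tn = [0.8, 1.9, 3.5, 6.2]
spectra = [[0.05 + 0.001 * k * (b + 1) + 0.002 * math.sin(b + t) for b in range(6)]
           for k, t in enumerate(tn)]

result = best_band_pair(spectra, wavelengths, tn, index=sri)
print(result)
```

Passing `index=sdi` or `index=sndi` scans the other two index forms with the same loop.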

Advantage of Coupling Model
The correlation coefficient of (R^(1/2))′-SRI in Table 6 is smaller than that of the other coupling forms, but its final evaluation indices for total nitrogen concentration estimation were better, which may be due to different degrees of overfitting during model estimation under the other coupling forms.
To show the advantages of the coupling form intuitively, we compared the optimal evaluation indices of the single and coupling forms (Table 8). Taking the coupling form as the standard, we used the difference between the evaluation indices of each single form and those of the coupling form to reflect the advantages of the coupling form. Table 8 shows that the optimal evaluation indices under spectral transformation were all worse than those under the coupling form. Under the spectral index, the optimal R² was 0.117 larger than that of the coupling form, but the RMSE was 2.9 mg/L larger and the RPD was 1.287 smaller, which shows that the estimation model under the spectral index had a large error and poor robustness. The coupling model was therefore more reliable.
To further illustrate the advantages of the coupling form in selecting sensitive bands, we tested the difference in the evaluation indices on the testing set. Because this study estimates the total nitrogen concentration from hyperspectral sensitive bands, we used the sensitive bands that were highly correlated with the measured total nitrogen concentration for this comparison. The sensitive bands of the single forms were selected by spectral transformation and the spectral index (Tables 3 and 4), and those of the coupling forms are shown in Table 6. The single forms were defined as the first group and the coupling forms as the second group. A normality test showed that R² and RPD followed a normal distribution, whereas RMSE did not. Therefore, an independent-samples t-test was used to detect the difference in R² and RPD between the two groups, and an independent-samples nonparametric test was used to detect the difference in RMSE.
The results are shown in Table 9. The significance value of the t-test for R² was greater than 0.05, indicating no significant difference in R² between the two groups; the significance value of the t-test for RPD was 0.024 and that of the nonparametric test for RMSE was 0.03, indicating significant differences in RPD and RMSE. The mean values of the evaluation indices of the two groups also supported this view: R² and RPD of the second group were 0.057 and 0.524 larger than those of the first group, and RMSE was 0.964 mg/L smaller. Combining all the above analysis, we conclude that the PSO-SVM model built on the sensitive bands selected by the coupling form has stronger prediction ability.
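The two group comparisons described above can be illustrated with a short Python sketch of the underlying test statistics. It assumes a pooled-variance (Student) t statistic for the independent-samples t-test and the Mann-Whitney U statistic for the nonparametric test; the text does not name the nonparametric test, so that choice is an assumption, and the group values below are hypothetical rather than taken from Table 9. Converting the statistics to significance values requires the t and U reference distributions, which a statistics package would normally supply.

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Student's t statistic for two independent samples (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb))


def mann_whitney_u(a, b):
    """U statistic of sample a: each pair with a > b counts 1, each tie counts 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


# Hypothetical evaluation indices for the single-form and coupling-form groups.
rpd_single = [1.1, 1.3, 1.2, 1.4]
rpd_coupled = [1.7, 1.8, 2.0, 1.9]
rmse_single = [2.9, 2.6, 3.1, 2.7]
rmse_coupled = [1.8, 1.6, 1.9, 1.7]

t_rpd = pooled_t_statistic(rpd_single, rpd_coupled)
u_rmse = mann_whitney_u(rmse_single, rmse_coupled)
print(f"t (RPD) = {t_rpd:.3f}, U (RMSE) = {u_rmse:.1f}")
```

A negative t here means the first group has the lower mean RPD, and a U equal to the product of the group sizes means every RMSE in the first group exceeds every RMSE in the second.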

Deficiencies and Prospects
This study only considered one year's field data rather than a long time series, which may explain the relatively low inversion accuracy of the model [52]. Whether the model selected in this study is suitable for other regions, and how well other data sources can be used to estimate the total nitrogen concentration in surface water, needs further verification [53]. Finally, the inversion model presented in this study was based on surface water samples, and the accuracy of models based on water samples from the whole water column remains to be studied.

Conclusions
In this study, a new strategy was used to improve the hyperspectral estimation of total nitrogen concentration in surface water. The main conclusions are as follows:
(1) The measured nitrogen concentration varied considerably among samples, with a coefficient of variation of 83.12%, and the concentration in the lower reaches was generally higher than that in the upper reaches.
(2) Correlation analysis between the hyperspectral reflectance and the measured nitrogen concentration under spectral transformation, the spectral index, and their coupling forms showed that the bands near 680, 850, and 940 nm can be used as sensitive bands for the inversion of the total nitrogen concentration in surface water in arid areas.
(3) After optimization, among the estimation models based on the sensitive bands, the model based on the (R^(1/2))′-SRI coupling form had the highest accuracy, followed by the model based on the (R^(1/2))′ spectral transformation and then the model based on the SDI spectral index.
In summary, compared with the estimation model based on spectral transformation or the spectral index alone, the estimation model based on the coupled form had the greatest accuracy, potentially providing a new method for improving the efficiency and accuracy of surface water nitrogen concentration monitoring in arid regions.