Remote Monitoring of NH3-N Content in Small-Sized Inland Waterbody Based on Low and Medium Resolution Multi-Source Remote Sensing Image Fusion

Abstract: In quantitative remote sensing of water quality in small inland rivers, the temporal frequency of monitoring strongly affects the accuracy of estimated spatiotemporal changes in water quality parameters. Because of limitations in satellite sensor design and the influence of atmospheric conditions, the number of images available for spatiotemporal dynamic monitoring of water quality parameters is insufficient. Meanwhile, the spatial resolution of MODIS and other high-temporal-resolution images is too low to extract the boundaries of small inland rivers effectively. To address this, many researchers have used spatiotemporal fusion models for multi-source remote sensing monitoring of ground features. Widely used spatiotemporal fusion models, such as FSDAF (flexible spatiotemporal data fusion), perform poorly when ground objects change heterogeneously. We propose a spatiotemporal fusion algorithm, SR-FSDAF (super-resolution based flexible spatiotemporal data fusion), to solve this problem. Building on FSDAF, it adds ESPCN to reconstruct the spatial change prediction image and thereby obtains better predictions for heterogeneous changes. Both qualitative and quantitative evaluations showed that our fusion algorithm obtained better results. We compared the band sensitivity of the images before and after fusion and found that the sensitive band combination for NH3-N did not change, which indicates that the fusion method can be used to increase the temporal frequency of NH3-N inversion. After fusion, we compared the accuracy of linear regression and random forest inversion models and selected the random forest model, which was more accurate, to predict the NH3-N concentration. The inversion accuracy for NH3-N was as follows: R² was 0.75, MAPE was 23.7%, and RMSE was 0.15. The overall NH3-N concentration in the study area followed the trend high-water period < water-stable period < low-water period. NH3-N pollution was serious in some reaches.


Introduction
Water is one of the most important materials on earth. With human industrial production, agricultural breeding, and daily activities, a large amount of sewage is discharged into the surrounding water environment, resulting in environmental pollution and a severe impact on water supply, ecology, and human health. Monitoring water quality in time and obtaining the spatiotemporal variation characteristics of regional water pollutant concentrations is of great significance for assessing the risk of water pollution and preventing it effectively. Traditional water quality detection methods are costly, time-consuming, and laborious, and the pollutant concentration obtained at a sampling point cannot reflect the distribution of pollutants over the whole region. Remote sensing images allow regional synchronous observation and can provide the overall distribution of [...] of heterogeneous changes [32]. However, the performance is still not ideal, and it cannot meet the requirement of higher-precision change monitoring.
To improve the poor performance in heterogeneous region prediction, we propose an improved spatiotemporal data fusion model, SR-FSDAF (super-resolution flexible spatiotemporal data fusion). Unlike deep learning techniques, which rely on costly training to improve fusion accuracy, it keeps the simplicity of FSDAF and needs fewer image pairs and less time. SR-FSDAF inherits the hybrid model of FSDAF and improves on the thin-plate spline sampling of MODIS spectral information used to extract spatial variation details, achieving better reconstruction results for the same resampling purpose. As a result, SR-FSDAF can predict fine-resolution images of heterogeneous regions more accurately. Compared with other multi-spectral and high-resolution sensors, OLI and MODIS were launched earlier and provide free access to more historical imagery. Some MODIS bands are similar to those of OLI, which gives a band basis for image fusion. Therefore, SR-FSDAF was tested using MODIS and Landsat-8 OLI images and compared with other fusion methods such as STARFM. MODIS images, OLI images, and SR-FSDAF were then applied to water quality monitoring in Xinyang City to increase the monitoring frequency and make better use of low-resolution images. The NH3-N inversion model was established from the fused image bands, and the NH3-N concentration in the Huaihe River Basin of the Xinyang area was analyzed. For NH3-N concentration inversion, we compared the accuracy of a statistical regression model and a random forest model and adopted the random forest model to further improve the accuracy of NH3-N concentration prediction.

Zhu et al. proposed FSDAF in 2016 [32]. In the method, fine pixels are classified into different types. The temporal variation of each type is roughly estimated from the changes of the different classes of ground objects reflected in the pure pixels of MODIS. Thin plate spline (TPS) sampling is carried out on the MODIS image of the prediction date to obtain a rough spatial variation value of the ground objects [39]. TPS interpolates and resamples the data based on spatial correlation to preserve the local change information of the image. The two kinds of change values are given different weights using neighborhood information, which reduces the deviation of the prediction results and gives them more stability and spatial continuity.

However, the prediction quality of FSDAF still declines considerably in the case of abrupt heterogeneous change. Based on the idea of FSDAF, this paper uses ESPCN (efficient sub-pixel convolutional neural network) [40] instead of TPS on the MODIS images, which are the main information source for predicting abrupt spatial change, so as to retain more texture information and improve the prediction accuracy under spatial heterogeneity.

ESPCN
ESPCN inherits the idea of the super-resolution algorithm. For training, the high-resolution image is down-sampled to a low-resolution image, and a convolutional neural network learns the mapping between the low-resolution and high-resolution images so as to realize super-resolution reconstruction. Figure 1 shows the framework and parameters of ESPCN. ESPCN applies a 3-layer convolutional neural network directly to the low-resolution image, which avoids enlarging the image before it enters the network. After three convolution operations, the low-resolution image is mapped into a feature map with c × r × r channels, where r is the upscaling factor. Finally, the high-resolution image is generated by a sub-pixel convolution layer, which periodically rearranges the feature maps with c × r × r channels into a high-resolution image.
$$f^{1}\left(I^{LR}; W_{1}, b_{1}\right) = \phi\left(W_{1} \ast I^{LR} + b_{1}\right) \tag{1}$$

$$f^{l}\left(I^{LR}; W_{1:l}, b_{1:l}\right) = \phi\left(W_{l} \ast f^{l-1}\left(I^{LR}\right) + b_{l}\right) \tag{2}$$

$$PS(T)_{x,y,c} = T_{\lfloor x/r \rfloor,\ \lfloor y/r \rfloor,\ c \cdot r \cdot \mathrm{mod}(y,r) + c \cdot \mathrm{mod}(x,r)} \tag{3}$$

$$I^{HR} = f^{l}\left(I^{LR}\right) = PS\left(W_{l} \ast f^{l-1}\left(I^{LR}\right) + b_{l}\right) \tag{4}$$
The basic calculation of the network is shown in Equations (1)-(4). Equation (1) and Equation (2) follow the principle of convolution theory.
I^LR is the low-resolution image input to the network, I^HR is the high-resolution image learned by ESPCN, and f^i is the convolution kernel function (i is the convolution layer). W_i and b_i are the weight and bias parameters of the convolution kernel, which are obtained by iterative learning of the network. φ is a nonlinear activation function. Equations (3) and (4) describe the sub-pixel convolution part. PS is a periodic shuffle operator, which rearranges a tensor of size H × W × C·r² into a tensor of size rH × rW × C.
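The periodic shuffle can be illustrated with a short NumPy sketch. It assumes the commonly used channel ordering in which channel C·r·mod(y, r) + C·mod(x, r) + c of the low-resolution tensor feeds output channel c; the function name `periodic_shuffle` is hypothetical and is not taken from the ESPCN code.

```python
import numpy as np


def periodic_shuffle(t: np.ndarray, r: int) -> np.ndarray:
    """Rearrange an (H, W, C*r*r) tensor into an (r*H, r*W, C) tensor."""
    h, w, channels = t.shape
    if channels % (r * r) != 0:
        raise ValueError("channel count must be a multiple of r*r")
    c = channels // (r * r)
    # Assumed ordering: channel = C*r*dy + C*dx + c,
    # with dy = mod(y, r) and dx = mod(x, r).
    blocks = t.reshape(h, w, r, r, c)          # axes: y, x, dy, dx, c
    blocks = blocks.transpose(0, 2, 1, 3, 4)   # axes: y, dy, x, dx, c
    return blocks.reshape(h * r, w * r, c)


# Usage: one low-resolution pixel with 4 channels becomes a 2 x 2 patch.
low_res = np.arange(4, dtype=float).reshape(1, 1, 4)
high_res = periodic_shuffle(low_res, r=2)
```

The reshape and transpose only move values; no interpolation takes place, which is why the upscaling itself adds no parameters.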
Feeding oversized image blocks to ESPCN degrades the network's performance; therefore, we reconstructed the input low-resolution MODIS image twice with the upsampling factor r = 4. The parameters of the network were set to l = 3, (f_1, n_1) = (5, 64), (f_2, n_2) = (3, 32), f_3 = 3, and r = 4. The training samples consist of two kinds of image pairs: the original Landsat image paired with its version downsampled to 1/4, and the 1/4 image paired with a version downsampled by a further factor of 4. To avoid training repeatedly on the same pixels of the original image, sub-image blocks were extracted from the original image with a stride of (17 − ∑ mod(f, 2)) × r and from the lower-resolution image of each pair with a stride of (17 − ∑ mod(f, 2)).
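As a worked example of the stride expressions, the sketch below evaluates them for the stated kernel sizes (5, 3, 3) and r = 4. Treating 17 as the side length of the low-resolution sub-image block is an assumption read off the formula, and both function names are hypothetical.

```python
def espcn_strides(kernel_sizes=(5, 3, 3), r=4, block=17):
    """Return (high-resolution stride, low-resolution stride)."""
    # Each odd kernel contributes mod(f, 2) = 1 to the sum.
    overlap = sum(f % 2 for f in kernel_sizes)
    lr_stride = block - overlap
    return lr_stride * r, lr_stride


def block_origins(length, block, stride):
    """Start indices of blocks of a given size along one image axis."""
    return list(range(0, length - block + 1, stride))


hr_stride, lr_stride = espcn_strides()
# With three odd kernels: 17 - 3 = 14 for the low-resolution image
# and 14 * 4 = 56 for the original image.
```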

Improved Spatiotemporal Fusion Model SR-FSDAF
SR-FSDAF combines the super-resolution algorithm with the flexible spatiotemporal data fusion model. SR-FSDAF has six steps: computing MODIS pixel purity based on unsupervised classification, coarse estimation of the pixel temporal change, residual computing of pixel temporal changes, image reconstruction and spatial change prediction based on ESPCN super-resolution, residual distribution calculation, and enhancement and fusion based on neighborhood information. For convenience of description, the high-spatial-resolution Landsat images are referred to as fine images, and the high-temporal-resolution MODIS images as coarse images. The Landsat and MODIS images at t_1 and the MODIS image at t_2 are used to predict the Landsat image at t_2. Figure 2 shows the flow of the algorithm. For the detailed principles of the algorithm, refer to FSDAF [32] and ESPCN [40]. The variables and definitions of the model are as follows: the number of Landsat-8 pixels corresponding to one MODIS pixel, which is a constant of 16; P_c(x_i, y_i), the proportion of class c Landsat-8 pixels in the ith MODIS pixel; ∆M(x_i, y_i, b), the change of the ith MODIS pixel from t_1 to t_2 at band b; ∆F(c, b), the change of class c pixels on band b of the Landsat-8 images from t_1 to t_2, where c is one of the ground object classes from the classification.

Unsupervised Classification of Landsat Images at t 1 Time
After image preprocessing, the spatial resolution of the MODIS image is 480 m and the spatial resolution of the Landsat image is 30 m. Therefore, the coverage area of one MODIS pixel is the same as that of 16 Landsat pixels. Based on this correspondence, we use the K-means algorithm to classify the Landsat image at time t_1 and set the number of clusters to four categories: water, farmland, buildings, and woodland. P_c(x_i, y_i) represents the proportion of Landsat pixels of class c among the 16 pixels (c = 1, ..., 4); see Formula (5) for the calculation. N_c(x_i, y_i) is the number of Landsat pixels of class c among the 16 pixels.
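The class proportions can be sketched in a few lines of NumPy, assuming Formula (5) has the form P_c(x_i, y_i) = N_c(x_i, y_i) / 16. The function name `class_fractions` and the `side` parameter (number of fine pixels along one edge of a coarse pixel; side = 4 gives 16 fine pixels) are illustrative assumptions.

```python
import numpy as np


def class_fractions(class_map: np.ndarray, n_classes: int, side: int) -> np.ndarray:
    """Fraction of each class inside every coarse pixel.

    class_map: (H, W) integer labels in 0..n_classes-1 on the fine grid.
    Returns an array of shape (H // side, W // side, n_classes).
    """
    h, w = class_map.shape
    if h % side or w % side:
        raise ValueError("fine grid must tile exactly into coarse pixels")
    blocks = class_map.reshape(h // side, side, w // side, side)
    counts = np.stack(
        [(blocks == c).sum(axis=(1, 3)) for c in range(n_classes)], axis=-1
    )
    return counts / float(side * side)


# Usage: two coarse pixels, the first pure, the second mixed half and half.
labels = np.zeros((4, 8), dtype=int)
labels[:2, 4:] = 1
labels[2:, 4:] = 3
fractions = class_fractions(labels, n_classes=4, side=4)
```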
The K-means method first computes initial class means distributed uniformly in the data space and then iteratively assigns each pixel to the nearest cluster according to the shortest-distance principle. In each iteration, the class means are recalculated and the pixels are reclassified with these means until the within-class variance meets the requirement.

Rough Estimation of Pixel Temporal Change
The second step calculates the change in MODIS image reflectance from t_1 to t_2, as shown in Equation (6).
According to the spectral unmixing principle, the spectral value of a MODIS pixel can be expressed as the weighted sum of the spectral reflectance values of the ground objects within the pixel. Therefore, the temporal variation of the MODIS pixel can also be calculated by Formula (7). ∆F(c, b) in Formula (7) is the change of Landsat class c objects on band b, which can be obtained by solving Formula (7) inversely by least squares.
Sort the P_c(x_i, y_i) values of class c ground objects in the MODIS pixels from large to small. The larger P_c(x_i, y_i) is, the higher the proportion of class c ground objects in the MODIS pixel and the purer the pixel. According to the P_c(x_i, y_i) values, select n MODIS pixels from high to low to solve for ∆F(c, b).
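The least-squares step can be sketched as follows, assuming Formula (7) has the linear mixing form ∆M(x_i, y_i, b) = Σ_c P_c(x_i, y_i) · ∆F(c, b) used in FSDAF [32]. The function names and the way the n purest pixels per class are pooled are assumptions for illustration.

```python
import numpy as np


def select_purest(fractions: np.ndarray, n_per_class: int) -> np.ndarray:
    """Indices of the n coarse pixels with the largest fraction of each class."""
    chosen = set()
    for c in range(fractions.shape[1]):
        order = np.argsort(-fractions[:, c])  # descending by P_c
        chosen.update(order[:n_per_class].tolist())
    return np.array(sorted(chosen))


def solve_class_change(fractions, delta_m, n_per_class):
    """Solve fractions @ delta_f = delta_m for delta_f by least squares.

    fractions: (N, K) class proportions of N coarse pixels.
    delta_m:   (N, B) coarse-pixel change per band.
    Returns delta_f with shape (K, B).
    """
    idx = select_purest(fractions, n_per_class)
    delta_f, *_ = np.linalg.lstsq(fractions[idx], delta_m[idx], rcond=None)
    return delta_f


# Usage with synthetic values: two classes, one band.
true_change = np.array([[0.1], [-0.2]])
p = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.75, 0.25]])
estimated = solve_class_change(p, p @ true_change, n_per_class=2)
```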

Residual Computing of Pixel Temporal Changes
Assuming that there is no abrupt change in the spectral information of the ground objects, the predicted fine-resolution image at time t_2 is L^TP_t2(x_ij, y_ij, b), calculated as shown in Formula (8). Usually, the spectral information of the ground objects does change between the two moments, and the resulting deviation is defined as R(x_i, y_i, b). It is calculated as shown in Formula (9), where n is 16.
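A minimal single-band sketch of this step follows. It assumes the FSDAF forms of the two formulas: Formula (8) adds the class change ∆F(c, b) to the fine image at t_1, and Formula (9) subtracts the mean fine-pixel change within a coarse pixel from the coarse change ∆M. All function names and the `side` parameter are hypothetical.

```python
import numpy as np


def temporal_prediction(fine_t1, class_map, delta_f):
    """Assumed Formula (8): fine image at t1 plus the change of each pixel's class."""
    return fine_t1 + delta_f[class_map]


def block_mean(img, side):
    """Mean of the fine pixels inside each coarse pixel."""
    h, w = img.shape
    return img.reshape(h // side, side, w // side, side).mean(axis=(1, 3))


def temporal_residual(fine_t1, fine_tp, delta_m, side):
    """Assumed Formula (9): coarse change minus mean predicted fine change."""
    return delta_m - (block_mean(fine_tp, side) - block_mean(fine_t1, side))


# Usage: one coarse pixel covering 4 x 4 fine pixels, two classes.
fine_t1 = np.full((4, 4), 0.2)
classes = np.zeros((4, 4), dtype=int)
classes[2:, :] = 1
fine_tp = temporal_prediction(fine_t1, classes, np.array([0.1, 0.3]))
residual = temporal_residual(fine_t1, fine_tp, np.array([[0.25]]), side=4)
```

Here the mean predicted change is 0.2 while the coarse pixel changed by 0.25, so a residual of 0.05 remains to be distributed.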

Image Reconstruction and Spatial Change Prediction Based on ESPCN Super-Resolution
The MODIS image at time t_2 is input to the ESPCN network to obtain the higher-resolution image L^SR_t2(x_ij, y_ij, b) reconstructed by the super-resolution algorithm. The residual between the high-resolution image obtained by ESPCN and the true high-resolution image at t_2 can be expressed as E_SR(x_ij, y_ij, b). ESPCN applies the convolution layers directly to the coarse image to avoid losing detailed information, and the sub-pixel convolution layer restores the feature map to the super-resolution image. The input MODIS image is thus converted into a high-resolution image by three consecutive convolution operations followed by the sub-pixel layer rearrangement. ESPCN learns the feature mapping between coarse and fine images more comprehensively and can retain the spatial feature information of the input image.

Residual Distribution Calculation
This step uses the homogeneity index to assign weights to the residual R(x_i, y_i, b) and the residual E_SR(x_ij, y_ij, b) in order to calculate the final residual. The difference between the two prediction results is the difference residual E_SR−TP(x_ij, y_ij, b), given by Formula (11). A 4 × 4 moving window is used to calculate the homogeneity index I(x_ij, y_ij), defined as the ratio of the number of pixels in the window with the same ground object class as the central fine-resolution pixel to the total number of pixels in the window. The calculation is given by Formula (12), where (x_ij, y_ij) is the central pixel of the moving window. I_k is 1 when the kth pixel in the window has the same class as the central pixel and 0 otherwise.
Calculate the weight E_w(x_ij, y_ij, b) according to the homogeneity index I (Formula (13)). The change degree of homogeneous pixels is determined mainly by the temporal-change residual R, and the change degree of heterogeneous pixels is determined mainly by the ESPCN super-resolution prediction.
Normalize E_w(x_ij, y_ij, b) to obtain W(x_ij, y_ij, b) (Formula (14)). Calculate the residual distribution of the predicted fine-resolution image based on the normalized weight W (Formula (15)). Then calculate the change ∆L(x_ij, y_ij, b) of each fine-resolution pixel from time t_1 to time t_2, as shown in Formula (16).
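The weighting can be sketched for a single band as below. The sketch assumes the FSDAF forms E_w = R · I + E_SR−TP · (1 − I), W = E_w / Σ E_w within a coarse pixel, and distributed residual = m · R · W with m fine pixels per coarse pixel; the placement of the even-sized window around the central pixel is also an assumption, as are the function names.

```python
import numpy as np


def homogeneity_index(class_map, win=4):
    """Share of window pixels with the same class as the central pixel."""
    h, w = class_map.shape
    off = win // 2  # assumed placement of an even-sized window
    hi = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            r0, c0 = max(i - off, 0), max(j - off, 0)
            r1, c1 = min(i - off + win, h), min(j - off + win, w)
            patch = class_map[r0:r1, c0:c1]  # clipped at the image border
            hi[i, j] = np.mean(patch == class_map[i, j])
    return hi


def distribute_residual(residual, e_sr_tp, hi, side):
    """Spread each coarse residual over its fine pixels using E_w weights."""
    m = side * side
    r_fine = np.kron(residual, np.ones((side, side)))
    e_w = r_fine * hi + e_sr_tp * (1.0 - hi)
    h, w = residual.shape
    sums = e_w.reshape(h, side, w, side).sum(axis=(1, 3))
    sums_fine = np.kron(sums, np.ones((side, side)))
    zero = sums_fine == 0
    # Fall back to equal weights when the weights of a block sum to zero.
    weights = np.where(zero, 1.0 / m, e_w / np.where(zero, 1.0, sums_fine))
    return m * r_fine * weights
```

Because the weights sum to one inside each coarse pixel, the mean of the distributed residual over that pixel equals R, so the coarse-scale change is preserved.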

Enhancement and Fusion Based on Neighborhood Information
Neighborhood information is used to improve the prediction stability and reduce the block effect caused by the calculation. For the fine-resolution pixel (x_ij, y_ij) at time t_1, the n fine pixels in the neighborhood that have the same class and the smallest spectral difference from (x_ij, y_ij) are selected. The spectral difference between the kth similar neighborhood pixel and the central fine-resolution pixel is S_k, as shown in Formula (17).
The weight contribution of these similar pixels to the center pixel follows the distance principle (Formula (18)). The value of w in w/2 depends on the size of the neighborhood needed to obtain 20 similar pixels. The farther the distance, the smaller the weight contribution. After normalization, the weight w_k is given by Formula (19). After adding the neighborhood information, the final predicted image is given by Formula (20).
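A sketch of the neighborhood step for one target pixel is given below. It assumes the FSDAF forms S_k = mean over bands of the absolute spectral difference, D_k = 1 + d_k / (w / 2) with d_k the Euclidean pixel distance, w_k = (1 / D_k) / Σ (1 / D_k), and a final prediction equal to the t_1 value plus the weighted sum of ∆L over the similar pixels. Function names and default parameters are illustrative.

```python
import numpy as np


def similar_pixel_weights(fine_t1, class_map, row, col, w=5, n_similar=20):
    """Pick same-class neighbors with the smallest S_k and weight them by distance."""
    half = w // 2
    h, wd, _ = fine_t1.shape
    candidates = []
    for i in range(max(row - half, 0), min(row + half + 1, h)):
        for j in range(max(col - half, 0), min(col + half + 1, wd)):
            if class_map[i, j] == class_map[row, col]:
                s_k = float(np.mean(np.abs(fine_t1[i, j] - fine_t1[row, col])))
                candidates.append((s_k, i, j))
    candidates.sort(key=lambda item: item[0])  # smallest spectral difference first
    candidates = candidates[:n_similar]
    d_k = np.array([1.0 + np.hypot(i - row, j - col) / (w / 2.0)
                    for _, i, j in candidates])
    w_k = (1.0 / d_k) / np.sum(1.0 / d_k)
    return [(i, j) for _, i, j in candidates], w_k


def predict_pixel(fine_t1, delta_l, class_map, row, col, band, w=5, n_similar=20):
    """t1 value plus the distance-weighted change of the similar pixels."""
    coords, w_k = similar_pixel_weights(fine_t1, class_map, row, col, w, n_similar)
    change = sum(wt * delta_l[i, j, band] for (i, j), wt in zip(coords, w_k))
    return fine_t1[row, col, band] + change
```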

Inversion Models
In quantitative remote sensing inversion, accurate selection of the characteristic bands is the basis for high inversion accuracy. We use the measured concentration of the water quality parameter NH3-N to calculate its correlation coefficient with the reflectance of individual bands and band combinations, and select the band or band combination with high correlation as the modeling parameter.
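The band selection can be sketched in plain Python as follows. The four combinations tried here (A + B, A − B, A / B, and (A − B) / (A + B)) are an assumed candidate set for illustration, and the reflectance and concentration values in the usage lines are made up.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def best_combination(band_a, band_b, target):
    """Return the (name, r) of the combination most correlated with target."""
    pairs = list(zip(band_a, band_b))
    combos = {
        "A+B": [a + b for a, b in pairs],
        "A-B": [a - b for a, b in pairs],
        "A/B": [a / b for a, b in pairs],
        "(A-B)/(A+B)": [(a - b) / (a + b) for a, b in pairs],
    }
    scores = {name: pearson_r(values, target) for name, values in combos.items()}
    name = max(scores, key=lambda k: abs(scores[k]))
    return name, scores[name]


# Usage with made-up values: red and green reflectance at four sampling points.
red = [0.10, 0.12, 0.15, 0.20]
green = [0.08, 0.11, 0.10, 0.12]
nh3n = [0.30, 0.25, 0.55, 0.70]
name, r = best_combination(red, green, nh3n)
```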
The traditional statistical regression (TSR) model has been widely used in the inversion of water quality parameters [41,42]. In this paper, linear, quadratic, exponential, and logarithmic functions are used to construct the inversion model (Table 1). The function with the highest inversion accuracy is selected as the model type of the traditional statistical regression model.

Regression Function Mathematical Expression (a, b and c Are Undetermined Parameters)
linear model

At the same time, this paper selects the random forest model, which has high learning efficiency, as the machine learning algorithm and trains it to obtain the machine learning model for the inversion of water quality parameters.

Evaluation Index
To evaluate the accuracy of the fusion method, this paper uses three algorithms as comparison methods: STARFM, FSDAF, and FSDAF with embedded SRCNN. We analyze the experimental results by qualitative and quantitative evaluation. The qualitative evaluation compares the fusion results of the different models at time t_2 with the Landsat image at time t_2. Through visual observation, we can judge whether the local details and the overall spectrum of the simulated image deviate too much from reality. The quantitative evaluation uses three indexes to assess the overall structural similarity of the fused image, the degree to which the fused image restores reflectance, and the spectral fidelity of the fused image.
The overall structural similarity index is the structural similarity SSIM, which is widely used to evaluate how strongly two similar images are linearly related. It is calculated as shown in Formula (21), where µ_x and µ_y are the mean values of the Landsat image and the fused image at time t_2, respectively; σ_x and σ_y are the variances of the two images; and C_1 and C_2 are nonzero constants that keep the result well defined. The more similar the overall structure of the two images, the closer the SSIM value is to 1.
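A global (whole-image) SSIM can be sketched as below, assuming Formula (21) is the standard SSIM expression, which also uses the covariance of the two images. The default constants C_1 = 0.01² and C_2 = 0.03² assume reflectance scaled to [0, 1] and are not taken from the paper.

```python
import numpy as np


def global_ssim(x, y, c1=1e-4, c2=9e-4):
    """SSIM computed once over the whole image (no sliding window)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)


# Usage: an image compared with itself and with its inverted copy.
reference = np.array([[0.1, 0.2], [0.3, 0.4]])
same = global_ssim(reference, reference)
inverted = global_ssim(reference, 1.0 - reference)
```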
The evaluation index for the degree of reflectance restoration is the root mean square error (RMSE), which reflects how well the fusion result reproduces the pixel values and the detail information. The formula is given in (22), where x(i, j) is the true Landsat image and y(i, j) is the fused image.
The evaluation index for spectral fidelity is the spectral angle mapper (SAM), which treats the spectrum of a single pixel as a high-dimensional vector and calculates the angle between the spectral vectors of the pixels at the same position in the two images. The smaller the value, the more similar the spectra of the pixels. The angle is calculated as shown in Formula (23).
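A minimal sketch of the spectral angle is given below, assuming Formula (23) is the usual arccosine of the normalized dot product of two pixel spectra. Averaging the per-pixel angles over the image and reporting degrees are assumptions, and the sketch expects every pixel spectrum to be nonzero.

```python
import numpy as np


def mean_spectral_angle(x, y):
    """Mean per-pixel spectral angle in degrees between two (H, W, B) images."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dot = np.sum(x * y, axis=-1)
    norms = np.linalg.norm(x, axis=-1) * np.linalg.norm(y, axis=-1)
    # Clip to guard against rounding slightly outside [-1, 1].
    cosine = np.clip(dot / norms, -1.0, 1.0)
    return float(np.degrees(np.arccos(cosine)).mean())


# Usage: a single pixel with two bands, compared with an orthogonal spectrum.
pixel_a = np.array([[[1.0, 0.0]]])
pixel_b = np.array([[[0.0, 1.0]]])
angle = mean_spectral_angle(pixel_a, pixel_b)
```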
The accuracy of the inversion model was evaluated by the coefficient of determination (R²), the mean absolute percentage error (MAPE), and the root mean square error (RMSE). The formula for MAPE is given in (24).
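The three accuracy measures can be sketched in plain Python. The forms used are the common ones: RMSE as the square root of the mean squared error, MAPE as the mean of |observed − predicted| / |observed| in percent, and R² as 1 − SS_res / SS_tot; whether the paper's R² uses exactly this definition is an assumption.

```python
import math


def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mape(observed, predicted):
    """Mean absolute percentage error in percent; observed values must be nonzero."""
    n = len(observed)
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / n


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Usage with made-up concentrations.
example_mape = mape([2.0, 4.0], [1.0, 5.0])
```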

Study Area
The study area lies between 113°45′ E and 115°55′ E and between 30°23′ N and 32°27′ N, in the Xinyang section of the Huaihe River Basin. It is located on the boundary line between North and South China (the Qinling-Huaihe Line), in the transition zone between subtropical and temperate monsoon climates and between humid and semi-humid regions. The main tributaries in the region are shown in Figure 2.
On 5 December 2016 and 1 January 2017, 43 water samples were collected in the study area and their NH3-N concentrations were measured. The distribution of the sampling points and the location of the study area are shown in Figure 3.

Landsat-8 OLI
The spatial resolution of Landsat-8 is 30 m and its revisit period is 16 days; it is a data source commonly used in spatiotemporal fusion algorithms. To test the fusion model, two sets of MODIS-Landsat image pairs are used in this paper.
For NH3-N inversion, Landsat-8 OLI scenes with less than 10% cloud cover were selected; the image bands and other details are listed in Table 2. The selected Landsat-8 OLI images were preprocessed, including atmospheric correction.

MODIS
MODIS has a spatial resolution of 500 m and a revisit period of 1 day, and it is also a data source commonly used in spatiotemporal fusion algorithms.
For the fusion model and NH3-N inversion experiments, this paper uses MODIS daily surface reflectance data acquired on the same dates as the corresponding Landsat images. The selected MODIS data were preprocessed, and the MODIS images were reprojected and resampled to 480 m to facilitate matching and calculation with the Landsat image pixels, which have a spatial resolution of 30 m. The details of the MODIS data are shown in Table 3.


Evaluations of Spatio-Temporal Fusion Model
The Landsat and MODIS images were cut to a specified size so that the ratio of MODIS to Landsat pixel size is 16:1. From Tables 2 and 3, the band ranges of B2, B3, B4, [...]

From the fusion result images of the first group (Figure 4), STARFM, the SRCNN-embedded model, FSDAF, and SR-FSDAF all obtained fusion results similar to the true Landsat image. The SRCNN-embedded method is similar to SR-FSDAF, except that SRCNN is used instead of TPS for upsampling. In the partial detail comparison, SR-FSDAF retains more details of the roof.

Table 4 shows the calculated values of RMSE, SSIM, and SAM for the four methods in Figure 4. The best fusion result for each band is highlighted in bold. The optimal SAM value is 3.417, obtained by SR-FSDAF, indicating that the fused image of SR-FSDAF has the highest relative spectral fidelity. On Band 1, FSDAF achieved better fusion results. For Band 2 to Band 7, SR-FSDAF obtained better values of RMSE and SSIM. On some bands, the SSIM and RMSE of SR-FSDAF are equivalent to those of FSDAF and the SRCNN-embedded model, but the overall results of SR-FSDAF are the best, with excellent performance in structural similarity and detail retention. FSDAF and the SRCNN-embedded model rank second, and the results of STARFM are the worst.

The ground objects in the second group of experimental images changed abruptly during the period. From the fusion results (Figure 5), STARFM, the SRCNN-embedded model, FSDAF, and SR-FSDAF obtained similar fusion results. SR-FSDAF captures the change of ground objects and the detailed information more accurately, but under abrupt change the fusion results of all methods are less satisfactory than those of the first group. Table 5 shows the values of the three evaluation indexes for the four methods on the second set of images. The best fusion result for each band is shown in bold and underlined. The optimal SAM value is 7.439, obtained by SR-FSDAF. For Band 1 to Band 7, SR-FSDAF gives the optimal index values, and the index performance of FSDAF is similar to that of SR-FSDAF. SR-FSDAF still has the best fusion result among the four methods when the ground changes. However, compared with the fusion results for the regions without abrupt change in the first group, the fusion quality of all four methods decreases under heterogeneous changes.

Inversion Based on Fused Images

Correlation Analysis of NH 3 -N
Of the measured data from the water samples collected at 43 sampling points in the key water function areas of Xinyang on 5 December 2016 and 1 January 2017, 70% were randomly selected. We analyzed the Pearson correlation coefficient between this 70% of the measured data and the different bands or band combinations of the fused images generated by the fusion methods. For two bands A and B, the correlation of each of their band combinations was calculated, and the highest correlation coefficient among these combinations was taken as the R value between A and B to draw the correlation matrix.
We calculated the correlation between NH3-N and these band combinations for the different fusion images and the Landsat-8 images. The results were drawn as a correlation matrix (Figure 6). Bands or band combinations with correlations above 0.7 were selected (Table 6). We found that the band combination most correlated with NH3-N changed to B/R for STARFM after spatiotemporal fusion. The most correlated band combination for SR-FSDAF, the SRCNN-embedded model, and FSDAF is the same as for the Landsat images, namely (R − G)/(R + G). The correlation for SR-FSDAF was the same as that for Landsat-8, at 0.81. This means that the surface reflectance in the SR-FSDAF fusion image is highly similar to that in the original image, and the SR-FSDAF fusion image can be used for the inversion of water quality parameters.

Accuracy Comparison of Inversion Models
We constructed two inversion models of NH3-N from the SR-FSDAF fusion image, one based on statistical regression and one based on random forest. We used 64 samples to train the models and the remaining 24 samples for accuracy verification. The accuracy of the models was evaluated by the difference between the measured and estimated values.
The optimal inversion results are shown in Figure 7. The optimal results of the statistical regression method, obtained with the band combination of Red and Green, are shown in Figure 7a: R² is 0.66, RMSE is 0.16, and MAPE is 30.1%. The optimal results of the random forest method, obtained with the combination of Blue, Green, Red, and NIR, are shown in Figure 7b: R² is 0.75, RMSE is 0.15, and MAPE is 23.7%. The results show that the random forest method has a clear advantage in estimating NH3-N concentration.

Figure 7. Accuracy assessment results of NH3-N estimated by the statistical regression models and the random forest model using SR-FSDAF fused images. The X-axis is the observed data, and the Y-axis is the predicted data. (a) Statistical regression model (quadratic model); (b) random forest model.

Spatio-Temporal Distribution of the NH 3 -N
This paper conducted further research based on the random forest inversion model. Inversion maps of NH3-N concentration were produced for each month from January to December 2017, and their distribution characteristics and variation trends were analyzed. Figure 8 shows the percentage of water surface area at each NH3-N concentration level in the twelve months. According to the relevant survey data, the wet season of the Huaihe River system runs from July to August, the dry season from December to February, and the water-stable season covers the remaining months. The changes in NH3-N concentration in the different periods and the classification standard for NH3-N concentration are shown in Table 7. Figure 9 shows our classification of NH3-N concentrations and the distribution of NH3-N in January 2017.
Taking Figure 8 and Table 7 together, the water body in the wet season consists mainly of Class II NH3-N concentration water; the water quality is the best, the NH3-N concentration is the lowest, and the trend is gentle.
In the water-stable period, the NH3-N concentration fluctuates strongly. It is relatively high in the early water-stable period from March to June and relatively low in the late water-stable period from September to November.
The NH3-N concentration is highest in the dry season and changes gently. Within the dry season, the lowest concentration occurs in December, when the water is mainly Class II and Class III. From December to February, the concentration gradually increases. In January, the area share of Inferior Class V water increased and was concentrated in the central region.
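The area shares per concentration class described above can be derived from an inversion map by classifying each water pixel and counting. The sketch below is illustrative only: it assumes all pixels cover equal area, and it assumes the class limits follow the Chinese surface water standard GB 3838-2002 (the paper's Table 7 is the authoritative source for the classification actually used). The names `CLASS_LIMITS`, `classify`, and `class_area_percentages` are hypothetical.

```python
from collections import Counter

# Upper NH3-N limits in mg/L per class. ASSUMPTION: values taken from
# GB 3838-2002; the paper's Table 7 may differ.
CLASS_LIMITS = [("I", 0.15), ("II", 0.5), ("III", 1.0), ("IV", 1.5), ("V", 2.0)]


def classify(concentration):
    """Return the water quality class label for one NH3-N concentration."""
    for name, upper in CLASS_LIMITS:
        if concentration <= upper:
            return name
    return "Inferior V"


def class_area_percentages(pixel_concentrations):
    """Percentage of (equal-area) water pixels falling in each class."""
    counts = Counter(classify(c) for c in pixel_concentrations)
    total = len(pixel_concentrations)
    return {name: 100.0 * n / total for name, n in counts.items()}


# Hypothetical water pixels from one monthly inversion map.
pixels = [0.3, 0.4, 0.6, 2.4]
shares = class_area_percentages(pixels)
print(shares)
```

Running this once per monthly map would give the twelve columns of a chart like Figure 8.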

Discussion
The reflectance characteristics of different concentrations of water quality parameters in a specific wavelength range are the analytical basis for the quantitative inversion of water quality parameters using the spectral information of remote sensing images. Our study shows that the sensitive bands of NH3-N are Blue, Green, Red, and NIR, showing the combined frequency characteristics of nitrogen-containing functional groups.
In practice, remote sensing inversion relies on establishing an effective connection between the point data obtained by field sampling and the areal data of remote sensing pixels with different spatial resolutions. The gap between the sampling time and the satellite overpass time, the limited water volume of inland waters, and the significant spatio-temporal changes all introduce errors into the inversion results. The significance of this study lies in using the spatio-temporal fusion method to reduce this bias and better reflect changes in water quality parameters. The spatio-temporal fusion algorithm increases the temporal resolution of Landsat images by generating a series of high-frequency sequential images for water quality inversion.
The SR-FSDAF model has better visual effects and index results than STARFM, the SRCNN embedding model, and FSDAF for both non-heterogeneous and heterogeneous mutation. For non-heterogeneous mutation images, the RMSE, SSIM, and SAM of SR-FSDAF are 0.03 (mean), 0.976 (mean), and 3.417, respectively; for heterogeneous mutation regions, they are 0.021 (mean), 0.810 (mean), and 7.439, respectively.
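Two of the fusion quality indices named above are simple to state in code. The sketch below is a minimal illustration, assuming RMSE is computed per band over reflectance values and SAM is the mean per-pixel spectral angle in degrees (the unit is an assumption; the text does not give one). SSIM is omitted because it needs windowed image statistics. The function names and sample values are hypothetical.

```python
import math


def rmse(fused_band, reference_band):
    """Root mean square error between two bands given as flat lists."""
    n = len(fused_band)
    return math.sqrt(sum((f - r) ** 2 for f, r in zip(fused_band, reference_band)) / n)


def spectral_angle_deg(a, b):
    """Angle in degrees between two pixel spectra (one value per band)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against rounding before acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_angle))


def mean_sam_deg(fused_pixels, reference_pixels):
    """Mean spectral angle over paired pixel spectra."""
    angles = [spectral_angle_deg(f, r) for f, r in zip(fused_pixels, reference_pixels)]
    return sum(angles) / len(angles)


# Hypothetical two-pixel, three-band example (fused vs. reference image).
fused = [[0.10, 0.12, 0.30], [0.08, 0.10, 0.25]]
reference = [[0.11, 0.12, 0.28], [0.08, 0.11, 0.26]]
print(mean_sam_deg(fused, reference))
```

A lower RMSE and a smaller spectral angle both indicate that the fused image is closer to the reference image.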
To demonstrate the advantages of the SR-FSDAF fusion method for water quality monitoring, we used the STARFM, SRCNN embedding model, FSDAF, and SR-FSDAF fusion images to calculate the distribution of correlation coefficients between different band combinations and NH3-N. The most sensitive band combination of the SR-FSDAF image and its correlation are highly consistent with those of the Landsat-8 images. Therefore, the SR-FSDAF method can be used for quantitative inversion of water quality parameters.
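The band sensitivity comparison described above amounts to scoring each band combination by its correlation with the measured NH3-N values and checking that the ranking is the same for the fused and the original image. The following sketch shows one way this could look; it assumes the combinations are simple two-band ratios and the score is the Pearson correlation coefficient, neither of which is confirmed by the text. All names and values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rank_band_ratios(bands, nh3n):
    """Rank all two-band ratios by absolute correlation with NH3-N.

    bands maps a band name to its reflectance at each sampling point.
    """
    scores = {}
    for num in bands:
        for den in bands:
            if num != den:
                ratio = [a / b for a, b in zip(bands[num], bands[den])]
                scores[f"{num}/{den}"] = pearson_r(ratio, nh3n)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical reflectances at four sampling points and measured NH3-N.
bands = {"Red": [0.10, 0.12, 0.15, 0.20], "Green": [0.10, 0.10, 0.10, 0.10]}
nh3n = [0.20, 0.30, 0.45, 0.70]
ranked = rank_band_ratios(bands, nh3n)
print(ranked[0])
```

Applying `rank_band_ratios` to the Landsat-8 image and to each fused image, then comparing the top-ranked combinations, mirrors the consistency check reported in the paragraph.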
The change in water quality is closely related to the evolution of the surrounding environment. NH3-N is an essential fertilizer for crop growth and a common component of industrial and domestic sewage. The concentration of NH3-N in water is often affected by sewage discharged from human production and daily life and by the pesticides and fertilizers used in agricultural activities. To further analyze the temporal and spatial variation characteristics of water quality in Xinyang City, we selected four regions (Figure 9a-d). According to the 1 km land use classification map of Xinyang City in 2017 (Figure 10) and the field survey, region (a) is the main industrial region, (b) and (c) are farmland on both sides of the river, mainly dry land and paddy fields, and the river in (d) passes through residential areas.
Figures 12-14 contain more farmland, so we compare the NH3-N concentration with the NDVI results to verify its relationship with agricultural activities. Xinyang City has a subtropical-to-temperate monsoon climate in which rain and heat occur in the same period; without interference from human activities, the change in NH3-N concentration is therefore mainly governed by river water volume, being higher from December to February and lower from June to August.
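The NDVI used in this comparison is the standard normalized difference of the NIR and Red bands. A minimal sketch of how a monthly NDVI series for a farmland region might be computed is shown here; the function names and reflectance values are hypothetical, and how the authors aggregated NDVI over each region is not stated in the text.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    if total == 0:
        return 0.0  # avoid division by zero on no-data pixels
    return (nir - red) / total


def mean_ndvi(nir_values, red_values):
    """Mean NDVI over the pixels of one region."""
    values = [ndvi(n, r) for n, r in zip(nir_values, red_values)]
    return sum(values) / len(values)


# Hypothetical (NIR, Red) reflectances of two farmland pixels per month.
monthly = {
    "Jan": ([0.30, 0.32], [0.10, 0.12]),
    "Jul": ([0.50, 0.52], [0.06, 0.08]),
}
monthly_ndvi = {month: mean_ndvi(nir, red) for month, (nir, red) in monthly.items()}
print(monthly_ndvi)
```

Plotting such a series next to the monthly NH3-N estimates for the same region is one way to inspect whether concentration peaks coincide with crop growth stages.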
In Figure 11, the river mainly flows through industrial production areas, where NH3-N concentrations are highest in February, March, September, and October and low in January, May, August, and November. The NH3-N concentration in this area did not show obvious seasonal variation and was mainly affected by industrial wastewater discharge.
Figure 12 is mainly farmland. According to the land use data, the area is mainly dry land, and the main crop is wheat. During the wheat growth period from January to April and the wheat maturity period in July, the NH3-N concentration is higher owing to fertilizer use.
The area in Figure 13 is mainly paddy field planted with rice. During the rice growing season from February to July, the NH3-N concentration in the river was higher.
Figure 14 shows a mixed area of residential land and farmland. The annual variation of NH3-N concentration in this area is relatively gentle, with no obvious seasonal or agricultural production pattern, which indicates that the area is affected by many factors.
Figures 12 and 13 show sections of the Huaihe River, the largest river in the Xinyang area, which has a large water volume and fast flow velocity. Owing to the uneven distribution of water quality and the difference in flow velocity, the NH3-N concentration differs markedly between the edge and the center of the river.
Overall, during the rainy season, NH3-N was diluted by rainwater, and the overall concentration was lower than in other periods; the concentration was highest in the dry season, when it changed gently. In addition to the impact of human activities, during the agricultural production period from January to August, pollution such as chemical fertilizer from farmland increases the NH3-N concentration in water, which agrees with the results of other scholars [20]. The irregular high NH3-N concentrations in cities are mainly caused by industrial wastewater discharge.

Conclusions
In this study, an improved spatio-temporal fusion model, SR-FSDAF, was proposed and applied to monitor NH3-N concentration in small and medium inland waters (the Xinyang section of the Huaihe River Basin). We studied the relationship between the field NH3-N data and the fused image bands. The research shows that SR-FSDAF provides an effective way to increase monitoring frequency while maintaining the accuracy of water quality prediction, and it has great application potential in quantitative remote sensing of water quality. The random forest model constructed in this paper can serve as a high-precision and efficient method for water quality prediction in the Xinyang section of the Huaihe River Basin and can provide data support for water quality monitoring and water pollution control in the basin.
Although the SR-FSDAF model achieves a better fusion effect, its prediction accuracy for heterogeneous mutation is still not ideal, mainly because the spectral details of MODIS images are very limited. At the same time, whether the SR-FSDAF model achieves similarly good results in the fusion of satellite images from other data sources remains to be demonstrated. In future work, we will test the performance of the model in the spatio-temporal fusion of different images. In addition, we will use learning methods to obtain more spectral information from multiple dimensions and thereby improve the prediction of heterogeneous mutations. For NH3-N remote sensing monitoring, more work is needed to prove the adaptability of the random forest model to different regions.