Water Quality Chl-a Inversion Based on Spatio-Temporal Fusion and Convolutional Neural Network

Abstract: The combination of remote sensing technology and traditional field sampling provides a convenient way to monitor inland water. However, limited by the resolution of remote sensing images and cloud contamination, current water quality inversion products do not offer both high temporal resolution and high spatial resolution. In this study, high-spatial-resolution fused images were generated from Landsat, Sentinel-2, and GaoFen-2 (GF-2) data using a spatio-temporal fusion (STF) method. A Chl-a inversion model was then designed based on a convolutional neural network (CNN) with the structure 4-(136-236-340)-1-1. Finally, the retrieved Chl-a concentrations were corrected using a pixel correction algorithm. The images generated by STF preserve the spectral characteristics of the low-resolution images, with R2 between 0.7 and 0.9. The Chl-a inversion results based on the spatio-temporally fused images and the CNN were verified against measured data (R2 = 0.803), and the results improved (R2 = 0.879) after further applying the pixel correction algorithm. The correlations (R2) between the Chl-a results of the GF2-like and Sentinel-2 images were all greater than 0.8. The spatial differences in Chl-a concentrations in Baiyangdian (BYD) lake gradually increased from July to August. Remote sensing water quality inversion based on STF and CNN can achieve both high temporal frequency and fine spatial resolution, providing a stronger scientific basis for the rapid diagnosis of eutrophication in inland lakes.


Introduction
Water quality monitoring is an important basis for water quality assessment and water pollution management. Water quality conditions directly affect public health and the economic productivity of nations [1][2][3]. Therefore, large-scale, accurate, and fast water quality monitoring is particularly important [4,5]. Although on-site measurements can characterize water quality parameters in detail, they are also time-consuming, labor-intensive, costly, and easily restricted by weather and hydrological conditions, making timely, large-scale monitoring difficult [6]. Remote sensing technology offers macroscopic coverage and spectral sensitivity well suited to large-scale water quality monitoring [7][8][9]. Various remote sensing data sources and continuously updated water quality inversion models make water quality inversion a convenient means of achieving near-real-time monitoring, which has facilitated the emergence of new applications [10][11][12].
Traditional water quality inversion models include empirical models, semi-empirical models, and physical models. Some water quality parameters can be inverted directly by remote sensing because of optically active substances in water bodies, such as chlorophyll a (Chl-a), colored dissolved organic matter (CDOM), and suspended solids (SS) [13,14]. Chl-a inversion in particular has been studied extensively in Baiyangdian (BYD) lake.

Study Area
BYD is the largest shallow lake in China's Haihe River Basin, formed in the depression where the alluvial fans of the Yongding River and Hutuo River meet in front of the Taihang Mountains. We collected 92 water samples from BYD lake across eight oval-shaped regions (Figure 1) to calculate changes in Chl-a concentrations over time. We also selected two smaller rectangular regions (detail areas I and II in Figure 1) to verify the STF results at fine scale, including the water quality correction algorithm and the changes in water quality over time.

Data and Processing
Field conditions in BYD lake showed that phytoplankton growth varied greatly in August, which aided construction of the inversion model; we therefore collected water samples between 21 and 23 August 2019. We located 92 water quality sampling points using real-time kinematic (RTK) methods and measured Chl-a concentrations with an EXO multiparameter sonde (a multi-parameter water quality monitor developed by YSI). The GF-2 satellite carries a 4-band multispectral sensor with 4 m spatial resolution and a repeat period of 69 days. This means that only two GF-2 images (3 July and 16 August) were available from July to September 2019. GF-2 image preprocessing includes radiometric calibration, FLAASH atmospheric correction, and RPC geometric correction. The Sentinel-2 satellite carries a multispectral camera covering 13 spectral bands with a spatial resolution of 10 m and a temporal resolution of 5 days. Sentinel-2 data are open-source and available from the ESA Copernicus Data Center (Sentinel-2: https://scihub.copernicus.eu/ (accessed on 16 December 2020)). The L2A-level Sentinel-2 product can be used after preprocessing, including orthorectification, geometric precision correction, radiometric correction, and atmospheric correction. Sentinel-2 data from 12 July, 27 July, 21 August, and 31 August were used as low-resolution data; Landsat-8 data from 1 August and Landsat-7 data from 25 August were used to supplement the low-resolution data; and STF was used to generate high-spatial-resolution data. The generated GF2-like image for 16 August was compared with the original GF-2 image of 16 August to evaluate the effects of the STF method. The dates and sources of the remote sensing imagery analyzed in this study are given in Table 1.

Table 1. Images for inversion between July and August ("√": available for download; "/": too much cloud coverage or missing image).

The flow of this paper is shown in Figure 2. In this study, GF2-like images of the sampling period were first generated using spatio-temporal fusion, and a Chl-a inversion model based on a convolutional neural network was established using the sampling points and images as input data. Then, the Sentinel-2 sequence was supplemented using the image-matching algorithm, and spatio-temporal fusion and water quality inversion were performed on the multi-period Sentinel-2 data to generate preliminary water quality results. Finally, the corrected GF2-like-based Chl-a concentration time series was generated using the pixel correction algorithm.

Spatial and Temporal Fusion Method
The production of GF2-like images (GF-2-like images generated from contemporaneous Sentinel-2 images via spatio-temporal fusion) based on the FSDAF spatio-temporal fusion method [33] is divided into five steps (Figure 3): (1) classify the GF-2 image at T1 (time 1); (2) classify the Sentinel-2 images at T1 and T2 (time 2), and estimate the temporal change between the corresponding classes of the two Sentinel-2 images; (3) predict the GF-2 classification results at T2 from these temporal changes, and calculate the residuals between the GF-2 and Sentinel-2 classification results at the same time; (4) predict the corresponding GF-2 image from the Sentinel-2 image at T2 using inverse distance weighted (IDW) interpolation; (5) assign the residuals to the GF-2 image predicted at T2 using the IDW function, weighting by the neighborhood information of each pixel to correct the prediction. Finally, the GF2-like image at T2 is generated by fusion.

The calculation process for step 5 is as follows [54]:

R_high2(x_ij, y_ij, b) = R_high1(x_ij, y_ij, b) + Σ_k ω_k × ΔR_high(x_ij, y_ij, b), with
ΔR_high(x_ij, y_ij, b) = ε_high(x_ij, y_ij, b) + ΔR_high(a, b)

where ΔR_high(x_ij, y_ij, b) indicates the change of the high-spatial-resolution pixel between T1 and T2, ω_k is the weight of the k-th similar pixel, ε_high(x_ij, y_ij, b) indicates the residual of the i-th low-spatial-resolution pixel assigned to the j-th high-spatial-resolution pixel, ΔR_high(a, b) represents the change of the low-spatial-resolution pixels from T1 to T2 for classification a and band b, R_high2(x_ij, y_ij, b) indicates the predicted GF2-like data at T2, and R_high1(x_ij, y_ij, b) indicates the high-resolution image input at T1.
Using a paired data set (GF-2 data from 3 July 2019 and Sentinel-2 data from 2 July 2019) as input, we generated four GF2-like images from four Sentinel-2 images acquired on 12 July, 16 August, 21 August, and 25 September. This calculation required about 204 h of computing time on a computer with an Intel Core i7-10700K CPU and 64 GB of memory.
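The residual-compensation step above can be illustrated with a minimal numpy sketch. The function name and the pre-computed input arrays (class-level temporal change, distributed residuals, and similar-pixel weights from steps 1-4) are hypothetical placeholders, and the similar-pixel sum is simplified to a per-pixel weight; this is a sketch of the idea, not FSDAF itself.

```python
import numpy as np

def fsdaf_step5(high_t1, class_change, residual, weights):
    """Sketch of FSDAF step 5 (residual compensation), assuming steps 1-4
    already produced, for each high-resolution pixel:
      high_t1      -- high-resolution (GF-2) image at T1
      class_change -- class-level temporal change DeltaR(a, b) mapped per pixel
      residual     -- distributed residual eps(x_ij, y_ij, b)
      weights      -- IDW-style weights of similar neighboring pixels
    Returns the predicted GF2-like image at T2."""
    # Total per-pixel change = class-level temporal change + distributed residual
    delta = class_change + residual
    # Weighted compensation added to the T1 high-resolution image
    return high_t1 + weights * delta

# Toy 2x2 example (values are illustrative only)
high_t1 = np.array([[0.10, 0.12], [0.11, 0.13]])
change  = np.full((2, 2), 0.02)
resid   = np.array([[0.001, -0.001], [0.000, 0.002]])
w       = np.ones((2, 2))
pred_t2 = fsdaf_step5(high_t1, change, resid, w)
```

With unit weights the prediction is simply the T1 image plus the class change and residual, which is the degenerate case of the weighted formula above.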

Image-Matching Algorithm
In order to apply more remote sensing data to the inversion model, we propose a simplified data conversion method based on the principles of spatio-temporal fusion algorithms [30][31][32], generating Sentinel-2-like images from Landsat images. The core idea of this method is to unify the DN value ranges of the two input images, calculate the residual of the known image pair, and then superimpose the residual on the image to be converted.

The data conversion algorithm uses Landsat at T1 (M), Landsat at T2 (N), and Sentinel-2 at T1 (X) to generate a Sentinel-2-like image at T2 (Y), and is divided into the following four steps:

1. All images are preprocessed and resampled to a common 10 m resolution, and the 5th and 95th percentiles of each band of the three input images are computed as m_5, m_95, n_5, n_95, x_5, and x_95, respectively. The DN range of the target image X is converted into that of the original image M:

X' = m_5 + (X − x_5) × (m_95 − m_5)/(x_95 − x_5)

2. The residual of X' and M is calculated and superimposed on N:

Y' = N + (X' − M)

3. To determine the DN range of the target image, the 5th and 95th percentiles of the intermediate image Y' are calculated and denoted y_5 and y_95, giving the conversion factor k:

k = (x_95 − x_5)/(y_95 − y_5)

4. Convert Y' to Y:

Y = x_5 + (Y' − y_5) × k
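The four steps can be sketched in a few lines of numpy. This is a sketch under the assumption that the DN-range conversion is a linear mapping between 5th/95th percentiles, consistent with the step descriptions; the function names are hypothetical, not from the paper.

```python
import numpy as np

def p5_p95(img):
    """5th and 95th percentiles of an image band."""
    return np.percentile(img, 5), np.percentile(img, 95)

def match_images(M, N, X):
    """Sketch of the image-matching (data conversion) algorithm for one band,
    assuming a linear 5th/95th-percentile DN mapping.
    M, N: Landsat at T1 and T2; X: Sentinel-2 at T1 (all resampled to 10 m).
    Returns a Sentinel-2-like image at T2."""
    m5, m95 = p5_p95(M)
    x5, x95 = p5_p95(X)
    # Step 1: rescale X's DN range into M's DN range
    Xp = m5 + (X - x5) * (m95 - m5) / (x95 - x5)
    # Step 2: superimpose the T1 residual (X' - M) on the T2 Landsat image
    Yp = N + (Xp - M)
    # Step 3: conversion factor k from the intermediate image's DN range
    y5, y95 = p5_p95(Yp)
    k = (x95 - x5) / (y95 - y5)
    # Step 4: map Y' back into Sentinel-2's DN range
    return x5 + (Yp - y5) * k

rng = np.random.default_rng(0)
M = rng.uniform(0, 100, (50, 50))   # Landsat T1 (synthetic)
N = M + 5.0                         # Landsat T2 (synthetic change)
X = rng.uniform(0, 200, (50, 50))   # Sentinel-2 T1 (synthetic)
Y = match_images(M, N, X)
```

By construction, the output's 5th/95th percentiles coincide with those of the Sentinel-2 input, i.e., the generated image lives in the Sentinel-2 DN range while carrying the Landsat T1-to-T2 change.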

Chl-a Inversion Model Based on CNN
We chose the 92 sampling points and their corresponding grid pixels from the GF2-like image of 21 August 2019 as the training input to design a Chl-a inversion model based on a CNN [55]. To reduce per-pixel uncertainty, 5 × 5 pixel matrix blocks were used, with the central pixel of each block located at a sampling point. Figure 5 shows the structure of the CNN. In this paper, the CNN structure is 4-(LY1-LY2-LY3)-1-1: four input bands, three hidden layers, one fully connected layer, and one regression layer, in that order.
To make the model training more adequate, 92 matrix blocks were extended using a rotation (0°, 90°, 180°, and 270° angles) and flipping of the four rotated blocks. Figure 5 illustrates the method of matrix expansion. The original matrix blocks and their derived expanded matrix blocks correspond to the same water body Chl-a concentrations. The final generated 736 sets of data were used together as a data set for the training of the CNN regression model.
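The block expansion above can be sketched with numpy's rotation and flip primitives; the function name is a hypothetical illustration of the scheme (four rotations plus a flip of each rotated block, i.e., eight variants per block).

```python
import numpy as np

def expand_block(block):
    """Sketch of the matrix-block expansion: rotations by 0/90/180/270 degrees
    of a 5x5 pixel block, plus a horizontal flip of each rotated block,
    yielding 8 training samples per original block. All variants share the
    measured Chl-a concentration of the original block."""
    variants = []
    for k in range(4):
        rot = np.rot90(block, k)         # rotate by k * 90 degrees
        variants.append(rot)
        variants.append(np.fliplr(rot))  # flip of the rotated block
    return variants

block = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 pixel block
samples = expand_block(block)
```

With 92 original blocks this yields 92 × 8 = 736 training samples, matching the data set size stated above.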

The number of convolutional kernels in the model increases layer by layer to extract deeper features. To determine the optimal number of convolutional layers, the number of convolutional kernels in each hidden layer, the regularization parameters, the number of iterations, and the learning rate, an iterative search was used to find the combination of parameters that maximizes model accuracy. For each training session, 80% of the data was randomly selected as training samples and the remaining 20% as verification samples to analyze inversion accuracy. In this paper, the R2, RMSE, and average difference (AD) values were calculated to evaluate accuracy when comparing results.
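The three evaluation metrics can be computed with a short numpy helper. This is a sketch: the paper does not spell out its exact AD formula, so AD is taken here as the mean of (predicted − measured), an assumption noted in the comment.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Sketch of the accuracy metrics used to validate the inversion model:
    coefficient of determination (R2), root-mean-square error (RMSE), and
    average difference (AD; assumed here to be mean(y_pred - y_true), since
    the exact definition is not given in the text)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)            # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
    ad = np.mean(y_pred - y_true)
    return r2, rmse, ad

# Toy check: measured vs. predicted Chl-a (ug/L), values illustrative only
measured  = np.array([10.0, 12.5, 15.0, 18.0, 20.0])
predicted = np.array([10.5, 12.0, 15.5, 17.5, 20.5])
r2, rmse, ad = evaluate(measured, predicted)
```

For a perfect prediction, R2 = 1 and RMSE = AD = 0; in the 80/20 split described above, these metrics would be computed on the 20% verification samples.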



Pixel Correction Algorithm
Due to fusion error in the spatio-temporal fusion images, the inversion result based on a fused (GF2-like) image may differ from the inversion result based on the corresponding Sentinel-2 image. Therefore, a water quality pixel correction algorithm based on pixel decomposition was proposed. Figure 6 illustrates the correction algorithm designed in this study. The pixel correction algorithm is divided into the following steps:

1. The high- and low-resolution Chl-a results of the same period are uniformly up-sampled to 2 m resolution, and each set of 5 × 5 resampled low-resolution pixels is homogenized to generate the corresponding mixed pixel Y:

Y = (1/25) × Σ_{i=1..5} Σ_{j=1..5} W_ij

where W_ij is the low-resolution image pixel after resampling.

2. The scale factor k is calculated for each group of pixels to be corrected:

k = Y / ((1/25) × Σ_{i=1..5} Σ_{j=1..5} X_ij)

where X_ij is the image pixel before correction and Y is the corresponding low-resolution (mixed) pixel.

3. Each corrected image pixel Z_ij of the group is calculated according to the scale factor:

Z_ij = k × X_ij

where Z_ij is the corrected image pixel.
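The correction of one 5 × 5 group can be sketched as follows. The scale-factor formula is a reconstruction consistent with the step descriptions (the ratio of the low-resolution pixel value to the mean of the uncorrected high-resolution pixels), so treat it as an assumption rather than the paper's exact expression.

```python
import numpy as np

def correct_block(X, Y):
    """Sketch of the pixel correction for one 5x5 group.
      X -- 5x5 array of uncorrected GF2-like Chl-a values (2 m pixels)
      Y -- homogenized (mean) value of the co-located low-resolution pixel
    Assumes k = Y / mean(X) (reconstructed, not quoted from the paper).
    Returns the corrected 5x5 block Z with mean(Z) == Y."""
    X = np.asarray(X, dtype=float)
    k = Y / X.mean()   # step 2: scale factor for the group
    Z = k * X          # step 3: corrected pixels
    return Z

# Toy group: high-resolution block vs. a low-resolution mixed-pixel value
X = 10.0 + np.arange(25, dtype=float).reshape(5, 5) * 0.1  # ug/L, illustrative
Y = 12.0                                                   # ug/L, illustrative
Z = correct_block(X, Y)
```

The design choice is that the correction rescales the high-resolution block so its mean matches the low-resolution result, preserving relative spatial detail while enforcing consistency with the Sentinel-2-based inversion.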

Spatio-Temporal Fusion Result
Figure 7 shows the spatio-temporal fusion (STF) result for 16 August 2019. The generated GF2-like image shows a spatial distribution similar to that of the Sentinel-2 image. In addition, the fused image has a higher spatial resolution with more detailed information, making the boundary between the lake and the land clearer. The downside is that the color of the fused image differs from that of the Sentinel-2 image.

Figure 8 shows the distribution of the 1000 sampled points from the GF-2 and GF2-like images for each spectral band. Comparing the spectral composition of the GF2-like image with the GF-2 image for 16 August 2019, the blue, green, red, and NIR bands were similar. In general, the DN values of the GF2-like image were lower than those of the GF-2 image, and the GF2-like image had more outliers, caused by fusion distortion. However, the distribution trends across the four bands were consistent with the GF-2 image, indicating that the fused data inherited the interrelationships among the original bands.
Figure 9 shows scatter plots of the DN (digital number) ranges of different bands for the GF2-like image and the GF-2 image of 16 August 2019. Benefiting from a larger DN range, the NIR band showed the best correlation, with an R2 of 0.8812, while the correlations in the visible bands were generally lower. The differences between the bands of the fused GF2-like image and the GF-2 image were kept roughly within absolute errors of about −0.02. Consistent with the box plots, the DN values of the GF2-like image are slightly lower than those of the GF-2 image.

Figure 10 shows the corresponding correlation level when the training input data are expanded (i.e., when the training data are rotated and flipped to increase their number; see Figure 5) or unexpanded. When the model uses expanded data, the average correlation level increases by about 0.1; the correlation is generally greater than 0.7 and reaches up to 0.803.

Chl-a Inversion Results
Figure 11b-d shows the detailed Chl-a concentrations in region II. The Chl-a inversion results based on the fused images have a higher spatial resolution, but their spatial details differ slightly from those of the low-resolution Chl-a inversion results. After pixel correction, the inversion results based on the fused image maintain the same spatial relationships as the low-resolution results: the correlation at the validation points improved from 0.803 to 0.879, the root-mean-square error decreased from 3.359 to 2.702, and the mean difference decreased from 0.513 to 0.328, indicating that the pixel correction was effective.
Figure 12 shows detailed information, including the STF image result and the retrieved Chl-a, for regions (I) and (II) on 21 August 2019. The resolution of the GF2-like image is clearly higher than that of the Sentinel-2 image. Comparing the Chl-a results from the Sentinel-2 and GF2-like images, both express the spatial distribution of Chl-a well, while the GF2-like results (Figure 12(a4,b4)) are richer in detail. The images in Figure 12(a4,b4) are smoother than the coarse pixel values in Figure 12(a3,b3), providing a more detailed spatial characterization of Chl-a concentrations. The darker the water surface color in Figure 12(a1,a2), the higher the Chl-a concentration in Figure 12(a3,a4).
Figure 13 shows the correlation curves for the Chl-a inversion results of the Sentinel-2 images and GF2-like images. All inversion results were highly correlated (R2 > 0.81), with good reliability. The differences between the STF images and the low-resolution images range from 0 to 6 µg/L. The sample points shift toward the upper right corner on 21 August, 25 August, and 31 August, indicating a significant increase in Chl-a during this period.

Figure 13. Correlation of Chl-a time series based on GF2-like images and Sentinel-2 images.
Figure 14 clearly represents the difference in spatial detail between the Chl-a inversion results of multi-phase Sentinel-2 and GF-2 images, and we have chosen region II for comparison. On this time series, we can see an overall increasing trend of Chl-a concentrations over time, with a large variation in phytoplankton growth during August, and an opposite increasing trend from 21 August to 25 August. The Chl-a concentrations in some periods deviated from the overall pattern, for example, the concentrations on 2 July and 21 August were higher than the normal trend. The overall spatial distribution of GF2-like images was in good agreement with that of Sentinel-2, while it was richer in spatial details. However, there are still local differences, although they are few.   Figure 14 clearly represents the difference in spatial detail between the Chl-a inversion results of multi-phase Sentinel-2 and GF-2 images, and we have chosen region II for comparison. On this time series, we can see an overall increasing trend of Chl-a concentrations over time, with a large variation in phytoplankton growth during August, and an opposite increasing trend from 21 August to 25 August. The Chl-a concentrations in some periods deviated from the overall pattern, for example, the concentrations on 2 July and 21 August were higher than the normal trend. The overall spatial distribution of GF2-like images was in good agreement with that of Sentinel-2, while it was richer in spatial details. However, there are still local differences, although they are few.  Figure 14 clearly represents the difference in spatial detail between the Chl-a inversion results of multi-phase Sentinel-2 and GF-2 images, and we have chosen region II for comparison. On this time series, we can see an overall increasing trend of Chl-a concentrations over time, with a large variation in phytoplankton growth during August, and an opposite increasing trend from 21 August to 25 August. 

Comparison of Multi-Period Results
Figure 15 shows the water quality results for BYD lake on 2 July, 12 July, 27 July, 1 August, 16 August, 21 August, 25 August, and 31 August 2019. Chl-a concentrations were retrieved using the trained neural network model and the pixel correction algorithm. From July to August, Chl-a concentrations first decreased and then increased, remaining below 20 µg/L in most areas; on 31 August, Chl-a concentrations reached 20 µg/L in most areas. Some areas showed small changes in chlorophyll concentration over time, while some adjacent areas showed opposite trends in chlorophyll concentration. The periods with large geographic differences in Chl-a concentrations were 2 July, 21 August, 25 August, and 31 August, with a relatively stable growth trend during July and faster changes during August. In addition, we found that the trend of the inversion results from the Sentinel-2-like data generated from Landsat is consistent with that of the real Sentinel-2 data, indicating that this data supplement is feasible.

Figure 16 shows the comparison of Chl-a concentrations in the eight areas of BYD lake (Figure 1). We plotted the mean and range of 1000 points from each area for eight consecutive dates. There was a slight decrease in Chl-a concentration in BYD lake during July, and the regional differences gradually decreased. During August, Chl-a concentrations increased to different degrees in all regions, and the range of values within each region became wider, indicating large variation in algal growth in August and local algal outbreaks in BYD lake. Overall, the Chl-a concentration in BYD lake was greater in August than in July: the highest values exceeded 30 µg/L throughout August, indicating a high degree of eutrophication, while the lowest concentrations occurred in late July, ranging from 10 to 15 µg/L.
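The per-area statistics behind Figure 16 (the mean and min-max range of 1000 randomly sampled pixels per region and date) can be sketched as follows; the raster and region mask below are hypothetical stand-ins for the BYD lake data:

```python
import numpy as np

def region_stats(chla, mask, n_points=1000, seed=0):
    """Mean and (min, max) range of Chl-a at randomly sampled pixels
    inside one region of interest, given a boolean region mask."""
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(mask)                     # pixels in the region
    n = min(n_points, rows.size)                      # cap at region size
    idx = rng.choice(rows.size, size=n, replace=False)
    values = chla[rows[idx], cols[idx]]
    return values.mean(), (values.min(), values.max())

# Hypothetical single-date example for one of the eight areas
rng = np.random.default_rng(1)
chla = 10 + 5 * rng.random((200, 200))    # µg/L raster
mask = np.zeros((200, 200), bool)
mask[50:150, 50:150] = True               # illustrative region polygon
mean, (lo, hi) = region_stats(chla, mask)
print(mean, lo, hi)
```

Repeating this over the eight dates and eight areas yields the mean-and-range series plotted in Figure 16.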

Sensitivity of Chl-a Inversion Model Based on CNN Structure
Many scholars have constructed water quality inversion models with CNNs and achieved good results; for example, Maier et al. used a 1D CNN for Chl-a simulation training [56], and Aptoula et al. used 2D and 3D CNNs for Chl-a simulation training, where R² reached 0.93 [57]. We used a 2D CNN in our experiments. Figure 17 shows the effect of the number of convolutional kernels in each layer on model performance; the accuracy curves for the first three iterations are plotted in the figure. We found that as the number of iterations increased, the accuracy of the first and second layers improved significantly, while that of the third layer improved less. Pu et al. concluded that the optimal number of convolutional layers for water quality grading inversion in convolutional neural networks is two to seven [55], which is consistent with our observation. We finally determined the CNN structure as 4-(136-236-340)-1-1, with the learning rate set to 0.001, the number of iterations set to 100, and the regularization parameter set to 0.001. Its correlation coefficient R² reached 0.803. This accuracy level is expected, because the wide spectral bandwidth of the GF-2 bands weakens the spectral features [58]. Training with Sentinel-2 images under the same CNN, the best correlation coefficient R² reached 0.887, which is similar to the accuracy level of a related study [59].
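The 4-(136-236-340)-1-1 structure can be read as 4 input bands, three convolutional layers with 136, 236, and 340 kernels, and a single-neuron regression output. A hedged PyTorch sketch follows; the kernel size, patch size, pooling step, and the mapping of the paper's regularization parameter to Adam weight decay are assumptions, not details given in the text:

```python
import torch
import torch.nn as nn

class ChlaCNN(nn.Module):
    """Sketch of a 4-(136-236-340)-1-1 regression CNN: 4 reflectance bands in,
    three conv layers with 136/236/340 kernels, one Chl-a value out."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 136, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(136, 236, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(236, 340, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # collapse the spatial patch
        )
        self.head = nn.Linear(340, 1)         # regression output: Chl-a (µg/L)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = ChlaCNN()
# Learning rate and L2 regularization taken from the paper's settings
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-3)
patch = torch.rand(8, 4, 5, 5)   # batch of hypothetical 5x5 reflectance patches
print(model(patch).shape)        # torch.Size([8, 1])
```

Training then minimizes a mean-squared error between predicted and measured Chl-a over the sample-point patches for the stated 100 iterations.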



Advantages
The CNN-based Chl-a inversion model can effectively exploit the rich spatial information of the fused images and obtain high spatial resolution Chl-a inversion results [60]. Figure 18 compares the correlation levels of BP (backpropagation) networks and the CNN; the accuracy of the CNN is higher than that of BP when there are enough samples [61]. The CNN takes full advantage of the high-resolution features of the fused images and reduces the unexpected errors caused by the small field of view of traditional inversion methods [38,62]. In pursuit of data reliability and accessibility, scholars have often used open-source data such as Sentinel-2 (10 m), the Landsat series (15 m), and MODIS (500 m) for remote sensing inversion [63][64][65]. Lin et al. found that the area of BYD lake in the flood season can reach 290 km² [66], making it difficult to achieve remote sensing inversion with a large range, a high frequency, and a high spatial resolution simultaneously. Yang et al. used STF for the inversion of surface temperature in Zhengzhou City [54], and we applied STF to the inversion of Chl-a with equally good results. A similar study was conducted by Liu et al., who used Sentinel-2, Landsat-8, and MODIS for spatio-temporal fusion and SPM (suspended particulate matter) inversion [67]. Compared with that study, we proposed image-matching and pixel correction algorithms to further complement the time series and improve the accuracy of the inversion. Related data show that Baiyangdian has abundant precipitation during the flood season from July to September [68], and some scholars [69] found that precipitation can cause short-term growth of Chl-a, which is consistent with our observation.
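The text does not spell out the pixel correction algorithm itself; one common consistency correction of this kind forces each coarse-pixel block of the fused image to reproduce the corresponding coarse observation's mean. The sketch below illustrates that idea under this assumption; it is not the paper's exact algorithm:

```python
import numpy as np

def mean_preserving_correction(fine, coarse, scale):
    """Shift each (scale x scale) block of the fused fine-resolution image so
    its block mean matches the coarse observation. This is one generic
    consistency correction, assumed here for illustration."""
    fine = fine.astype(float).copy()
    h, w = coarse.shape
    for i in range(h):
        for j in range(w):
            block = fine[i * scale:(i + 1) * scale, j * scale:(j + 1) * scale]
            block += coarse[i, j] - block.mean()   # remove additive block bias
    return fine
```

Corrections of this family keep the fine image's spatial detail (within-block variation) while tying its radiometry back to the trusted low-resolution signal.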


Limitations
The combination of STF and CNN has been applied successfully to Chl-a inversion. This method can also be applied to any area with sufficient water quality sampling points, but it still has some limitations. For example, our study has only 92 sample points; in the study of Aptoula et al., the number of sample points reached 320, and their R² was higher than ours [57]. Although we increased the number of sample points via data expansion, more measured sample points could lead to higher accuracy [70]. Limited by current sensor development, few satellites offer both high spatial resolution and good spectral characteristics [71]. Therefore, to obtain better accuracy for water quality inversion, fusion experiments with hyperspectral satellites and UAVs can be further investigated [58,72]. Since many scholars have improved the fusion algorithms, further research can be conducted using better STF methods, such as EFSDAF [73].

Conclusions
In this study, we developed and verified an inversion model of Chl-a concentration based on a CNN with fused high spatial and temporal resolution images, and analyzed the variation of Chl-a concentrations in Baiyangdian from July 2019 to September 2019. We found that the CNN can reflect the spatial information of spatio-temporal fused images more effectively. To supplement the time series, a data conversion method was proposed to convert Landsat images into Sentinel-2-like images. To overcome distortion in the fusion results, a pixel correction method based on image element decomposition was proposed to correct the inversion results of the GF2-like images. The measured and retrieved Chl-a had a high correlation (R² = 0.803), which increased to R² = 0.879 after pixel correction. The Chl-a concentrations in BYD lake decreased and then increased from July to August, reaching their maximum on 31 August. This study shows that the proposed inversion model can reflect the spatial and temporal distribution of water quality parameters more accurately, and the method can be applied to any area with sufficient water quality sampling point data.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Some data, models, or code that support the findings of this study are available from the corresponding author upon reasonable request.