Feasibility of the Spatiotemporal Fusion Model in Monitoring Ebinur Lake’s Suspended Particulate Matter under the Missing-Data Scenario

High-frequency monitoring of suspended particulate matter (SPM) concentration can improve water resource management. Missing high-resolution satellite images could hamper remote-sensing SPM monitoring. This study resolved the problem by applying spatiotemporal fusion technology to obtain high spatial resolution and dense time-series data to fill image-data gaps. Three data sources (MODIS, Landsat 8, and Sentinel 2) and two spatiotemporal fusion methods (the enhanced spatial and temporal adaptive reflectance fusion model (ESTARFM) and the flexible spatiotemporal data fusion (FSDAF)) were used to reconstruct missing satellite images. We compared their fusion accuracy and verified the consistency of fusion images between data sources. For the fusion images, we used random forest (RF) and XGBoost as inversion methods and set “fusion first” and “inversion first” strategies to test the method’s feasibility in Ebinur Lake, Xinjiang, arid northwestern China. Our results showed that (1) the blue, green, red, and NIR bands of the ESTARFM fusion images were better than those of FSDAF, with a good consistency (R² ≥ 0.54) between the fused Landsat 8 and Sentinel 2 images and their original images; (2) for both the original and fusion images, RF gave a better inversion effect than XGBoost. The inversion accuracies based on Landsat 8 and Sentinel 2 were R² = 0.67 and 0.73, respectively. The SPM distribution maps of the two data sources attained a good consistency of R² = 0.51; (3) in retrieving SPM from fused images, the “fusion first” strategy had better accuracy. The optimal combinations were ESTARFM (Landsat 8)_RF and ESTARFM (Sentinel 2)_RF, consistent with the original SPM maps (R² = 0.38 and 0.41, respectively). Overall, the spatiotemporal fusion model provided effective SPM monitoring under the image-absence scenario, with good consistency in the inversion of SPM.
The findings provided the research basis for long-term and high-frequency remote-sensing SPM monitoring and high-precision smart water resource management.

Contributions: Conceptualization, methodology, software, P.D.; validation, analysis, investigation, C.L. and P.D.; resources, F.Z.; data curation, C.L. P.D.; writing—original writing—review and C.L.,


Introduction
Water is essential to the sustainable development of human society [1]. However, human activities and climate change have threatened global water quality [2], and research on the continuous monitoring of water quality has attracted considerable attention [3,4]. Suspended particulate matter (SPM) is one of the key attributes determining water clarity because it controls light penetration through the water column [5]. It plays an important role in regulating water productivity and aquatic ecosystem functions [6]. In addition, SPM serves as a key carrier of carbon, oxygen, nutrients, and heavy metals [7,8], and as a main raw material that continuously changes the in situ environment and exerts pressure on the ecosystem [9,10]. Therefore, the dynamic monitoring of SPM constitutes a critical component of intelligent water resource management [11].
Conventionally, SPM has been surveyed in situ [12,13]. However, manual or automatic sampling in a field-monitoring network of point measurements is costly, laborious, and time consuming. Moreover, the traditional sampling scheme cannot yield sufficient and reliable data to analyze dynamic water-quality parameters such as SPM, with a wide range of spatiotemporal variabilities [14]. Remote sensing offers an alternative that can capture sequential data, which have been rigorously pre-processed. It has the advantages of a wide temporal range of monitoring, making it possible to analyze the spatial and temporal variabilities of SPM concentrations in the surface layer of water bodies [15,16]. Many studies have applied optical remote sensing to estimate various water quality parameters such as SPM, turbidity, total nitrogen/total phosphorus ratio (TN/TP) [17][18][19][20]. However, only a few studies have explored the applicability of the spatiotemporal fusion algorithm in high-frequency SPM monitoring [21]. As a result, the retrieval of SPM concentration data using the spatiotemporal fusion of multi-source remote-sensing data is extremely rare [22].
The recent rapid development of remote-sensing technology has played an increasingly important role in environmental-ecological studies concerning dynamic ecosystem changes [23,24]. Due to technological limitations, most satellite sensors cannot acquire a sufficient number of images with both high spatial and high temporal resolution in a given period [25]. This data inadequacy has notably restricted the application of remote-sensing data in various fields. Spatiotemporal fusion technology addresses the problem by producing continuous time-series data at high spatial resolution [26]. Besides fusion, scholars have tried to improve the spatiotemporal resolution of remote-sensing images from other perspectives, such as improving image quality [27,28], integrating multi-source satellite data [22], and developing CubeSat constellations [29]. Improving image quality only helps when the original image quality is poor, so the resolution gain from this approach is limited. Multi-source data integration is constrained by differences in sensor parameters; thus, the spatiotemporal resolution improvement depends solely on the characteristics of the combined sensors. Although CubeSat constellations fill the gap in high spatiotemporal resolution, their global land-surface imaging coverage, which involves about 130 orbiting satellites, became available only recently [30], so they cannot provide long historical image time series [31]. Therefore, spatiotemporal fusion technology can fill the gaps in long-term historical image records [32].
Based on the principles of data fusion, Guo et al. [24] divided the spatiotemporal fusion models into five categories: learning-based, weight-function-based, Bayesian-based, hybrid, and unmixing-based methods. Their working principles can be found in the literature [33]. Although many spatiotemporal fusion models have been established to improve the spatiotemporal resolution of remote-sensing images, they differ notably in the data source, calculation efficiency, prediction accuracy, and data input requirement [34]; hence, it is difficult to determine their relative advantages. It is necessary to evaluate model applicability to specific study areas and data sources. Here, we chose ESTARFM and FSDAF methods to reconstruct missing images. ESTARFM is an enhancement model based on STARFM, which fully considers the spatial heterogeneity of pixels for obtaining better fusion results [35]. Meanwhile, FSDAF can obtain robust fusion results by combining un-mixing, spatial interpolation, and similar neighboring pixel smoothing techniques [36]. Therefore, taking MODIS, Landsat 8, and Sentinel 2 as data sources, this study explored the spatiotemporal fusion effect and consistency of the ESTARFM and FSDAF models and then analyzed the feasibility of retrieving SPM under the missing-image scenario.
The establishment of the inversion model is the key process to realize SPM monitoring in waters. At present, remote-sensing SPM inversion models can be summarized into three types: empirical method, analytical method, and semi-empirical/semi-analytical method [37]. The empirical method obtains the water quality information by establishing the statistical relationship between the measured and remote-sensing data. Due to the lack of a physical basis, this method's inversion accuracy and universality are judged to be unsatisfactory [38,39]. The analytical method uses the bio-optical and radiative transfer models to simulate the light transmission process in the atmosphere and water body and then establishes the relationship between water quality parameters and the water-leaving reflectance spectrum [40,41]. This method has a high inversion accuracy and generality in theory, but its observation equipment is complex and expensive, thus limiting its practical application [42]. The semi-empirical/semi-analytical method combines the known spectral characteristics of water quality parameters with a statistical model. It selects the best band or band combination as the independent variable to estimate the water quality parameters [43,44]. This method only needs a small amount of spectral and measured data to establish the model. RF and XGBoost are classic representatives of semiempirical models with robust performance in various fields of application research [45][46][47]. Therefore, RF and XGBoost were selected to monitor SPM in this study. It is worth noting that this study uses the fused image as the data source to monitor the SPM concentration in Ebinur Lake, and similar studies were rarely reported.
Our study area was Ebinur Lake, which is the largest saltwater lake in Xinjiang. It is located in arid northwestern China, with important urban agglomeration on the northern slope of the Tianshan Mountain [48]. In recent decades, global climate change and the intensification of regional anthropogenic activities have profoundly changed the structure, path, and driving forces of watershed environmental evolution [49]. Due to frequent wind impacts, shallow water, and high salinity, the water body would often remain in a state of high turbidity accompanied by fast changes in the lake area. Frequent lake area changes would retain the salt in the dried-up lake basin and quickly form salinized soil. There is a positive correlation between the dry bottom area of the Ebinur Lake and the total dust days in the watershed [50,51]. Salt dust has seriously affected the ecological environment and human activities in the region's urban agglomerations with extended influence to the whole of northern China [52]. Therefore, it is necessary to carry out high-frequency remote-sensing monitoring of SPM in Ebinur Lake to provide a theoretical basis to improve water quality management and salt dust control.
Based on the spatiotemporal fusion model, this study aimed to explore the feasibility of monitoring SPM in Ebinur Lake under the missing-image situation using two retrieval strategies ("fusion first" and "inversion first"). We explored four research questions: (1) For the shallow saltwater Ebinur Lake in arid northwestern China, what is the fusion accuracy of ESTARFM and FSDAF? (2) What is the consistency of the fused images between MODIS-Landsat 8 (ML8) and MODIS-Sentinel 2 (MS2)? (3) What is the accuracy of Landsat 8 and Sentinel 2 data in monitoring SPM in Ebinur Lake? (4) With gaps in satellite images, which strategy is more suitable for the effective monitoring of SPM in Ebinur Lake? Are they consistent with the SPM images retrieved from the original Landsat 8 and Sentinel 2 images?

Study Area
Ebinur Lake is located in northwest Xinjiang in arid northwestern China (Figure 1). The site has few water quality monitoring stations and poor data continuity, leading to data scarcity [53]. Mountains surround it to the west, south, and north. Only the Bortala and Jing rivers currently feed into it, and a large area of agricultural land spreads in the southwest. The average lake depth is merely 1.4-1.6 m, with a water density of about 1.079 g/cm³, pH 8.49, and mineralization 112.4 g/L. The high salt content of the Ebinur Lake water does not permit vegetation growth in the water area, leading to a very low chlorophyll content (when the volume of experimental water is 100 mL, its content is lower than the detection limit of 0.04 mg/L (HJ 897-2017)). The lake area fluctuates greatly between the wet and dry seasons, and the turbidity stays at a high level (the average SPM concentration in 28 days is 874.88 mg/L), with traits of a typical shallow saltwater lake in Central Asia. The Ebinur Lake region has low rainfall and strong evapotranspiration. The northwestern part of the lake basin is the well-known strong wind area "Alashankou" in China, with about 164 windy days (greater than 17.0 m/s) in a year [54][55][56]. The wind disturbs the sediments at the lake's bottom, bringing the sediments to the exposed lake basin with the waves [54]. This is one of the important sources of salt dust. The frequent salt-dust weather has negatively impacted local power supply and distribution facilities and crop growth [57,58]. Previous research experience has shown that Ebinur Lake is often in a state of strong dynamic change [54]. Therefore, the high-frequency monitoring of its SPM is very necessary.

Data Source and Pre-Processing
Five fixed points were set up in Ebinur Lake to monitor variations of SPM concentration (Figure 1c). The in situ data were collected in May, June, September, and October from 2011 to 2020, with a total of 102 samples. May and September were monitored every year. The sampling adhered to the Technical Specifications Requirements for Monitoring of Surface Water and Waste Water (HJ/T 91-2002). The SPM concentration was obtained by the gravimetric method (GB 11901-89). After screening, 35 remote-sensing images were used, including Landsat 7, Landsat 8, MODIS, and Sentinel 2. The absolute time differences between the sampling dates and the satellite images were within five days, with 77.14% within two days (Figure 2). In remote-sensing applications, this time difference is acceptable [59][60][61]. MODIS data (MOD09GA: surface reflectance product with a spatial resolution of 500 m) were downloaded from https://ladsweb.modaps.eosdis.nasa.gov/ (accessed on 2 July 2021), while Sentinel 2 and Landsat data were acquired from the US Geological Survey website at http://lglovis.usgs.goc/ (accessed on 2 July 2021). According to the principle that the field sampling time should be synchronous with the transit time of satellites, 11 Landsat images (4 Landsat 7 and 7 Landsat 8), 8 Sentinel 2 images, and 18 MODIS images were selected. The corresponding numbers of sampling points for Landsat, Sentinel 2, and MODIS were 49, 42, and 37, respectively. The Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) atmospheric correction module in ENVI (Version 5.3, ITT Visual Information Solutions, Boulder, CO, USA) was used for radiometric calibration and atmospheric correction of the Landsat series images. SNAP (http://step.esa.int (accessed on 12 July 2020)) was used for atmospheric correction of Sentinel 2, and MRT (https://lpdaac.usgs.gov/ (accessed on 10 June 2021)) was used to reproject MODIS data to the UTM-WGS84 coordinate system.
Both Sentinel 2 and MODIS data were resampled to 30 m and accurately georeferenced with Landsat 8. Finally, the normalized difference water index (NDWI) was used to mask the water body to obtain the reflectivity image of Ebinur Lake.
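The water-masking step can be illustrated with a minimal sketch. It assumes the McFeeters form of NDWI, (Green − NIR)/(Green + NIR), and a threshold of 0; the text does not state which NDWI variant or threshold was used, so both are assumptions, and the function names and reflectance values are hypothetical.

```python
def ndwi(green, nir):
    """McFeeters NDWI for one pixel; returns 0.0 when both bands are zero."""
    total = green + nir
    return (green - nir) / total if total != 0 else 0.0


def water_mask(green_band, nir_band, threshold=0.0):
    """Return a 2-D list of booleans: True where NDWI exceeds the threshold."""
    return [
        [ndwi(g, n) > threshold for g, n in zip(g_row, n_row)]
        for g_row, n_row in zip(green_band, nir_band)
    ]


# Tiny hypothetical reflectance grids (rows x columns), values in [0, 1].
green = [[0.12, 0.10], [0.08, 0.30]]
nir = [[0.03, 0.25], [0.02, 0.45]]

mask = water_mask(green, nir)
print(mask)
```

Pixels where the mask is True would be kept as lake water; the threshold usually needs tuning per scene, especially for turbid, saline water.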

Sensitive Band Selection
In previous studies, whether adopting a single band or band combination, the bands related to the retrieval of SPM in water bodies were mainly blue, green, red, and NIR [21,38]. In this study, the coordinates of the sampled points were spatialized and overlaid with Landsat 7/8, MODIS, and Sentinel-2 images. ArcGIS 10.3 was used to extract the reflectance values of the sampling points on each band of the three sets of satellite data. Then, Pearson correlation analysis was performed between reflectance and SPM concentration in each band using SPSS 20 (IBM Corporation, Chicago, USA). Finally, the bands sensitive to SPM were selected according to the significance level. After analysis, the blue, green, red, and NIR bands were selected as the sensitive bands for SPM inversion in this study (Figure 3).
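The band-screening step can be sketched as follows. This is only an illustration: the study used SPSS and selected bands by significance level, whereas this sketch computes the Pearson coefficient directly and ranks bands by its absolute value. The function names and all sample values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def rank_bands(band_reflectance, spm):
    """Return (band, r) pairs sorted by decreasing |r| against SPM."""
    scores = {band: pearson_r(values, spm) for band, values in band_reflectance.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical reflectance at five sampling points and matching SPM (mg/L).
spm = [120, 340, 560, 810, 990]
bands = {
    "red": [0.10, 0.14, 0.19, 0.23, 0.27],
    "nir": [0.05, 0.09, 0.08, 0.14, 0.13],
}

ranked = rank_bands(bands, spm)
print(ranked)
```

A significance test would be added on top of this in practice, for example with `scipy.stats.pearsonr`, which also returns a p-value.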

Spatiotemporal Fusion Model
One of the necessary conditions for the spatiotemporal fusion of images is that the wavelength bands of the satellite data overlap: the central wavelength of a band of one sensor should lie within a certain wavelength range of the corresponding band of another sensor [34]. Figure 4 shows the band correspondence of the data sources used in this study. This condition was satisfied, with a large overlap between all four data source bands.

This paper investigated the feasibility of applying spatiotemporal fusion models to lake SPM monitoring beset by image-data gaps. Two inversion strategies for SPM were set to tackle the problem, namely "fusion first" and "inversion first". The "fusion first" strategy used the spatiotemporal fusion model to obtain a substitute remote-sensing image at the unknown time and then inverted the SPM from that image. The "inversion first" strategy first retrieved the SPM from the remote-sensing images at the known times and then used the spatiotemporal fusion model to fuse these SPM maps into the SPM map at the unknown time.

The spatiotemporal fusion models used in this paper were ESTARFM and FSDAF. The ESTARFM fusion method [35,62] was developed based on the basic framework of STARFM. It uses the known fine and coarse image pairs at two times as the data source to calculate the weights and conversion coefficients of similar pixels. It then takes the coarse image at the predicted time (T) as the input to calculate the fine image at time T. The final prediction of ESTARFM is shown in Formula (1).
where $L_b$ and $M_b$ represent the reflectance of the fine and coarse images in band $b$, $w$ represents the moving window size, $(x_{w/2}, y_{w/2})$ represents the position of the central pixel to be predicted, $W_i$ is the weight of the $i$th pixel similar to the predicted pixel, $v_i$ is the conversion coefficient between the predicted pixel and its similar pixels, $n$ represents the number of pixels similar to the predicted pixel, and $(x_i, y_i)$ is the position of the $i$th pixel similar to the predicted pixel.
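Using the symbols just defined, the ESTARFM prediction step can be written out as below. This is a sketch of the single-base-date form described for ESTARFM in [35], offered as a plausible reading of Formula (1) rather than the authors' exact typeset equation; $T_0$, denoting one of the two known base dates, is a symbol introduced here for illustration.

```latex
L_b\left(x_{w/2},\, y_{w/2},\, T\right)
  = L_b\left(x_{w/2},\, y_{w/2},\, T_0\right)
  + \sum_{i=1}^{n} W_i \, v_i
    \left[ M_b\left(x_i, y_i, T\right) - M_b\left(x_i, y_i, T_0\right) \right]
```

In the method as published, the predictions obtained from each of the two base dates are then combined with temporal weights to give the final fine image.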
The FSDAF fusion method [36] takes the fine and coarse image pairs at a known time as the data source to calculate the residual, weight function, etc. Finally, the fine image at the predicted time is obtained by taking the coarse image at the predicted time as the input. The final prediction of FSDAF is shown in Formula (2).
where $n_s$ is the number of similar pixels in the moving window, and $w_i$ is the weight of the $i$th similar pixel. $\Delta F_t$ stands for the temporal change of a fine pixel, $n$ stands for the number of fine pixels inside a coarse pixel, $R(x_i, y_i)$ stands for the residual in the coarse pixel at location $(x_i, y_i)$, and $w_h$ is a weighting function.
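With these symbols, the final FSDAF prediction step can be sketched as below. This follows the general form given for FSDAF in [36] and is an assumption about what Formula (2) contained; $F_{T_0}$ (the known fine image) and $\hat{F}_{T}$ (the predicted fine image) are symbols introduced here for illustration.

```latex
\hat{F}_{T}(x, y, b)
  = F_{T_0}(x, y, b)
  + \sum_{i=1}^{n_s} w_i \, \Delta F_t\left(x_i, y_i, b\right)
```

In FSDAF as described in [36], the temporal change $\Delta F_t$ of each fine pixel itself combines a class-level change estimated by unmixing with a share of the coarse-pixel residual $R$, distributed through the weighting function $w_h$ and scaled by $n$.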
In this study, the operation of the spatiotemporal fusion model was realized by IDL language. The original Landsat 8 and Sentinel 2 images on 29 August 2017 were used to test the accuracy of the spatiotemporal fusion model. The data source dates of the two spatiotemporal fusion models are listed in Table 1.
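The difference between the two strategies described in this section is the order in which fusion and inversion are composed. The toy sketch below makes that concrete for a single pixel. Both `toy_fuse` (a simple temporal-difference rule) and `toy_invert` (an exponential reflectance-to-SPM mapping) are hypothetical stand-ins, not ESTARFM, FSDAF, RF, or XGBoost, and the input values are invented.

```python
import math


def toy_fuse(fine_t0, coarse_t0, coarse_tp):
    """Stand-in fusion: add the coarse temporal change to the known fine value."""
    return fine_t0 + (coarse_tp - coarse_t0)


def toy_invert(reflectance):
    """Stand-in inversion: hypothetical exponential reflectance-to-SPM mapping (mg/L)."""
    return 50.0 * math.exp(8.0 * reflectance)


def fusion_first(fine_t0, coarse_t0, coarse_tp):
    """Fuse reflectance to the target date, then invert the fused reflectance."""
    return toy_invert(toy_fuse(fine_t0, coarse_t0, coarse_tp))


def inversion_first(fine_t0, coarse_t0, coarse_tp):
    """Invert every input to SPM first, then fuse the SPM values."""
    return toy_fuse(toy_invert(fine_t0), toy_invert(coarse_t0), toy_invert(coarse_tp))


# Hypothetical red-band reflectance: fine and coarse at the known date,
# coarse at the target date.
spm_fusion_first = fusion_first(0.20, 0.18, 0.25)
spm_inversion_first = inversion_first(0.20, 0.18, 0.25)
print(round(spm_fusion_first, 1), round(spm_inversion_first, 1))
```

Because the inversion is nonlinear, the two orders generally give different SPM values, which is why the strategies have to be compared empirically.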

Inversion Model for SPM
SPM is one of the most important water parameters and profoundly affects the habitat quality of water bodies [20]. Extensive research on its retrieval has produced fruitful results [20,38,63]. To obtain the SPM concentration in Ebinur Lake efficiently, two types of ensemble algorithm (bagging and boosting) were adopted in this study to implement the inversion. We chose one representative model of each type, random forest (RF) and XGBoost; their theoretical basis is explained in Liaw and Wiener [64] and Lu et al. [65].

RF is an ensemble learning algorithm introduced by Breiman [66] to improve the traditional decision tree method. It is a bagging algorithm [67] for classification or regression problems [68]. The main calculation process includes (1) random selection of samples via bootstrap; (2) random selection of features; (3) construction of decision trees; and (4) prediction (voting for classification or averaging for regression).

XGBoost is a boosting algorithm [69], originally introduced by Chen and Guestrin [70]. Compared with the gradient boosting decision tree (GBDT), XGBoost reduces overfitting by integrating regularization terms into the modeling process. It supports three forms of gradient boosting: regularized, stochastic, and standard gradient boosting. Therefore, it can significantly improve performance on classification or regression problems [71]. The inversion was conducted based on a single sensitive band to explore the potential of spatiotemporal fused images and two inversion models in monitoring SPM.
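The four RF steps listed above (bootstrap sampling, random feature selection, tree construction, averaging) can be illustrated with a deliberately small sketch. It bags one-split regression trees (stumps), each restricted to one randomly chosen band, which is a toy stand-in for a random forest and not the implementation used in the study. All names and data are hypothetical.

```python
import random
from statistics import mean


def fit_stump(rows, targets, feature):
    """Fit a one-split regression tree on one feature by minimising squared error."""
    best = None
    values = sorted({row[feature] for row in rows})
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [t for r, t in zip(rows, targets) if r[feature] <= threshold]
        right = [t for r, t in zip(rows, targets) if r[feature] > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = sum((t - left_mean) ** 2 for t in left) + sum((t - right_mean) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # feature is constant in this sample: predict the mean
        overall = mean(targets)
        return feature, float("inf"), overall, overall
    return feature, best[1], best[2], best[3]


def predict_stump(stump, row):
    feature, threshold, left_mean, right_mean = stump
    return left_mean if row[feature] <= threshold else right_mean


def fit_forest(rows, targets, n_trees=50, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # (1) bootstrap sample
        feature = rng.randrange(len(rows[0]))           # (2) random feature
        forest.append(                                  # (3) build one tree
            fit_stump([rows[i] for i in idx], [targets[i] for i in idx], feature)
        )
    return forest


def predict_forest(forest, row):
    return mean(predict_stump(stump, row) for stump in forest)  # (4) average


# Hypothetical samples: [blue, green, red, nir] reflectance and SPM (mg/L).
rows = [[0.05 + 0.01 * i, 0.08 + 0.01 * i, 0.10 + 0.02 * i, 0.04 + 0.015 * i] for i in range(8)]
targets = [100.0 + 100.0 * i for i in range(8)]

forest = fit_forest(rows, targets)
low = predict_forest(forest, rows[0])
high = predict_forest(forest, rows[-1])
print(round(low, 1), round(high, 1))
```

A real random forest grows deep trees and samples a feature subset at every split; boosting methods such as XGBoost differ in that trees are fitted sequentially to the errors of the previous ones rather than independently.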

Performance Evaluation
To quantify the differences between the spatiotemporal fused and the real images, we used Python 3.0 to calculate the following evaluation indexes. The coefficient of determination (R²), the normalized root mean square error (NRMSE), the peak signal-to-noise ratio (PSNR), and the structural similarity index (SSIM) served as the evaluation indexes of image quality. Assume that the real image I has m × n pixels and the spatiotemporal fused image is F, where I(i,j) is the real image pixel value and F(i,j) is the spatiotemporal fused image pixel value.

The R² was used to evaluate the consistency of pixel values between the spatiotemporal fused and real images; it is calculated by Equation (3).

The NRMSE is the RMSE value normalized to lie between 0 and 1 and was used to evaluate the accuracy of the different spatiotemporal fusion methods. The RMSE is calculated as

$$\mathrm{RMSE} = \sqrt{\frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\left[I(i,j) - F(i,j)\right]^{2}} \tag{4}$$

The PSNR evaluates the amount of information in the fused image [72]. It has a general benchmark of 30 dB, below which the image degradation is more pronounced. PSNR is defined through the mean square error (MSE). For two m × n monochrome images I and F, the MSE is defined by Equation (5) and the corresponding PSNR by Equation (6), where MAX represents the maximum pixel value of the image:

$$\mathrm{MSE} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\left[I(i,j) - F(i,j)\right]^{2} \tag{5}$$

$$\mathrm{PSNR} = 10\,\log_{10}\!\left(\frac{\mathrm{MAX}^{2}}{\mathrm{MSE}}\right) \tag{6}$$
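A minimal sketch of these image-level indexes follows. Several choices are assumptions, since the text does not specify them: R² is taken as the squared Pearson correlation of pixel values, NRMSE is normalized by the value range of the real image, and the PSNR peak defaults to the maximum of the real image. The function names and the 2 × 2 images are hypothetical.

```python
import math


def flatten(image):
    """Turn a 2-D list of pixel values into a flat list."""
    return [value for row in image for value in row]


def mse(real, fused):
    a, b = flatten(real), flatten(fused)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def rmse(real, fused):
    return math.sqrt(mse(real, fused))


def nrmse(real, fused):
    """RMSE divided by the value range of the real image (assumed normalisation)."""
    values = flatten(real)
    return rmse(real, fused) / (max(values) - min(values))


def psnr(real, fused, max_value=None):
    """PSNR in dB; the peak defaults to the largest pixel value of the real image."""
    peak = max(flatten(real)) if max_value is None else max_value
    error = mse(real, fused)
    return math.inf if error == 0 else 10 * math.log10(peak ** 2 / error)


def r_squared(real, fused):
    """Squared Pearson correlation of pixel values (undefined for constant images)."""
    a, b = flatten(real), flatten(fused)
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov * cov / (var_a * var_b)


# Hypothetical 2 x 2 reflectance images.
real = [[0.1, 0.2], [0.3, 0.4]]
fused = [[0.1, 0.2], [0.3, 0.5]]

print(rmse(real, fused), nrmse(real, fused), psnr(real, fused), r_squared(real, fused))
```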
The structural similarity index (SSIM) is also an indicator of image quality and assesses the degree of structural similarity between the fused and real images [73]. Its value ranges from 0 to 1, and a larger value indicates smaller image distortion. It is calculated as

$$\mathrm{SSIM}(I, F) = \frac{\left(2u_I u_F + C_1\right)\left(2\sigma_{I\cdot F} + C_2\right)}{\left(u_I^{2} + u_F^{2} + C_1\right)\left(\sigma_I^{2} + \sigma_F^{2} + C_2\right)} \tag{7}$$

where $u_I$ and $u_F$ represent the mean values of the original image and the fused image, respectively; $\sigma_I$ and $\sigma_F$ represent their standard deviations; and $\sigma_{I\cdot F}$ represents the covariance between the two images. $C_1$ and $C_2$ are two constants close to 0 that keep the result stable. An SSIM value closer to 1 means that the two images have a more similar structure.
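The SSIM definition can be sketched with whole-image statistics, matching the means, standard deviations, and covariance named above. The constants default to the commonly used values (0.01·L)² and (0.03·L)² with a dynamic range L = 1, which is an assumption; the function name and test images are hypothetical.

```python
def flatten(image):
    """Turn a 2-D list of pixel values into a flat list."""
    return [value for row in image for value in row]


def ssim_global(real, fused, c1=1e-4, c2=9e-4):
    """Single-window SSIM computed from whole-image mean, variance and covariance."""
    a, b = flatten(real), flatten(fused)
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    numerator = (2 * mean_a * mean_b + c1) * (2 * cov + c2)
    denominator = (mean_a ** 2 + mean_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


# Hypothetical 2 x 2 reflectance images.
real = [[0.1, 0.2], [0.3, 0.4]]
fused = [[0.1, 0.2], [0.3, 0.5]]

same = ssim_global(real, real)
different = ssim_global(real, fused)
print(same, different)
```

Library implementations such as `skimage.metrics.structural_similarity` compute SSIM in local sliding windows and average the result, so their values will generally differ from this single-window version.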
To evaluate the performance of the SPM inversion model, we employed the Python statistical function library scipy.stats to conduct statistical analysis. Four indicators, the coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), and residual prediction deviation (RPD), were used to evaluate the accuracy and stability of the model [38]. R² and RMSE are calculated by Equations (3) and (4), and MAE and RPD by Equations (8) and (9) as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{8}$$

$$\mathrm{RPD} = \frac{\mathrm{SD}}{\mathrm{RMSE}} \tag{9}$$

where $y_i$ represents the observed value, $\hat{y}_i$ is the corresponding predicted value, $\bar{y}$ is the mean value, $n$ is the number of validation samples, SD is the standard deviation of the validation sample, and RMSE is the root mean square error. RPD is a dimensionless ratio: RPD > 2 indicates that the model has an excellent predictive ability; in the range 1.4-2, the model can offer rough quantitative predictions; and RPD < 1.4 means that the model is not predictive.
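The two validation indicators and the RPD interpretation bands can be sketched as follows. The sample (n − 1) standard deviation is assumed for SD, since the text does not say which form was used; the function names and the observed and predicted values are hypothetical.

```python
import math
from statistics import stdev


def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def rpd(observed, predicted):
    """Residual prediction deviation: SD of the observations divided by RMSE."""
    return stdev(observed) / rmse(observed, predicted)


def rate_rpd(value):
    """Map an RPD value to the three interpretation bands given in the text."""
    if value > 2:
        return "excellent"
    if value >= 1.4:
        return "rough"
    return "not predictive"


# Hypothetical validation SPM values (mg/L).
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

score = rpd(observed, predicted)
print(mae(observed, predicted), round(score, 2), rate_rpd(score))
```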

Workflow
Based on satellite (MODIS, Landsat 8, and Sentinel 2) and in situ data (SPM), this study used spatiotemporal fusion models ESTARFM and FSDAF to obtain Landsat 8 and Sentinel 2 fusion images and analyzed their consistency. The RF and XGBoost inversion models were used to complete the mapping of SPM distribution in Ebinur Lake. The feasibility of spatiotemporal fusion models for monitoring lake SPM was then explored. The rather complex steps in the workflow of this study are depicted in Figure 5.

Performance of the Ebinur Lake SPM Inversion Models
In this paper, the Python scikit-image algorithm was used to call the RF and XGBoost packages to implement the inversion of SPM. The training and validation sets were divided by a 7:3 ratio, and the optimal inversion model was obtained after training. The specific parameters of the optimal model and the evaluation index of the verification set are shown in Table 2. Other parameters of the model adopted the default values of RF and XGBoost packages. The validation set accuracy of both the RF and XGBoost suspension inversion models could meet the prescribed requirements (R 2 ≥ 0.78), and the model stability was rated as high (RPD ≥ 2.13).

Performance Evaluation of the Two Sensors and Two Spatiotemporal Fusion Approaches
Two spatiotemporal fusion models, ESTARFM and FSDAF, were used to conduct spatiotemporal fusion studies on ML8 and MS2, respectively. We used NDWI and visual interpretation to obtain the vector boundary of Ebinur Lake. In ArcGIS 10.3 (ESRI Corporation, Redlands, USA), the model builder was used to construct the clipping model, and then the images with the same number of pixels were obtained according to the water vector boundary. Figure 6 shows the true color display (red, green, blue) of the original images and the fused images of Ebinur Lake.  Figure 6 shows that the two spatiotemporal fusion models can predict the target date image well, but the ESTARFM model performed better from a visual perspective.
To quantify the quality and spectral information retention of the fused images produced by the two spatiotemporal fusion models, we generated the correlation maps between the bands of the fused image and the corresponding bands of the original image. At the same time, R 2 , NRMSE, PSNR, and SSIM of the fused images and the original images were calculated to quantitatively evaluate the fusion effect, as shown in Figures 7 and 8.  Figure 7 shows the fusions of ML8 and MS2 of four different bands with the ESTARFM and FSDAF models. The best fusion was achieved in the red band and the worst in the NIR band. All coefficients of determination (R 2 ) values were above 0.66, except for the NIR band fused by the ML8-based FSDAF model. Furthermore, Figure 7 shows that the values of R 2 for each band of the MS2-based spatiotemporal fused images were greater than the ML8-based ones. In addition, the results also indicate that the ESTARFM model performed better than the FSDAF model in the image fusion over Ebinur Lake, which is consistent with the visualization analysis.
In addition to reflecting the spatiotemporal fusion accuracy, Table 3 raises three questions: (1) Which satellite data source has more advantages when spatiotemporal fusion is carried out? (2) Which spatiotemporal fusion model is better in producing the Ebinur Lake water fusion images? (3) What is the fusion accuracy of each band? On the first question, most of the evaluation metrics show that MS2 was better than ML8 across the four bands: the MS2-based fused images had higher R², PSNR, and SSIM values and smaller NRMSE values. Therefore, MS2 should be prioritized in the image fusion of Ebinur Lake. For the second question, the R², PSNR, SSIM, and NRMSE values were better for the ESTARFM model than for the FSDAF model, showing that the ESTARFM method was more suitable for image fusion of the Ebinur Lake area. For the third question, the best fusion performance of ML8 and MS2 was in the green and red bands, where the R², NRMSE, and SSIM values were better than those in the other bands.

Consistency Evaluation of Spatiotemporal Fusion Images
To evaluate the consistency of fusion images based on Landsat 8 and Sentinel 2, the R², NRMSE, PSNR, and SSIM of the two original images, the ESTARFM fusion images, and the FSDAF fusion images were compared in each band. The correlation scatter plots of each band are shown in Figure 8, and the relative trend of the four indicators is shown in Figure 9. Figure 8 shows that the R² between the original Landsat 8 and Sentinel 2 images was high in each band (R² ≥ 0.82), indicating that the original images of the two data sources had high consistency. The R² values for both the ESTARFM- and FSDAF-fused images of ML8 and MS2 were greater than 0.54 in all bands, and the fusion accuracy was higher using ESTARFM than FSDAF. Other indicators for evaluating the consistency of Landsat 8 with Sentinel 2 and their fused images are given in Figure 9. The highest consistency was based on the original images, the second-highest on the ESTARFM-fused images, and the worst on the FSDAF-fused images. In summary, the ESTARFM- and FSDAF-based spatiotemporal fusion models fused well in the Ebinur Lake area, with strong consistency between Landsat 8, Sentinel 2, and their fused images across the bands.

Retrieval Effect of SPM from the Original Image
In this paper, two inversion algorithms (RF and XGBoost) were used to implement the 29 August 2017 Ebinur Lake SPM inversion, and then their inversion accuracy was evaluated. The retrieval results of SPM from Landsat 8 and Sentinel 2 images are shown in Figure 10, and their accuracy evaluation is shown in Figure 11 and Table 4. Figure 10 indicates that higher SPM concentrations in Ebinur Lake on 29 August 2017 were in shallow shoreline waters, and the lower values were roughly distributed in the central portion of the lake. With the RF algorithm, the SPM distributions retrieved from Landsat 8 and Sentinel 2 were generally consistent, whereas XGBoost produced a clear difference between them. Figure 11 shows that the RF algorithm was significantly better than XGBoost for the SPM inversion, with a high correlation (R ≥ 0.83). Table 4 indicates that the accuracy of the RF-based SPM inversion (R² ≥ 0.68) was significantly higher than that of the XGBoost-based inversion (R² ≤ 0.24). The accuracy of the SPM inversion based on Sentinel 2 was slightly higher than that based on Landsat 8 using the RF algorithm, consistent with the R² scatter-plot results. Thus, the RF algorithm applied to Landsat 8 and Sentinel 2 could effectively monitor SPM in Ebinur Lake.

Consistency Evaluation of SPM Retrieval from Original Images
Due to the low accuracy of the XGBoost-based inversion, this study only used the RF algorithm as the inversion method to investigate the consistency of the SPM inversion concentration based on Landsat 8 and Sentinel 2. Using the Create Random Points tool of ArcGIS 10.3 (ESRI Corporation, Redlands, CA, USA), 1000 random points were selected on the RF-based SPM inversion maps of Landsat 8 and Sentinel 2 according to the principle of minimum allowable distance (30 m). Their SPM concentration was then extracted to explore their consistency in spatial distribution. The correlation (R) between the two was 0.71 with p < 0.01 ( Figure 12). Additional indicators evaluating the accuracy of the SPM inversion for both data sources were obtained by calling the Python scikit-image algorithm. The consistency results were good for both inversion maps (Table 5).
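The sampling step can be sketched outside ArcGIS as rejection sampling: draw random locations and keep one only if it is at least the minimum distance from every point already accepted, then read both maps at those locations. This illustrates the idea and is not the algorithm behind the Create Random Points tool; the function names, grid size, and map values are hypothetical.

```python
import math
import random


def sample_points(width, height, n_points, min_dist, seed=0, max_tries=100000):
    """Draw random (x, y) points at least min_dist apart by rejection sampling."""
    rng = random.Random(seed)
    points = []
    tries = 0
    while len(points) < n_points and tries < max_tries:
        tries += 1
        candidate = (rng.uniform(0, width), rng.uniform(0, height))
        if all(math.dist(candidate, p) >= min_dist for p in points):
            points.append(candidate)
    return points


def extract(grid, points, cell_size):
    """Read the grid value under each point (grid[row][col], square cells)."""
    n_rows, n_cols = len(grid), len(grid[0])
    values = []
    for x, y in points:
        col = min(int(x // cell_size), n_cols - 1)
        row = min(int(y // cell_size), n_rows - 1)
        values.append(grid[row][col])
    return values


# Two hypothetical 20 x 20 SPM maps (mg/L) on a 30 m grid.
cell = 30.0
map_a = [[100.0 + 5.0 * r + 3.0 * c for c in range(20)] for r in range(20)]
map_b = [[0.9 * v + 20.0 for v in row] for row in map_a]

points = sample_points(20 * cell, 20 * cell, n_points=50, min_dist=30.0)
values_a = extract(map_a, points, cell)
values_b = extract(map_b, points, cell)
print(len(points), values_a[:3], values_b[:3])
```

The paired values can then be passed to any correlation routine to obtain R between the two inversion maps.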

SPM Inversion of the Fused Images
To explore the feasibility of SPM inversion in water bodies under the missing-image condition, two strategies ("fusion first" and "inversion first") were used to monitor SPM in Ebinur Lake. To facilitate cross-validation, the SPM monitoring date was set to 29 August 2017. The inversion maps are shown in Figures 13 and 14.  Comparing Figures 10 and 13, the SPM distribution maps obtained based on ESTARFM (Landsat 8/Sentinel 2)-RF were more similar to the original image SPM inversion maps from the visual viewpoint. The performance of the other three inversion maps was difficult to distinguish. Comparing Figures 10 and 14, it was difficult to distinguish qualitatively the advantages and disadvantages of the four SPM distributions maps.
To quantitatively evaluate the effectiveness of the two strategies for SPM inversion at missing-image times, the Python scikit-image algorithm was called to evaluate their inversion accuracy. The evaluation indicators for the optimal combinations are shown in Table 6. The label ESTARFM (Sentinel 2)_RF in Table 6 means the following: first, MODIS and Sentinel 2 were fused by the ESTARFM spatiotemporal fusion model to obtain the Sentinel 2 image of the target date, and then the fused image was input into the RF model to obtain the SPM distribution map. The other combination labels follow the same pattern as ESTARFM (Sentinel 2)_RF. Comparing the optimal combinations of the two SPM inversion strategies (Table 6), the accuracy of the "fusion first" strategy was slightly higher than that of the "inversion first" strategy. ESTARFM (Sentinel 2)_RF under the "fusion first" strategy had the best inversion effect and some consistency with the Sentinel 2 inversion map, with R² = 0.41, NRMSE = 0.72, SSIM = 0.43, and PSNR = 23.24. For the "inversion first" strategy, FSDAF (Sentinel 2)_RF had the best inversion effect, with R² = 0.32, NRMSE = 0.77, SSIM = 0.36, and PSNR = 21.68. The results show that the "fusion first" strategy was more suitable for SPM inversion under the missing-image condition at Ebinur Lake. Section 3.4 demonstrates that the Landsat 8 and Sentinel 2 images showed high consistency in the inversion of SPM in Ebinur Lake. Therefore, we only needed to test the consistency of different inversion strategies under the same data source. The performance of the optimal combination in Figure 5 verifies the feasibility of monitoring SPM in the Ebinur Lake area using the fusion image. To further explore the consistency of the distribution trend of SPM retrieved from the fusion image and the original image, 1000 random points were extracted from the inversion map under the four optimal combination situations in Table 6, and correlation analysis was conducted with the inversion map of the original image (Figure 15).
The four combinations were consistent in SPM distribution, with a minimum R of 0.48. This result showed that the fused images were consistent with the original images for SPM monitoring.
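The map-to-map comparison described above can be sketched in a few lines. The following is a minimal, dependency-free illustration, not the authors' code: the function names, the NRMSE normalisation (by the root-mean-square of the reference), the PSNR data range (reference maximum minus minimum), and the synthetic maps are all assumptions made for the example. In practice, scikit-image's `skimage.metrics` module provides `normalized_root_mse`, `peak_signal_noise_ratio`, and `structural_similarity` for the same purpose; SSIM is omitted here for brevity.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mse(reference, estimate):
    """Mean squared error between reference and estimated values."""
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)


def nrmse(reference, estimate):
    """RMSE divided by the RMS of the reference (assumed normalisation)."""
    rms_reference = math.sqrt(sum(r * r for r in reference) / len(reference))
    return math.sqrt(mse(reference, estimate)) / rms_reference


def psnr(reference, estimate):
    """Peak signal-to-noise ratio in dB, using the reference value range."""
    data_range = max(reference) - min(reference)
    return 10.0 * math.log10(data_range ** 2 / mse(reference, estimate))


def sample_pairs(reference_map, estimate_map, n_points, seed=0):
    """Read the same randomly chosen pixels from two equally sized 2-D maps."""
    rows, cols = len(reference_map), len(reference_map[0])
    cells = random.Random(seed).sample(range(rows * cols), n_points)
    reference = [reference_map[i // cols][i % cols] for i in cells]
    estimate = [estimate_map[i // cols][i % cols] for i in cells]
    return reference, estimate


# Synthetic 50 x 50 SPM maps in mg/L (illustrative only, not the study's data).
noise = random.Random(42)
original_map = [[800.0 + 2.0 * r + c for c in range(50)] for r in range(50)]
fused_map = [[v + noise.gauss(0.0, 20.0) for v in row] for row in original_map]

ref, est = sample_pairs(original_map, fused_map, n_points=1000)
r = pearson_r(ref, est)
print(f"R = {r:.2f}, R^2 = {r * r:.2f}, "
      f"NRMSE = {nrmse(ref, est):.3f}, PSNR = {psnr(ref, est):.2f} dB")
```

Sampling the same pixel positions from both maps keeps the comparison paired, which is what a point-based correlation analysis of two inversion maps requires.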

Discussion
Based on MODIS, Landsat 8, and Sentinel 2 data, this study assessed the feasibility of SPM inversion in Ebinur Lake under the missing-image situation. Our results indicated that the fused Landsat 8 and Sentinel 2 images had high fusion accuracy relative to their original images in the blue, green, red, and NIR bands. The overall accuracy of the ESTARFM spatiotemporal fusion model was higher than that of FSDAF, and the fused images from Landsat 8 and Sentinel 2 had a good consistency. The original images of Landsat 8 and Sentinel 2 had high accuracy and consistency for SPM inversion. For the fusion images, the inversion accuracy of the "fusion first" strategy was higher than that of its "inversion first" counterpart. The SPM concentration retrieved from the fusion images was consistent with that retrieved from the original images.
Our research verified that the integrated application of Landsat 8 and Sentinel 2 original images as a time series was feasible for monitoring SPM in Ebinur Lake at a high frequency. The application of the spatiotemporal fusion model could make up for the missing images and improve the temporal resolution of high spatial resolution images, thereby enabling higher frequency SPM monitoring. Lakes often undergo dynamic changes [74]. Higher frequency monitoring could yield detailed data on water quality changes to strengthen lake-water management, biomass estimation, and carbon cycling assessment [25]. However, due to technical limitations, the temporal resolution and spatial resolution of the same sensor may restrict each other [62]. At present, CubeSat constellations seem to have a great advantage in addressing the lack of images with high spatiotemporal resolution [29]. However, the differences in radiation quality among the satellites of CubeSat constellations, inconsistent radiation calibration across multiple platforms, and relatively low spectral resolution are all challenges for their wide application [75]. In addition, because CubeSat constellations are new, they are poorly suited to the quantitative study of historical targets. Therefore, the methodology and findings of this study have potential applications to cognate environmental monitoring studies.
Satellite data play an extremely important role in lake SPM inversion studies. The data sources mainly include PlanetScope, OrbView-2, EO-1 Hyperion, MERIS, IRS LISS III, MODIS, Landsat 8, and Sentinel 2 [76,77]. After correlation analysis, the visible and near-infrared bands of MODIS, Landsat 8, and Sentinel 2 data were selected as the sensitive bands for SPM inversion in Ebinur Lake. This result is similar to studies in other regions [78,79]. Among the empirical and semi-empirical models for SPM inversion, the performance of RF is relatively stable. The accuracy of the SPM inversion in this study using RF on Landsat 8 and Sentinel 2 images reached R² = 0.68 and 0.73, respectively (Table 3), which is comparable to the accuracy of related studies [37,38]. The SPM inversion model used in this study is an empirical model that does not involve physical processes. Physical models based on radiative transfer, e.g., WASI, are also widely used in the field of SPM inversion [20,80,81], and they are not restricted to SPM monitoring. These models also play a great role in evaluating water quality, bathymetry, substratum type, and the concentrations of the optically active constituents of the water column [82], and they have been applied fruitfully [80,83]. Unfortunately, studies on the inversion of water quality parameters using physical models based on spatiotemporal fusion images are relatively limited. Therefore, we suggest a further study testing the SPM inversion ability of a physical model applied to fused images.
In this paper, fusion studies of MODIS-Landsat 8 and MODIS-Sentinel 2 were carried out to evaluate the consistency of the fused images, and the results were consistent with those of Li et al. [22]. In addition to obtaining results that align with previous studies, this project undertook the following: (1) two spatiotemporal fusion models, ESTARFM and FSDAF, were selected and their applicability in the Ebinur Lake area was compared; (2) two strategies, "fusion first" and "inversion first", were employed to explore the feasibility of using the fused images to invert SPM; (3) at the image level, the consistency between the original Landsat 8 and Sentinel 2 images and their fused images was examined; (4) the consistency between the original image and the fused image for the SPM inversion was explored. Our application of spatiotemporal fusion models could complement existing studies on water environment monitoring and achieve a higher monitoring frequency of water quality parameters. It is hoped that our findings can provide references for sustainable water development and management.
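The difference between the two strategies is essentially the order in which fusion and inversion are composed. The sketch below illustrates this under stated assumptions: `toy_fuse` is a deliberately simple stand-in (it adds the coarse-image temporal change to the fine base image) and is not ESTARFM or FSDAF; `toy_invert` is a hypothetical nonlinear reflectance-to-SPM mapping and is not the trained RF model; and "inversion first" is assumed to mean inverting each input image and then fusing the resulting SPM maps. All reflectance values are made up for illustration.

```python
import math


def toy_fuse(fine_base, coarse_base, coarse_target):
    """Toy fusion: add the coarse temporal change to the fine base pixels."""
    return [f + (ct - cb)
            for f, cb, ct in zip(fine_base, coarse_base, coarse_target)]


def toy_invert(reflectance):
    """Hypothetical nonlinear reflectance-to-SPM mapping (stand-in for RF)."""
    return [50.0 * math.exp(8.0 * r) for r in reflectance]


def fusion_first(fine_base, coarse_base, coarse_target):
    """Fuse reflectance to the target date, then invert the fused image."""
    fused_reflectance = toy_fuse(fine_base, coarse_base, coarse_target)
    return toy_invert(fused_reflectance)


def inversion_first(fine_base, coarse_base, coarse_target):
    """Invert every input image to SPM, then fuse the SPM maps."""
    return toy_fuse(toy_invert(fine_base),
                    toy_invert(coarse_base),
                    toy_invert(coarse_target))


# Made-up reflectances for three pixels (illustrative only).
fine_base = [0.20, 0.25, 0.30]       # fine image at the base date
coarse_base = [0.22, 0.24, 0.28]     # coarse image at the base date
coarse_target = [0.26, 0.27, 0.29]   # coarse image at the target date

spm_fusion_first = fusion_first(fine_base, coarse_base, coarse_target)
spm_inversion_first = inversion_first(fine_base, coarse_base, coarse_target)
for a, b in zip(spm_fusion_first, spm_inversion_first):
    print(f"fusion first: {a:8.1f} mg/L   inversion first: {b:8.1f} mg/L")
```

With a linear inversion model and this additive toy fusion, the two orderings would coincide; they diverge here only because the inversion is nonlinear, which is one reason the two strategies can differ in accuracy.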
Although the spatiotemporal fusion model was developed for terrestrial targets [35,84], it has also been successfully applied in estuarine SPM monitoring [22]. This study explored a further expansion of its application. Ebinur Lake is shallow and brackish, typical of the inland arid and semi-arid zone. The lake is permanently loaded with a high concentration of suspended particles, and its optical characteristics differ from those of deep-water and freshwater lakes. Figure 16 shows the 28-day trend of SPM concentration in Ebinur Lake as inverted from MODIS. It shows that Ebinur Lake is in a high turbidity state, with an average SPM concentration of 874.88 mg/L over the 28 days. To measure the persistence of the high turbidity state, this study used the segmented coefficient of variation (CV) to quantify the dynamic change characteristics of SPM in Ebinur Lake. The high turbidity state of Ebinur Lake is relatively stable (Figure 16). There is a positive correlation between SPM concentration and water reflectance [85]. The high turbidity state can enhance the signal received by satellite sensors, so it is feasible to use the spatiotemporal fusion model to carry out the SPM inversion study in Ebinur Lake.

Figure 16. The SPM variation trend in Ebinur Lake over 28 days and the distribution of the coefficient of variation (CV) in different periods (CV < 1 means low variability; CV ≥ 1 means high variability [86]).
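As a rough illustration of the segmented CV, the sketch below splits a daily series into fixed-length periods and computes the standard deviation divided by the mean for each one. The 7-day segment length, the use of the sample standard deviation, and the synthetic series are assumptions made for the example; only the CV < 1 / CV ≥ 1 threshold comes from the figure caption.

```python
import statistics


def segmented_cv(series, segment_length):
    """Coefficient of variation (std / mean) for consecutive segments."""
    cvs = []
    for start in range(0, len(series), segment_length):
        segment = series[start:start + segment_length]
        if len(segment) < 2:
            continue  # a sample standard deviation needs at least two values
        cvs.append(statistics.stdev(segment) / statistics.mean(segment))
    return cvs


def variability_class(cv):
    """Apply the threshold from the caption: CV < 1 low, CV >= 1 high."""
    return "low" if cv < 1 else "high"


# Synthetic 28-day SPM series in mg/L (illustrative only, not the study's data).
spm_series = [850.0 + 10.0 * (day % 5) for day in range(28)]

cv_values = segmented_cv(spm_series, segment_length=7)
for index, cv in enumerate(cv_values, start=1):
    print(f"period {index}: CV = {cv:.3f} ({variability_class(cv)} variability)")
```

A series that stays close to its mean, as in this synthetic case, yields CV values far below 1 in every period, which is the pattern that would indicate a stable turbidity state.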
In this study, the time intervals between the data pairs of the spatiotemporal fusion models were not completely synchronized, which might lead to aberrations. In addition, the difference between the date of actual SPM measurements and the date of remote sensing data acquisition may introduce errors into the inversion model. To assess the performance of the spatiotemporal fusion model more accurately, future work should select remote-sensing data at the same interval for fusion and select remote-sensing images that are quasi-synchronous with the measured values to build the inversion model. The sensitive band involved in our SPM inversion is a single band. In the future, Landsat 8 and Sentinel 2 can be coupled to realize the inversion of the time series of the Ebinur Lake SPM by using single-band substitution, single-band fusion, three-band fusion, etc.

Conclusions
The main objective of this research was to explore the feasibility of monitoring SPM in Ebinur Lake by the spatiotemporal fusion model in the missing-image scenario. The key conclusions are as follows:
1. For spatiotemporal fusion results, ESTARFM is more suitable for the research questions associated with the Ebinur Lake area than FSDAF;
2. The original and fused images of Landsat 8 and Sentinel 2 have high consistency in the blue, green, red, and NIR bands. The SPM time-series monitoring of Ebinur Lake can be realized using these two kinds of data synthetically;
3. For Landsat 8 and Sentinel 2 images, the RF inversion model has a higher retrieval accuracy than the XGBoost model, and the consistency of the two data sources is good;
4. For the fusion images, the inversion accuracy of the "fusion first" strategy is higher, which indicates that the spatiotemporal fusion model is feasible for SPM monitoring in Ebinur Lake.
Our research results verify the potential application of the spatiotemporal fusion model in monitoring SPM in lakes and other water bodies beset by the problems of satellite images lacking temporal resolution. They provide a reference for monitoring lake water quality with high temporal and spatial resolution and a theoretical basis for the refined and high-precision smart management of the lake water environment.
Funding: This research was funded by the National Natural Science Foundation of China, Grant Numbers U2003205 and U1603241, the National Natural Science Foundation of China (Xinjiang Local Outstanding Young Talent Cultivation) (U1503302), and the Tianshan Talent Project (Phase III) of the Xinjiang Uygur Autonomous Region. The APC was funded by Xinjiang University.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to confidentiality.