1. Introduction
Eutrophication is the excessive input of nutrients into a body of water, leading to the excessive growth of aquatic plants and algae. It is currently a global environmental problem, posing serious threats to both aquatic ecosystems and human health [
1]. The incidence of eutrophic lakes has been increasing worldwide since the 1960s [
1], with this increasing trend being most pronounced in developing regions such as Asia and Africa [
2]. By 2050, it is estimated that one-sixth and one-quarter of the world’s population will be at high risk of poor water quality due to excess nitrogen and phosphorus, respectively, making eutrophication one of the greatest threats to water quality [
2]. Chlorophyll-a (Chl-a), a photosynthetic pigment present in all algae, is an important indicator of the degree of eutrophication of water bodies and can accurately reflect the abundance and biomass of aquatic plankton [
3]. Therefore, timely and effective monitoring of Chl-a is of great significance for evaluating the water environment of Taihu Lake and for assisting in the management of water environments [
4].
Traditional Chl-a monitoring methods include field sampling followed by laboratory analysis and in situ monitoring. The former requires both obtaining samples manually in water bodies and processing and analyzing the samples in laboratories, which is not only inefficient but also costly and time-intensive [
5]. In situ monitoring automatically measures Chl-a through preset underwater sensors, but it is easily affected by environmental factors such as temperature and water composition. For example, waves caused by strong winds can shake the buoy of a monitoring station, water currents carrying suspended particles can dislodge sensors and attenuate the signal, and the performance of electronic components (such as photodiodes) fluctuates with temperature. All of these effects reduce the monitoring accuracy [
6]. The development of remote sensing technology has provided a new approach for the large-scale and dynamic monitoring of Chl-a concentrations in water environments [
7]. It can frequently provide relatively accurate water quality information and is now widely used in the monitoring of Chl-a concentrations in lake water bodies. MODIS, MERIS, OLCI, and other main optical sensors perform well in monitoring Chl-a [
8, 9, 10, 11, 12], despite some limitations in temporal and spatial resolutions [
13].
In lake waters, the rapid proliferation of algae leads to a rapid increase in the Chl-a concentration. Under sustained favorable conditions (high temperatures, high nutrient concentrations, and stagnant water), the biomass of algae can increase by 2–3 times within 24 h [
14], causing cyanobacteria blooms. To monitor and warn of this phenomenon in a timely manner, the monitoring frequency of Chl-a must be daily or even hourly. In addition, small lakes and areas infested with cyanobacterial blooms have high requirements for the spatial resolution of the monitoring data. Lekki et al. [
15] pointed out that the minimum spatial resolution for accurately measuring changes in algal bloom concentrations in large lakes (area ≥ 500 km²) using remote sensing technology is 50 m, with a resolution of 25 m or finer being the most suitable for identifying areas where blooms significantly affect human health and cause ecological damage. It is currently believed that satellites such as Landsat (30 m resolution) and Sentinel-2 MSI (10 m resolution) can better meet spatial scale requirements [
16]. However, most satellite data currently used for Chl-a monitoring do not meet the requirements for temporal and spatial resolution simultaneously. MODIS is widely regarded as a relatively accurate remote sensing data source [
17]. However, its spatial resolution of 1 km hinders Chl-a monitoring in inland lakes and complex coastal areas. As a mainstream water color satellite, VIIRS has a revisit time as short as 14 h and has improved its spatial accuracy to a certain extent [
18]. However, its minimum resolution of 750 m is still unsuitable for fine-grained monitoring. Landsat has great potential for estimating a variety of water quality parameters, with a spatial resolution of 30 m and good detection ability [
19]. However, its 16-day revisit period makes it difficult to monitor rapid changes in Chl-a. Sentinel-2 MSI has a higher spatial resolution than MODIS, Landsat, and VIIRS, among others, and can accurately monitor the development of Chl-a in algal blooms on a 5-day time scale [
20]. The Geostationary Ocean Color Imager (GOCI), launched in 2010 by the Korea Ocean Satellite Center (KOSC), has a temporal resolution of 1 h, which meets the need for monitoring dynamic changes in Chl-a [
21]. However, its spatial resolution is coarse and is unable to satisfy the requirements for fine-grained monitoring.
To improve data accuracy and meet the need for high-resolution monitoring, it is therefore necessary to use downscaling methods for data fusion. Downscaling is the process of converting low-resolution data into high-resolution data, and it can be applied in both the spatial and temporal dimensions. Spatial downscaling methods are relatively common in remote sensing research. Wang et al. [
22] proposed a spatial downscaling method for the GPM IMERG precipitation dataset, optimizing precipitation data with a spatial resolution of 11 × 11 km to 1 × 1 km; Chu et al. [
23] used a random forest model to improve the spatial resolution of GRACE data from 300–400 km to 0.25 × 0.25 km based on three spatial downscaling ideas: grid point, watershed, and national scale. While retaining the characteristics of GRACE data, they obtained a refined water storage change situation. Nomura et al. [
24] constructed a downscaling model based on a convolutional neural network (CNN) and used SAR data to increase the spatial resolution of MODIS NDVI products from 250 m to 10 m. Zhang et al. [
25] proposed a Chl-a downscaling model based on multivariate analysis (MVA) and gradient boosting decision tree (GBDT), which increased the spatial resolution of Sentinel-3 Chl-a data from 300 m to 30 m, making it particularly suitable for inland lakes in different environments. Relevant studies have also been conducted on temporal downscaling. He et al. [
26] proposed the SRF-MF method based on the random forest (RF) model, which uses satellite products to fuse station data and generate high-quality daily precipitation data. Adrien et al. [
27] proposed an enhanced Delta method to improve the climate change index from a daily resolution to hourly resolution and verified the versatility of this method within a certain range.
In contrast to spatial downscaling, temporal downscaling has mainly been applied in meteorology and climate research. For water environment monitoring, especially for downscaling water quality parameters such as Chl-a, research is still limited, and the short-term analysis of remote sensing data still faces challenges in terms of accuracy and frequency. Although some daily or hourly satellite products exist, most studies have focused only on annual or monthly products, such as soil moisture and precipitation. These parameters change relatively slowly and have low temporal resolution requirements, which may be one of the reasons for the limited attention given to temporal downscaling in the current remote sensing field. Chl-a concentrations, by contrast, change randomly, abruptly, and in complex ways, so water quality monitoring places higher requirements on accuracy and frequency. Therefore, high-precision temporal downscaling research on water quality parameters such as Chl-a holds practical significance and application value.
To address these issues, this study proposes a precorrection-based spatiotemporal downscaling method (PC-STDM). The method first adaptively corrects the Chl-a concentration inversion data obtained from the Sentinel-2 MSI and the COMS-1 GOCI. The Sentinel-2 MSI inversion data are then temporally downscaled according to the hourly Chl-a concentration trend of the calibrated COMS-1 GOCI data, generating Chl-a concentration data with both a 10 m spatial resolution and a 1 h temporal resolution. This process achieves an effective fusion of Chl-a data at different scales, overcomes the resolution limitations of traditional remote sensing data, and can more accurately reflect the spatiotemporal variation characteristics of Chl-a in lake waters, providing reliable data support for water quality monitoring, ecological environmental assessment, and the construction of an early warning system in the Taihu Lake area.
4. Results
4.1. Correction of Chlorophyll-a Inversion Data
Sentinel-2 MSI and COMS-1 GOCI images of the Taihu Lake area (2019–2024) were screened visually. Images were selected for the dataset if they had no obvious cloud cover and if the high-value areas of the inversion results coincided with the algal bloom areas in the true-color image acquired at the same time. A total of 92 Sentinel-2 MSI images and 627 COMS-1 GOCI images were selected.
Due to differences in data sources, preprocessing methods, and inversion models, the spatial distribution of the Chl-a concentration data inverted using COMS-1 GOCI and Sentinel-2 MSI exhibited seasonal discrepancies (
Figure 3). In spring (
Figure 3a) and winter (
Figure 3d), the high-value areas of Chl-a were mainly concentrated on the west and south sides of the lake. According to the inversion values of Sentinel-2 MSI and the measured values, the highest concentrations of Chl-a in the high-value areas reached approximately 30 μg/L and 45 μg/L during spring and winter, respectively, whereas those in the low-value areas were in the range of 0–10 μg/L in both seasons. The spatial patterns of the COMS-1 GOCI data before correction were roughly correlated with those of Sentinel-2 MSI and reflected the spatial heterogeneity of Chl-a, but the inversion values were generally too high, ranging from 17 μg/L to 550 μg/L. The maximum deviation reached 18 times the inversion value of Sentinel-2 under the same conditions. In summer (
Figure 3b), Chl-a proliferated along the western zones and spread to the center of the lake, with a maximum concentration of 50–70 μg/L. However, the COMS-1 GOCI Chl-a data before correction did not adequately capture the extent of this large-scale outbreak. Instead, they showed extremely high values (approximately 550 μg/L) sporadically on the northwest coast, and the overall spatial distribution was poorly reproduced. The corrected COMS-1 GOCI inversion data (
Figure 3(b3)) effectively restored the spatial characteristics of this large-scale outbreak. In autumn (
Figure 3c), there were occasional high-value areas of Chl-a on the northwest and north coasts, with the highest value in the range of 40–50 μg/L and the average value in the range of 5–10 μg/L. Before correction, the COMS-1 GOCI data showed high values over almost the entire lake, with a maximum of 760 μg/L, and even negative values in the low-value area. The corrected inversion data (
Figure 3(c3)) solved the problems of disordered spatial distributions and negative values.
The above results demonstrate serious inconsistencies between the inversion data from COMS-1 GOCI and Sentinel-2 MSI. In winter and spring, the inconsistency appeared mainly as a large deviation in values. In summer and autumn, when algal blooms were more serious, it appeared as a deviation in both the spatial distribution and the values, which was more severe. Therefore, before the actual downscaling process, this study first corrected the inversion data. The Chl-a values of COMS-1 GOCI were corrected toward the inversion data of Sentinel-2 MSI while the daily variation characteristics of Chl-a were retained, providing better data preparation for the subsequent downscaling work.
After correction, the COMS-1 GOCI inversion data were validated against Sentinel-2 MSI data at the same time, and the scatter plot is shown in
Figure 4. Before correction, the COMS-1 GOCI and Sentinel-2 MSI inversion results showed large deviations in the different seasons, with the most severe deviations in summer. After model correction, the consistency of the data was significantly improved, the coefficient of determination (R²) increased to above 0.9, and the correction effect was obvious. The verification results showed that the highest value of R² reached 0.93, with an average value of 0.87, indicating that the machine learning model correction used in this study effectively reduced systematic bias and improved the reliability of the data.
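As an illustration of how such a consistency check could be computed, the following minimal Python sketch evaluates the coefficient of determination and the mean bias of GOCI values against matched Sentinel-2 MSI values. It assumes that R² is defined as 1 − SS_res/SS_tot with the MSI values as the reference; the study may instead use the squared correlation of the scatter plot. The function names and all sample values are hypothetical and are not data from this study.

```python
def r_squared(reference, estimate):
    """Coefficient of determination of `estimate` against `reference`.

    Assumed definition: 1 - SS_res / SS_tot, with `reference` as the truth.
    """
    if len(reference) != len(estimate) or len(reference) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    if ss_tot == 0:
        raise ValueError("reference values are constant")
    return 1.0 - ss_res / ss_tot


def mean_bias(reference, estimate):
    """Average signed difference (estimate minus reference)."""
    return sum(e - r for r, e in zip(reference, estimate)) / len(reference)


# Hypothetical matched pixel values in μg/L (illustrative only).
msi = [5.0, 10.0, 20.0, 30.0]
goci_raw = [60.0, 150.0, 300.0, 550.0]
goci_corrected = [6.0, 9.0, 21.0, 29.0]

r2_raw = r_squared(msi, goci_raw)              # strongly negative: large bias
r2_corrected = r_squared(msi, goci_corrected)  # close to 1
bias_raw = mean_bias(msi, goci_raw)
bias_corrected = mean_bias(msi, goci_corrected)
```

With this definition, a strongly biased estimate yields a negative R², which is why the statistic reflects systematic overestimation as well as scatter.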
4.2. Method Performance Comparison
To temporally downscale the Sentinel-2 Chl-a inversion results while accommodating the differences in chlorophyll dynamics under different environmental conditions, this study used two methods: regression trend assessment downscaling (RTAD) and time weight downscaling (TWD). To compare the performance of the two methods and their adaptability to the environment, this study conducted the following analysis based on the principles of each method.
The RTAD method used the GOCI daily variation data to fit a trend equation, which was then used to extrapolate the Sentinel-2 Chl-a concentration data to different times of the day. However, downscaling results obtained in this way depend strongly on the constructed trend equation. To judge the overall performance of the RTAD method, the accuracy of the trend equation in different seasons was analyzed statistically, and the results are shown in
Figure 5.
As can be seen in the figure, the overall R² of the daily trend fitting equation of the RTAD method was relatively high, exceeding 0.9 in over 50% of cases. Among the four seasons, autumn showed the best fit to the overall trend, which is likely attributable to its relatively stable meteorological conditions and gentle hydrological variations. These favorable environmental factors promote Chl-a accumulation, resulting in a more concentrated and stable distribution of Chl-a concentrations that facilitates model fitting. In contrast, the method had its lowest R² values in winter, possibly because the lower Chl-a values and less pronounced seasonal variations during this period reduce the model’s sensitivity to parameter changes.
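The trend-based idea behind RTAD can be sketched in a few lines of Python. The sketch below fits an ordinary least-squares line to hourly GOCI values and shifts a single Sentinel-2 pixel along that line. The additive transfer of the trend, the clamp at zero, the function names, and the sample values are all assumptions made for illustration; the exact way the study applies the trend equation to the Sentinel-2 data may differ.

```python
def fit_linear_trend(hours, values):
    """Ordinary least-squares line through (hour, value) pairs."""
    n = len(hours)
    if n != len(values) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_t = sum(hours) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in hours)
    if sxx == 0:
        raise ValueError("all observation times are identical")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(hours, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return slope, intercept


def rtad_pixel(msi_value, base_hour, target_hour, slope):
    """Shift one Sentinel-2 pixel along the fitted GOCI trend.

    Assumption: the trend is transferred additively and the result is
    clamped at zero so that concentrations stay non-negative.
    """
    return max(msi_value + slope * (target_hour - base_hour), 0.0)


# Hypothetical hourly GOCI values (μg/L) for one location.
goci_hours = [9, 10, 11, 12]
goci_values = [10.0, 12.0, 14.0, 16.0]
slope, intercept = fit_linear_trend(goci_hours, goci_values)

# Sentinel-2 observed 20 μg/L at 11:00; estimate the value at 13:00.
estimate_13h = rtad_pixel(20.0, base_hour=11, target_hour=13, slope=slope)
```

Because every estimate is read off the same fitted line, an anomalous GOCI value at any hour changes the slope and therefore every downscaled hour, which illustrates the dependence on the trend equation noted above.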
In addition, a significance analysis was conducted to evaluate the regression fits of the RTAD method. A t-test was used to assess the linear and quadratic fits, and the Mann–Kendall test was used to evaluate the Theil–Sen regression. Statistically significant fits (p < 0.05) accounted for 57.24% of cases, of which fits with p < 0.01 accounted for 36.61%, which was within the acceptable range.
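For readers unfamiliar with the robust alternative mentioned above, the following Python sketch shows the standard Theil–Sen estimator (median of pairwise slopes) and the Mann–Kendall S statistic (concordant minus discordant pairs). The sample series is hypothetical. The p-value computation, which requires the variance of S and a normal approximation, is omitted, and the sketch is not claimed to match the study's implementation.

```python
from itertools import combinations
from statistics import median


def theil_sen(hours, values):
    """Theil-Sen line: median pairwise slope, median residual intercept."""
    points = list(zip(hours, values))
    slopes = [
        (v2 - v1) / (t2 - t1)
        for (t1, v1), (t2, v2) in combinations(points, 2)
        if t2 != t1
    ]
    if not slopes:
        raise ValueError("need at least two distinct observation times")
    slope = median(slopes)
    intercept = median(v - slope * t for t, v in points)
    return slope, intercept


def mann_kendall_s(values):
    """Mann-Kendall S: number of increasing pairs minus decreasing pairs."""
    s = 0
    for earlier, later in combinations(values, 2):
        s += (later > earlier) - (later < earlier)
    return s


# Hypothetical hourly series (μg/L) with one outlier at 11:00.
hours = [9, 10, 11, 12, 13]
values = [10.0, 12.0, 50.0, 16.0, 18.0]

ts_slope, ts_intercept = theil_sen(hours, values)
s_stat = mann_kendall_s(values)
```

In this example the outlier leaves the Theil–Sen slope at the value defined by the other four points, whereas a least-squares fit would be pulled toward the outlier.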
The TWD method uses the ratio of the benchmark time data to the target time data as the time weight to extrapolate the downscaled data. Although this approach does not depend on the data at the remaining times of the day, it depends strongly on the quality of the benchmark time data. It can be observed in
Figure 6 that the downscaled data restore the spatial characteristics of the Sentinel-2 inversion results and the time trend of the GOCI inversion results well, but some irregular noise artifacts are generated in the image. This is because the TWD method relies on the weight value of the Chl-a data of the benchmark time and the target time, which makes the data quality of the downscaled results have a greater correlation with the quality of the benchmark data. If there is noise in the GOCI data at the benchmark time point, the downscaling results of the TWD method inherit or even amplify the characteristics of this noise, leading to instability in output quality.
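The noise sensitivity described above can be illustrated with a minimal Python sketch of a ratio-based time weight. It assumes that the weight is the GOCI value at the target time divided by the GOCI value at the benchmark time, and that the Sentinel-2 value at the benchmark time is multiplied by this weight; the direction of the ratio, the `eps` guard, the function name, and the sample values are assumptions for illustration only.

```python
def twd_pixel(msi_value, goci_base, goci_target, eps=1e-6):
    """Scale one Sentinel-2 pixel by an assumed GOCI time weight.

    Returns None when the benchmark value is too small for a stable ratio.
    """
    if goci_base <= eps:
        return None
    weight = goci_target / goci_base
    return msi_value * weight


# Hypothetical values in μg/L: Sentinel-2 pixel 20.0 at the benchmark time,
# GOCI 10.0 at the benchmark time and 12.0 at the target time.
clean = twd_pixel(20.0, goci_base=10.0, goci_target=12.0)

# The same pixel when noise lowers the GOCI benchmark value from 10.0 to 8.0.
noisy = twd_pixel(20.0, goci_base=8.0, goci_target=12.0)
```

Here a 20% error in the benchmark GOCI value produces a 25% error in the downscaled value (30 μg/L instead of 24 μg/L), which shows how a ratio-based weight can amplify benchmark noise.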
4.3. Accuracy Verification
This study selected the measured data of hydrological monitoring sections at four different locations in Taihu Lake from 2020 to 2023 to validate the downscaling results. The reliability of the downscaling method was confirmed by comparing the daily variation trend of the downscaling results with the measured data.
Figure 7 shows the verification results for sections in different seasons and locations.
As shown in
Figure 7, the daily variation trend of the Chl-a concentration obtained using the two downscaling methods was consistent with the measured trend. Among all samples, the fitting coefficient r reached 0.98 and 0.96 at the highest, and it performed well at monitoring sections at different locations, indicating that the downscaling method proposed in the study can provide a significant optimization of the temporal resolution of Sentinel-2 MSI Chl-a data while maintaining the spatial characteristics. However, environmental factors such as weather conditions and hydrological characteristics in different seasons affected the daily variation trend and the degree of fitting to varying degrees, potentially degrading fit quality.
To display the verification results more intuitively, the fitting coefficient r was graded to show the degree of fitting of the trend. For r > 0, the trend is considered correctly fitted, and the proportion of correctly fitted days to the total number of days is the overall fitting rate. For r > 0.5, the trend is considered well fitted, and for r > 0.8, the trend is considered highly fitted. The proportion of days with each degree of fitting to the total number of correctly fitted days is the fitting rate of each degree.
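The grading scheme can be written as a short Python sketch. It computes the Pearson coefficient for one day and then the fitting rates from a list of daily coefficients. The sketch assumes that the grades are cumulative (a highly fitted day also counts as well fitted) and that the per-grade rates are taken relative to the correctly fitted days; the function names and the sample coefficients are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between a downscaled and a measured daily series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / sqrt(sxx * syy)


def fitting_rates(r_values):
    """Overall, good (r > 0.5), and high (r > 0.8) fitting rates."""
    if not r_values:
        raise ValueError("no days to grade")
    fitted = [r for r in r_values if r > 0]
    overall = len(fitted) / len(r_values)
    if not fitted:
        return {"overall": 0.0, "good": 0.0, "high": 0.0}
    good = sum(r > 0.5 for r in fitted) / len(fitted)
    high = sum(r > 0.8 for r in fitted) / len(fitted)
    return {"overall": overall, "good": good, "high": high}


# Hypothetical daily coefficients for four validation days.
daily_r = [0.9, 0.6, 0.2, -0.3]
rates = fitting_rates(daily_r)
```

For these four days, three are correctly fitted (overall rate 0.75), two of those three are well fitted, and one is highly fitted.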
As can be seen in
Table 3, both the RTAD and TWD downscaling methods can effectively capture the daily variation trend of Chl-a with good overall results, and the maximum fitting coefficients reached 0.98 and 0.96, respectively, reflecting the credibility of the PC-STDM model. The overall fitting rate of the RTAD method was lower than that of the TWD method; however, among the correctly fitted days, the RTAD method fitted better, with a good fitting rate 16 percentage points higher than that of the TWD method and a high fitting rate twice that of the TWD method. Although the maximum fitting coefficients of the two methods were above 0.95, both methods still exhibited uncertainty during periods of frequent cyanobacterial blooms in lakes.
4.4. Case Study
According to an article published by the Jiangsu Provincial Environmental Monitoring Center, on 26 February 2022, a large area of yellow cyanobacterial blooms appeared along Meiliang Bay of Taihu Lake and its surrounding rivers, attracting great attention from the environmental department and the public. The local ecosystem has a significant impact on residents’ lives [
43]. In order to further validate the practicality of the research results, this study targeted the above-mentioned algal bloom event and selected Chl-a downscaling results on 27 February 2022, for verification and analysis. The results are shown in
Figure 8 and
Figure 9.
Figure 8 demonstrates that the Chl-a concentration on that day showed an overall trend of first increasing and then decreasing. Compared with the measured data at Tuoshan Station in Northwest Taihu Lake, the daily variation trend of Chl-a obtained using the RTAD and TWD downscaling methods was consistent with the measured data. The fitting coefficients of the trends reached 0.87 and 0.94, respectively, indicating that the downscaling results on that day were consistent with the actual situation.
As can be seen in
Figure 9, during the period of 9:00–16:00 local time on 27 February 2022, the downscaled data successfully captured the spatial distribution characteristics and temporal variation trends of Chl-a, which was consistent with the measured results.
In terms of spatial distribution characteristics, the original COMS-1 GOCI inversion data were consistent with the Chl-a data obtained using the two downscaling methods. The main high-value areas were concentrated along the southwestern coast of Meiliang Bay and the southern lakeshore area. However, compared with the original COMS-1 GOCI inversion data, the downscaling results showed more spatial distribution details of Chl-a. For example, a regional Chl-a concentration of 38.44 μg/L was detected on the northwest coast, whereas the data before downscaling failed to observe this high-value area. The capture of this detail further confirms the importance of the high spatial resolution in algal bloom monitoring.
In terms of the time change trends, the downscaling results were highly consistent with the original COMS-1 GOCI inversion data. Taking the northwest region with obvious changes as an example, the average high value of Chl-a concentration that day increased from 33.36 μg/L at 9:00 to 37.04 μg/L at 13:00 and then gradually declined to 27.64 μg/L at 16:00 after reaching the peak; the overall trend exhibited a slight increase at first and then a sharp decrease. This trend was successfully captured in the hourly scale results after downscaling and was inferred to be the reduction period after the algal bloom outbreak.
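The description of a slight increase followed by a sharp decrease can be checked directly from the three concentrations reported above. The short Python sketch below computes the relative changes; the function name is hypothetical, and the input values are those given in the text.

```python
def percent_change(start, end):
    """Relative change from `start` to `end`, in percent."""
    return (end - start) / start * 100.0


# Average high-value Chl-a concentrations (μg/L) reported for 27 February 2022.
chl_09h, chl_13h, chl_16h = 33.36, 37.04, 27.64

rise = percent_change(chl_09h, chl_13h)  # roughly +11%
fall = percent_change(chl_13h, chl_16h)  # roughly -25%
```

The morning rise of about 11% over four hours is small compared with the afternoon drop of about 25% over three hours, which agrees with the asymmetric trend described.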
In summary, the downscaled Chl-a concentration retains the daily variation trend of the original GOCI inversion data, shows more spatial details, and significantly improves the accuracy of the data. Through the tracking of this algal bloom event, it is once again confirmed that the downscaling work of this study is feasible and effective in practical applications. High-frequency and high-precision Chl-a monitoring has application value in supporting emergency response decisions, pollutant tracking and tracing, and quantitative evaluation of ecological restoration. It also reveals the important practical significance of high-frequency Chl-a monitoring in water environment management and cyanobacteria bloom early warning.
4.5. Temporal and Spatial Distribution Characteristics of Chlorophyll-a Concentration in Taihu Lake
Based on the downscaling results, the spatiotemporal patterns of the Chl-a concentration in Taihu Lake were obtained. In terms of time, the variation characteristics were examined at three scales: interannual, seasonal, and daily. Interannual variation was derived from annual averages (2019–2023), seasonal variation from the multi-year averages of each season (2019–2023), and daily variation from Chl-a concentration data within the 7 h from 9:00 to 16:00 on a day when algal blooms occurred and the daily trend was obvious.
As can be seen in
Figure 10, the high-value areas of Chl-a in Taihu Lake are mainly distributed in the nearshore areas and estuaries on the west and north sides, mainly including the Taige Canal on the north bank, Changxing Port on the west bank, and Dongshaoxi and Xishaoxi on the southwest bank, whereas the low-value areas are mainly distributed in the open waters in the center of the lake and the eastern region. The multi-year average Chl-a concentration exhibited a persistent northwest–southeast gradient, high in the northwest and low in the southeast. Over the five years, the spatial extent and concentration level of Chl-a first increased and then decreased. The annual average Chl-a concentration was 3.76 μg/L in 2019 and rose to 3.87 μg/L in 2020. It declined thereafter and reached 1.85 μg/L in 2023, about half of the 2019 value. At the same time, the high-value area in the north was effectively suppressed. This interannual trend was mainly related to the water resource management carried out by the relevant departments for Taihu Lake.
As shown in
Figure 11, Chl-a showed significant changes across the seasons. In spring, the overall high-value range was small and the concentration level was low; high values were mainly distributed on the northwest coast, and the average concentration was 2.59 μg/L. Summer was the season with the most severe Chl-a outbreaks, with Zhushan Lake, Meiliang Bay, and Gonghu Bay in the north as the key outbreak areas. In severe cases, the outbreak covered the entire lake, and the multi-year average concentration reached 4.91 μg/L. In autumn, the Chl-a concentration declined, but there were still high-value areas in the Dapukou area in the west, with the average concentration reaching 3.57 μg/L. Winter was the season with the lowest Chl-a concentration level; there was no obvious high-value area, and the average concentration was 1.65 μg/L.
Seasonal differences were mainly related to light and temperature. In summer, rising water temperatures and long sunshine hours can accelerate the photosynthesis of phytoplankton, promote an increase in the number of algae, and maintain a high concentration of Chl-a. At the same time, summer was usually accompanied by an increase in rainfall and runoff, which brought a large amount of N, P, and other substances into the lake, providing a rich source of nutrients for algae [
44, 45]. In winter, air and water temperatures were lower, which significantly inhibited photosynthesis and the growth of phytoplankton [
46], leading to a decrease in the Chl-a concentration.
On a daily scale (
Figure 12), the Chl-a concentration changed regularly and dynamically at different times. Within a single day, the spatial distribution pattern of Chl-a was basically stable, and the changes were mainly manifested in that the concentration in the high-value area in the morning (9:00–10:00) was low, within the range of 5–15 μg/L, and at the same time, it diffused over time with 1–2 high-value points as the source. At noon (13:00–14:00), owing to the increase in temperature and enhanced light, Chl-a reached its maximum value in both range and concentration, and sometimes, multiple high-value points appeared in the northern, eastern, and western coastal areas, with concentrations reaching more than 30 μg/L; in the afternoon (15:00–16:00), the concentration of Chl-a gradually decreased and returned to normal levels, and the range of the high-value area shrank toward the coast. However, unlike the relatively stable patterns between years and seasons, the daily variation trend of Chl-a was affected not only by temperature but also by multiple factors such as nutrient levels on different dates, water mixing conditions, and human activities. It exhibited context-dependent variability. The purpose of this study’s temporal downscaling was to capture such diverse and complex changes.
Based on the above characteristics, it can be considered that the spatial distribution pattern of Chl-a in nearshore and estuary areas is mainly related to the discharge of coastal agricultural runoff, domestic sewage, and industrial wastewater, which is consistent with the conclusions of many previous studies [
47, 48, 49, 50]. The specific reason is that the large-scale discharge of sewage has led to a significant increase in the content of nutrients such as nitrogen and phosphorus in the lake water, providing conditions for the reproduction of phytoplankton. At the same time, differences in hydrodynamic conditions made it easier for nutrients to accumulate in estuary areas, leading to an increase in Chl-a concentration in these areas [
51]. The temporal variation characteristics of the Chl-a concentration were the result of the joint action of water temperature, light, and other natural environmental factors. During periods of insufficient light and low temperatures, algal photosynthesis was weak. When the light intensity increased, the temperature also increased, prompting algae to reproduce and accumulate in large numbers, showing the characteristics of a concentrated distribution of high Chl-a concentration.
The above findings were consistent with the results of Xue et al. and Ma et al. [
51, 52] on lake algae, further confirming the advantages of high-precision remote sensing data in dynamically monitoring changes in lake water quality.
5. Discussion
5.1. Pre-Correction of Chlorophyll-a Inversion Data
It was necessary to correct inversion data before downscaling, which was reflected in the following aspects:
First, there were discrepancies across technical specifications such as resolution, sensor design, orbit characteristics, observation geometry, and processing methods between the two data sources, which caused objective systematic biases, as shown in
Table 2 above.
For data preprocessing, Yang et al. [
53] proposed that the atmospheric correction process of ocean satellites would fail for inland turbid water bodies when using COMS-1 GOCI data to invert the Chl-a concentration in Hangzhou Bay in 2020. Zhang et al. [
54] further noted that the opacity caused by high concentrations of total suspended matter (TSM) would interfere with the spectral measurement of Chl-a and thus affect the inversion accuracy when using COMS-1 GOCI data to invert Chl-a in the Yellow River estuary. Zeng et al. [
55] found that the 6S model would slightly over-correct when processing COMS-1 GOCI data. In summary, the atmospheric correction applied to the COMS-1 GOCI data in this study may have introduced errors, reducing the data accuracy compared with the official atmospheric correction used in the Sentinel-2 L2A product.
In terms of inversion algorithms, the COMS-1 GOCI had fewer band settings, limiting the application of high-precision multiband models. In addition, owing to differences in water body components and their bio-optical properties, there was a large variability in the optimal band position of the inversion factor and the inversion model parameters, which greatly reduced the universality of the multiband Chl-a inversion model [
56]. The COMS-1 GOCI three-band inversion model used in this study was constructed based on a sample dataset collected from 17 inland lakes in China from 2013 to 2020 [
37], and the Sentinel-2 MSI inversion model was constructed based on an actual sampling dataset at 22 stations in the Taihu Lake area, according to the waterway and actual hydrological conditions [
57]. Compared to COMS-1 GOCI, the Sentinel-2 MSI inversion model was more suitable for Taihu Lake.
Satellite-derived versus in situ comparisons revealed that the inverted COMS-1 GOCI Chl-a concentrations were generally higher, whereas those of Sentinel-2 MSI were normal, which also confirms the above point of view. If this systematic bias were not corrected, it would significantly affect the accuracy of subsequent temporal downscaling results.
Secondly, from the downscaling method perspective, temporal downscaling requires comparable data from COMS-1 GOCI and Sentinel-2 MSI. Large differences between these datasets can introduce errors during data fusion, reducing the reliability of the final product. Thus, the correction step is crucial for ensuring result quality.
Judging from the final results, the correlation between the COMS-1 GOCI data corrected by the random forest model and the Sentinel-2 MSI data was greatly improved, with the coefficient of determination R² reaching a maximum of 0.93, while the original spatiotemporal distribution characteristics were maintained, reflecting the necessity and correctness of the correction process.
This study adopted an important assumption when correcting the COMS-1 GOCI data: the time trend of the corrected COMS-1 GOCI data was still reliable. The rationality of this assumption was based on the following aspects:
Firstly, from a methodological point of view, this study used a machine learning model for correction, which mainly focused on the adjustment of numerical ranges rather than the reconstruction of time-series patterns. The training process of the model was based on feature mapping of the corresponding spatial points rather than the time series reconstruction, which ensured that the time-varying information contained in the original data was not significantly disturbed.
Secondly, the introduction of the time difference characteristics () in the correction process and the spatial smoothing of the Gaussian filtering method not only enable the model to retain the influence of the observation time on the inversion results but also indirectly enhance the stability of the time series, which helps to suppress abnormal fluctuations and make the assumption of time trend more reliable.
However, this assumption had several limitations. The random forest correction method itself did not focus on the autocorrelation of the time series and may ignore complex time dependencies. If multiple consecutive days of data were processed, the correlation between the different dates may be weakened. Although the non-parametric characteristics of the random forest model provided better nonlinear fitting capabilities, they may also overfit spatial features in some cases, thereby affecting the reliability of time trends.
In practical applications, especially in lakes with more complex water types (such as those where bays and open waters coexist, many rivers enter the lake, and the ecological composition is complex), or under certain special meteorological conditions (such as severe convective weather), the daily variation pattern of Chl-a may be affected by many factors, and the actual Chl-a concentration may change abruptly. In such cases, assumptions based on the temporal variation trend may not accurately reflect the actual change process.
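The pixel-wise correction discussed in this subsection can be illustrated with a minimal sketch. It assumes scikit-learn and SciPy are available and uses synthetic data only; the feature set (the GOCI value plus a per-pixel time difference), the number of trees, and the Gaussian sigma are illustrative assumptions, not the settings used in this study.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic 40 x 40 scene standing in for co-located Chl-a retrievals (ug/L).
s2_chla = 20.0 + 10.0 * rng.random((40, 40))
# Hypothetical GOCI retrieval: systematically higher than Sentinel-2, plus noise.
goci_chla = 1.4 * s2_chla + 3.0 + rng.normal(0.0, 1.0, s2_chla.shape)
# Hypothetical per-pixel time difference between the two overpasses (hours).
dt_hours = rng.uniform(0.0, 1.0, s2_chla.shape)

# Each pixel is one sample: features are mapped point by point, so no
# time-series structure is modelled by the regressor itself.
X = np.column_stack([goci_chla.ravel(), dt_hours.ravel()])
y = s2_chla.ravel()

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X, y)

# In-sample prediction, for illustration only; a real workflow would hold out data.
corrected = model.predict(X).reshape(s2_chla.shape)
# Spatial smoothing of the corrected field (sigma is an assumed value).
corrected_smooth = gaussian_filter(corrected, sigma=1.0)

raw_bias = float(np.mean(goci_chla - s2_chla))
corrected_bias = float(np.mean(corrected - s2_chla))
```

Because every pixel is treated as an independent sample, the sketch also makes the limitation above concrete: nothing in the regressor constrains consecutive observation times to remain correlated.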
5.2. Comparison of RTAD and TWD Downscaling Methods
In the RTAD method, since the constructed trend equations are temporally continuous, data beyond the baseline timepoints t can be obtained. However, the construction of the trend equation is limited by the time attribute of the basic data, and the downscaling results are heavily affected by the data in the time period before and after the target time point, so the independence of the model is low. Abnormal changes in some interval values affect the results for the remaining data, making it difficult to capture abnormal value changes and thereby reducing the overall data accuracy. In terms of the statistical data of many years (
Figure 5), when the Chl-a concentration distribution is close to a random trend as a whole, the regression trend fitting model is not able to obtain trend changes, which leads to a decrease in the overall fitting R2.
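The behaviour described for the RTAD method can be sketched under stated assumptions. The source does not give the trend equation in this section, so the polynomial form, the ratio scaling `fine_base * f(t) / f(t_base)`, the function name `rtad_downscale`, and all numbers below are hypothetical and serve only to illustrate the idea.

```python
import numpy as np

def rtad_downscale(hours, coarse_series, fine_base, t_base, t_targets, degree=2):
    """Scale a fine-resolution base value along a fitted coarse-sensor trend.

    Assumed form (not confirmed by the source): fit a polynomial trend f(t)
    to the coarse time series, then return fine_base * f(t) / f(t_base).
    """
    coeffs = np.polyfit(hours, coarse_series, degree)  # trend equation f(t)
    trend = np.poly1d(coeffs)
    t_targets = np.asarray(t_targets, dtype=float)
    return fine_base * trend(t_targets) / trend(t_base)

# Hypothetical hourly coarse-sensor Chl-a (ug/L) at one pixel, 08:00 to 15:00.
hours = np.arange(8, 16, dtype=float)
goci = 10.0 + 2.0 * (hours - 8.0)  # clean linear rise
s2_at_10h = 7.0                    # hypothetical fine-sensor value at 10:00

# The continuous trend can be evaluated at any time, including 16:00,
# which lies beyond the last observation.
estimates = rtad_downscale(hours, goci, s2_at_10h, 10.0, [12.0, 16.0], degree=1)

# A single abnormal observation changes the fitted trend at every target time.
goci_outlier = goci.copy()
goci_outlier[-1] = 60.0
estimates_outlier = rtad_downscale(
    hours, goci_outlier, s2_at_10h, 10.0, [12.0, 16.0], degree=1
)
```

In this toy case the abnormal 15:00 value shifts the 12:00 estimate as well, which mirrors the low independence noted above.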
The TWD method uses the ratio between the base time and target time data as the time weight to obtain trend changes and extrapolates the Sentinel-2 MSI Chl-a concentration to different times according to this weight. Unlike the RTAD method, the TWD input data correspond one-to-one with the times of the output results, so data outside the time points of the basic data cannot be obtained. However, the trend acquisition depends only on the data at the two relevant times. The method therefore has high independence, is not affected by data from non-target times in the period, and has a strong ability to capture abnormal value changes.
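A minimal sketch of the ratio-weight idea follows. It assumes that the coarse and fine images have already been resampled to a common grid and that the weight is the pixel-wise ratio of the coarse value at the target time to the coarse value at the base time; the function name `twd_downscale` and the sample values are hypothetical.

```python
import numpy as np

def twd_downscale(coarse_by_time, fine_base, t_base):
    """Extrapolate a fine-resolution image with per-time ratio weights.

    Assumed form (not confirmed by the source):
    weight(t) = coarse(t) / coarse(t_base), pixel by pixel,
    fine(t) = fine_base * weight(t).
    """
    base = np.asarray(coarse_by_time[t_base], dtype=float)
    fine_base = np.asarray(fine_base, dtype=float)
    results = {}
    for t, coarse in coarse_by_time.items():
        coarse = np.asarray(coarse, dtype=float)
        # Pixels with a non-positive base value are left undefined (NaN).
        weight = np.divide(
            coarse, base, out=np.full_like(base, np.nan), where=base > 0
        )
        results[t] = fine_base * weight
    return results

# Hypothetical coarse-sensor maps (ug/L) keyed by hour, on a 1 x 2 grid.
goci_maps = {
    10: [[10.0, 20.0]],
    12: [[15.0, 10.0]],
    14: [[30.0, 20.0]],
}
s2_base = [[4.0, 8.0]]  # hypothetical fine-sensor image at 10:00

s2_hourly = twd_downscale(goci_maps, s2_base, t_base=10)
```

Each output time uses only the coarse data at that time and at the base time, so an abnormal value at 14:00 leaves the 12:00 result unchanged. The output times are limited to those present in the input.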
The downscaling results obtained using the two methods were verified for accuracy (
Table 2).
Table 2 indicates that the overall fitting rate of the TWD method was higher than that of the RTAD method. Among the successfully fitted cases, the RTAD method had a higher proportion of good and high fits, and its highest fitting coefficient was larger, reaching 0.98.
The difference between the two methods was related to the mechanism of each method. The TWD method had strong independence because the time weight is calculated at a single moment, and the overall effect was not easily disturbed by abnormal data. It performed better when downscaling data with complex trend changes; therefore, its overall fitting rate was higher. Although constructing a trend equation reduced the ability of the RTAD method to capture details when fitting daily changes, the method produced good results when processing data series with clear trends. The more regular the trend of the Chl-a change, the higher the accuracy of the RTAD downscaling results. Therefore, among the fitted cases, the RTAD method exhibited a higher degree of fitting.
In general, both downscaling methods performed well and produced reasonable results; however, some results were inconsistent with the site data. Considering the entire research process, this inconsistency may be caused by the accumulation of errors over multiple steps:
First, this study referred to inversion models that other scholars developed from different datasets and research periods; these models cannot be fully applied to the data inversion in this study, especially in the high-turbidity waters of Taihu Lake. The accuracy of Chl-a inversion is affected by the scattering and absorption of total suspended matter (TSM), and colored dissolved organic matter (CDOM) also masks the characteristic spectrum of Chl-a through strong absorption in certain bands. Although these factors were taken into account during the construction of the inversion models and the influence of TSM and CDOM was reduced as much as possible, this interference cannot be completely eliminated. The R2 values of the inversion models were 0.82 and 0.78, which limits the accuracy of the data to some extent. Secondly, because the random forest correction model processes pixels independently, it weakens the temporal continuity of the data to a certain extent. The subsequent Gaussian filtering may also blur local changes with ecological significance while reducing spatial noise.
The main reason for the inconsistency between the model and the actual trend was that both downscaling methods were based on simplified assumptions and did not consider the influence of environmental factors such as hydrodynamic processes, meteorology, and climate. Under certain circumstances, they may not be able to fully express the complexity of the dynamic changes in Chl-a. In current research, downscaling methods fall into two categories: dynamic downscaling and statistical downscaling. Dynamic downscaling models are complex, require large data volumes, and are difficult to transfer, whereas statistical downscaling models are relatively simple [
58]. The downscaling methods used in this study were all statistical. Compared with other current statistical downscaling techniques, including quantile mapping (QM) [
59], constructed analogues (CA) [
60], bias correction and spatial disaggregation (BCSD) [
61], and more complex deep learning algorithms [
62,
63,
64,
65,
66,
67], the downscaling model constructed in this study was relatively simple and did not introduce environmental factors as model parameters. Deng et al. [
68] collected field monitoring data in Meiliang Bay and concluded that the key driving factors affecting the Chl-a concentration included the water temperature, light, and simulation time, which were closely related to hydrodynamic processes and meteorological conditions. Shi et al. [
69] also emphasized the impact of meteorological conditions on the Chl-a concentration in Taihu Lake, pointing out that high air temperature and low air pressure levels were closely related to the outbreak of cyanobacteria and that the degree of influence of air temperature on cyanobacteria growth varies in different areas, such as Meiliang Bay and Gonghu Bay. The lack of model-related environmental parameters might be an important reason for the poor performance of this method during rapid environmental changes or concentrated outbreaks of algal blooms.
In addition to the limitations of the research methods, the complex ecological environment of the region itself also increases the possibility of sudden changes in the Chl-a concentration. Against the backdrop of global warming, the frequency and intensity of extreme weather events in coastal areas have also increased significantly, and these events have a significant impact on the entire ecosystem around Taihu Lake. Taihu Lake has a large area and frequent water exchanges with the outside world. Its water bodies are susceptible to environmental factors such as complex wind-driven circulation and input from surrounding rivers. Under extreme conditions, these factors may lead to sudden changes in Chl-a [
70]. In this process, biological processes such as the vertical migration of algae and their specific response to environmental conditions also make the temporal variation pattern more complex [
71]. Under such special conditions, the response mechanism of Chl-a concentration to extreme environmental stress may exceed the training scope of the research model, increasing downscaling errors.
In summary, errors accumulated over multiple steps of the Chl-a concentration downscaling method proposed in this study, which may cause uncertainty in the final results. Nevertheless, this study effectively improved the spatiotemporal resolution of traditional remote sensing data, and the method has a certain degree of credibility.
5.3. Generalizability of Spatiotemporal Downscaling Model
This study selected COMS-1 GOCI and Sentinel-2 MSI data for downscaling, but the application scope of this method is not limited to these two specific data sources. As long as the data have complementary temporal and spatial resolutions, this method can be applied for downscaling, for example, to combinations of sensors such as MODIS, Landsat, VIIRS, and the Gaofen series. This flexibility allows different studies to select the most appropriate data combination according to the actual situation of the region and data availability, providing a reliable data foundation for the subsequent downscaling.
In terms of the practical application of the model, the efficiency of information acquisition and decision making was particularly important for water quality monitoring (especially large-scale, real-time monitoring). Therefore, improving the computational efficiency of the model and reducing the computational cost was important for ensuring its wide application in actual monitoring. The downscaling model established in this study based on statistical methods was relatively simple, had a low demand for computing resources, and could quickly obtain downscaling results, providing data support for subsequent analyses.
In addition to Chl-a, the method proposed in this study can be extended to downscale other water quality and meteorological parameters, especially those with obvious daily variation characteristics. In water quality monitoring, it can also be applied to other water quality indicators, such as turbidity, dissolved oxygen (DO), suspended matter concentration, temperature, and pH, to provide more accurate spatiotemporal data for dynamic monitoring of water quality or atmospheric changes.
5.4. Limitations of Spatiotemporal Downscaling Model and Future Work
This study used Sentinel-2 MSI and COMS-1 GOCI to invert Chl-a data for data downscaling. The source data were mature and had a high resolution, which met the monitoring needs of Chl-a in both time and space. However, due to systematic differences in data sources, coupled with the limitations of preprocessing and inversion methods, there were deviations in the inverted data. This study used a precorrection model to improve it, but this deviation could not be completely eliminated.
The random forest correction method used in this study also has some limitations. The correction processes each pixel independently and does not consider the spatial correlation between adjacent pixels, which may cause spatial discontinuities. Additionally, the correction process may introduce new uncertainties, which propagate to the subsequent downscaling process and affect the accuracy of the final data.
In the process of data verification, this study compared the daily variation trend of the downscaled Chl-a inversion values with the measured Chl-a concentrations. The high fitting coefficients and significance strongly indicate the feasibility of the research method. However, since the inversion model was constructed from a dataset with a lower temporal resolution, the influence of sub-daily fluorescence changes caused by the diurnal cycle on the Chl-a inversion results and the measured results was not considered, which limited the accuracy with which the downscaling results could be validated.
The downscaling model constructed in this study was relatively simple and relied only on the Chl-a inversion results, without involving relevant hydrological and meteorological parameters, which further limited the ability of the model to capture the distribution and changes in Chl-a.
In view of the current limitations of the research, the team will further optimize the following aspects in future research:
We will introduce satellite data with a higher temporal resolution, such as Japan’s Himawari-9 and China’s Fengyun-4 series, to improve the downscaling effect and quickly capture the dynamic changes in elements.
We plan to construct an inversion model suitable for the study area based on the water characteristics of different lakes, with a particular focus on developing differentiated inversion strategies according to water body zoning and the impact of non-Chl-a factors on the diurnal variation of fluorescence.
We will consider introducing more complex machine learning algorithms to enhance the overall ability of the model to capture trend changes, given that the model constructed in this study is relatively simple compared with current downscaling methods.
We intend to expand the input parameters of the model by including hydrological data such as hydrodynamic models. This will improve the physical mechanisms and interpretability of the model, enhance its generalization ability, and lay the foundation for further research on constructing an early warning system for harmful cyanobacterial blooms.
6. Conclusions
Based on high-temporal-resolution COMS-1 GOCI and high-spatial-resolution Sentinel-2 MSI Chl-a inversion data, a spatiotemporal downscaling model with a machine learning pre-correction step was constructed. Through adaptive bias pre-correction with a random forest model, the model effectively reduced the data deviation introduced by the processing steps before downscaling and increased the R2 between the two data sources to 0.93 while retaining the temporal trend of GOCI. The corrected data were used to perform downscaling with the RTAD and TWD methods, and the temporal resolution of the Sentinel-2 MSI Chl-a data was increased from 5 d to 1 h. Verification against the measured data showed that the overall fit of the model was good, with a best fit of 0.98. The credibility of the inversion results was further supported by comparison with actual cyanobacteria bloom events in Taihu Lake.
The comparative analysis showed that the overall fitting rate of the TWD method was higher than that of the RTAD method; the TWD method had strong independence and was not easily affected by data from other time points. Among the successfully fitted cases, the RTAD method had a higher proportion of good and high fits and better captured daily trend changes, and its downscaling results were more accurate in data intervals with more obvious trends. For open water, where the Chl-a value exhibits regular variations, the RTAD method can be used. The TWD method is more suitable for areas such as lakes and bays, where the changing trend is difficult to capture.
The downscaling method constructed in this study obtained high-precision Chl-a monitoring data with a temporal resolution of 1 h and a spatial resolution of 10 m, effectively captured the trend of intraday Chl-a changes, and improved data accuracy, supporting water environment management and early warning monitoring of cyanobacteria blooms. This method can be applied to the downscaling of water quality parameters such as Chl-a in other waters, has strong potential for wider application, and meets the needs of refined water body monitoring.