A Robust Spatiotemporal Fusion Algorithm for Wetland Vegetation Phenology Retrieval in Cloud-Prone Regions

Xie, Tianci; Ai, Jinquan; Xie, Ni; Qiao, Man

doi:10.3390/rs18111832


by Tianci Xie *, Jinquan Ai, Ni Xie and Man Qiao

School of Surveying and Geoinformation Engineering, East China University of Technology, Nanchang 330013, China

* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(11), 1832; https://doi.org/10.3390/rs18111832

Submission received: 1 April 2026 / Revised: 25 May 2026 / Accepted: 27 May 2026 / Published: 3 June 2026

(This article belongs to the Special Issue High-Throughput Phenotyping in Plants Using Remote Sensing)


Highlights

What are the main findings?

We propose IESTARFM, an improved spatiotemporal fusion framework tailored for cloud-prone wetland environments, enabling the reconstruction of a 10 m, 8-day kNDVI time series from multi-source optical observations.
IESTARFM achieves higher fusion fidelity and phenology accuracy than common baselines, and resolves contrasting phenological regimes between Carex spp. (Cs) and Phragmites australis (Pa).

What are the implications of the main findings?

The method provides a practical way to generate temporally continuous, high-resolution phenology products in persistently cloudy and rainy regions, supporting reliable monitoring of wetland vegetation dynamics at management-relevant scales.
The resulting phenology maps can be directly used to study hydrology–vegetation coupling and to evaluate wetland restoration/management actions by distinguishing species-specific phenological responses (e.g., Cs vs. Pa).

Abstract

Vegetation phenology refers to the cyclical growth patterns of vegetation in nature, which are influenced by climatic conditions, human activities, and genetic factors. It plays an irreplaceable role in regulating carbon cycling and energy flow within natural ecosystems. However, the combination of a cloudy and rainy climate with a fragmented landscape of interlaced land and water patches has long hampered remote sensing phenological monitoring, leading to a scarcity of valid observations, frequent temporal gaps, and spectral distortion in mixed pixels. These issues make it difficult to reliably support wetland phenology retrieval and mapping. To address them, this study takes vegetation phenology retrieval in the Poyang Lake wetlands as a case study and reconstructs a kNDVI time series with high spatiotemporal resolution from multi-source remote sensing data. Methodologically, we propose an improved and enhanced spatiotemporal adaptive reflectance fusion model, IESTARFM. This model enhances the homogeneity of similar-pixel selection through adaptive matching windows and land-cover constraints. Additionally, it explicitly incorporates cloud probability and time-lag factors into the weighting structure to systematically down-weight unreliable observations, and further employs a quadratic correction term to account for the nonlinear growth response of kNDVI. Using the reconstructed dataset, key phenological information is extracted by combining third-order harmonic analysis with a dynamic thresholding method, thereby enabling robust characterization of seasonal trajectories under conditions of missing data and noise.
Accuracy evaluation results show that the 10 m, 8-day kNDVI dataset reconstructed by IESTARFM achieves at least a 12.61% improvement in fusion accuracy compared to classical methods such as ESTARFM, STARFM, and FSDAF, with a maximum reduction in RMSE of 0.026, and effectively restores details in areas with thin cloud cover. The reconstructed kNDVI series achieved a coefficient of determination $R^{2}$ = 0.875 and RMSE = 0.066 relative to Sentinel-2 observations, indicating that the reconstructed series closely reproduces the reference imagery in both amplitude and spatial structure. The phenological parameters derived from kNDVI exhibit an RMSE of 4.81 days compared to field observations, demonstrating that the reconstructed time series reliably captures the timing of key phenological events. It should be noted that the proposed approach is designed for post-event time-series reconstruction and is not intended for real-time forecasting. In summary, this study jointly enhanced the reliability of high-resolution index time-series reconstruction and phenological identification in cloudy and rainy wetlands through three key aspects: cloud noise suppression, heterogeneous boundary preservation, and nonlinear growth characterization. It provides a generalizable technical foundation for dynamic monitoring of wetland vegetation, ecological restoration assessment, and refined management in regions with frequent cloud and rainfall.

Keywords:

spatiotemporal image fusion; kNDVI; wetland vegetation; phenology retrieval


1. Introduction

Vegetation phenology is one of the most sensitive biological indicators of terrestrial ecosystem responses to global climate change [1] and a key factor regulating the terrestrial carbon cycle [2]. Therefore, high-accuracy monitoring and retrieval of vegetation phenology are essential for elucidating the response and feedback mechanisms of terrestrial ecosystems to climate change and for improving the reliability of carbon cycle process modeling.

Traditional phenology monitoring mainly relies on field plot surveys and manual, periodic observations [3], in which changes in plant developmental stages are recorded to derive key parameters such as the start of the growing season (SOS), end of the growing season (EOS), and length of season (LOS). Although this approach can accurately represent local physiological processes, its application at large spatial and long temporal scales is constrained by labor and time costs and by sparse sampling. In wetlands, long-term inundation, restricted access, and safety concerns further limit field observations, weakening both their operational feasibility and representativeness. To overcome the limitations of traditional surveys in terms of continuity and automation, near-surface observation systems have been developed along several fixed, automatic, and high-frequency pathways. For example, phenological cameras (PhenoCam) continuously record canopy appearance and seasonal transitions using time-lapse imagery from fixed viewpoints [4]; eddy covariance flux networks characterize ecosystem functional rhythms from the perspective of energy and mass exchange [5]; and unmanned aerial vehicle platforms provide high spatial resolution surface information over small areas to complement and refine ground-based samples [6]. Although these near-surface approaches still share the common constraint of limited spatial coverage, they provide high-frequency, calibratable process evidence that plays a crucial role in bridging ground-based observations and satellite-based phenology retrieval.

Satellite remote sensing, with its non-contact, wide-area coverage and long-term continuous observation capability, has become the core means of obtaining phenology information from regional to global scales. With the maturation of Earth observation systems, land surface phenology products with spatial resolutions of 250–1000 m have been widely applied [7]. To better represent strongly heterogeneous surfaces, long time series of vegetation indices or biophysical variables have gradually become mainstream, and by fusing multispectral imagery from sensors such as Landsat 8 and Sentinel-2, it is now possible to generate spatiotemporally continuous phenology products at 10–30 m resolution with revisit intervals of about 3 days, thereby greatly enhancing the ability to resolve complex phenological dynamics at the land surface [8]. However, remotely sensed phenology retrieval is highly sensitive to data quality and model assumptions. Cloud contamination, time-series noise, cross-sensor differences, and scale effects can all introduce uncertainty and reduce the accuracy of phenological transition detection.

In wetland ecosystems, these challenges are even more pronounced. Wetlands are characterized by highly diverse vegetation types and fragmented spatial distribution [9], while rapid changes in land–water ecotones tend to cause mixed and abrupt remote sensing signals, which interfere with the stable extraction of phenological information. In addition, wetlands are often located in cloud-prone, rainy climatic regions, where usable optical observations are substantially reduced, and time series frequently exhibit large gaps and discontinuities, making it difficult to meet the requirements of large-scale, long-term phenology monitoring. Satellite remote sensing remains the primary tool for obtaining large-scale phenology information [10], yet optical imagery in wetland monitoring still suffers from severe data scarcity [11,12,13,14]. In regions with frequent cloud and rain, medium- to high-resolution sensors rarely form stable sequences of valid observations, further weakening the ability to capture rapid vegetation changes [15,16]. Although sensors such as MODIS offer high revisit frequency, their coarse spatial resolution (250 m–1 km) is inadequate for characterizing fragmented and highly heterogeneous wetland landscapes. To alleviate the trade-off between spatial and temporal resolution, spatiotemporal image fusion (STF) techniques have been widely adopted [17,18,19], aiming to combine the advantages of fine spatial resolution and high temporal frequency. Classical models such as STARFM [20], ESTARFM [21], and FSDAF [22] perform well over relatively homogeneous surfaces, but in regions with strong land-cover variability, such as wetland margins and land–water transition zones, they often struggle to reconstruct sharp spectral changes and to adequately account for the uncertainty of cloud-contaminated observations, thereby compromising the realism of the fused series and the stability of phenology identification. 
In wetland regions with highly heterogeneous water–vegetation mixtures, kNDVI is expected to better capture the vegetation signal because its nonlinear kernel transformation reduces saturation effects and mitigates water background interference, providing a more accurate representation of vegetation dynamics in mixed pixels.

Beyond limitations in spatiotemporal resolution, the suitability of the vegetation index (VI) itself also constrains phenology retrieval accuracy [23,24,25,26]. Traditional indices such as NDVI are prone to saturation in high-biomass regions and are highly sensitive to backgrounds such as water and mudflats [27], a problem that is particularly acute in wetlands where background conditions change rapidly. To overcome the limitations of linear indices, Camps-Valls et al. [28] proposed the kernel Normalized Difference Vegetation Index (kNDVI), which uses a radial basis function (RBF) kernel to map spectral information into a high-dimensional Hilbert space, thereby more effectively capturing the nonlinear relationship between near-infrared and red bands. Previous studies have shown that kNDVI outperforms NDVI, EVI, and NIRv in resistance to saturation, sensitivity to gross primary production (GPP), and robustness to noise [28], and it has been applied to dynamic monitoring and conservation-effectiveness evaluation across multiple ecosystems [29,30,31,32]. However, for highly dynamic wetlands, which are typical of cloud–rainy climates with strong background mixing and rapid land-cover transitions, there is still a lack of systematic technical frameworks and targeted validation for generating high-quality long time series of kNDVI and applying them to phenology retrieval. In particular, in wetland regions with strongly mixed water–vegetation pixels, kNDVI provides superior performance because its nonlinear kernel mapping reduces background interference from water and mudflats and better captures vegetation spectral responses related to canopy structure and photosynthetic activity. Empirical studies in wetlands have confirmed that kNDVI yields more accurate vegetation dynamics and phenology retrieval compared to linear indices in such heterogeneous environments [33,34,35].

Although existing spatiotemporal fusion methods have substantially improved the temporal continuity of medium- and high-resolution optical imagery, a clear methodological gap remains for phenology retrieval in cloud-prone wetland environments. First, residual clouds and shadows are usually treated mainly during preprocessing, whereas their uncertainty is rarely incorporated into the fusion weighting process. Second, fixed or weakly adaptive similar-pixel selection strategies are not well suited to fragmented wetland landscapes, where water, mudflats, and vegetation patches are spatially interlaced and may lead to mixed-pixel errors along land–water boundaries. Third, most existing fusion methods rely on approximately linear reflectance or index conversion assumptions, which may be insufficient for representing nonlinear kNDVI changes during rapid green-up, senescence, or hydrologically driven vegetation transitions. These gaps directly motivate the methodological improvements proposed in this study, including cloud-probability and temporal-distance weighting, adaptive matching windows constrained by land-cover information, and a quadratic correction term for nonlinear kNDVI reconstruction.

To address the above issues, this study proposes an improved IESTARFM-based fusion scheme on the Google Earth Engine (GEE) platform for monitoring highly dynamic wetlands and uses it to construct high-frequency kNDVI time series for phenology retrieval. Methodologically, we extend the ESTARFM framework by introducing cloud-probability and temporal-proximity weighting factors to automatically identify and suppress unreliable pixels in reference images, thereby reducing the influence of residual cloud noise on the predicted series. In response to wetland landscape fragmentation and rapid land–water transitions, we incorporate a classification-driven fusion strategy, in which an adaptive spatial window and land-cover masks constrain the range of similar pixels to enhance spectral consistency and boundary preservation in land–water transition zones. Considering the nonlinear response characteristics of kNDVI, we further introduce a quadratic correction term to more finely describe rapid green-up and senescence processes and to improve the detection of key phenological events such as SOS and EOS. Using the Poyang Lake wetland as the study area, we reconstruct a 10 m, 8-day kNDVI dataset, evaluate the effectiveness of the improved scheme against ESTARFM, STARFM, FSDAF, and original Sentinel-2 observations, and finally combine Harmonic Analysis of Time Series (HANTS) with a dynamic threshold method to generate wetland vegetation phenology maps. This work provides a transferable technical framework for high-precision data reconstruction and phenology monitoring in cloudy–rainy wetlands.

2. Materials and Methods

2.1. Study Area

Taking the retrieval of wetland vegetation phenology in the Poyang Lake region as a case study, this paper investigates how different spatiotemporal fusion algorithms affect phenology retrieval in cloudy–rainy wetlands. Poyang Lake is located in northern Jiangxi Province on the southern bank of the middle reaches of the Yangtze River, with geographic coordinates ranging from 115°49′E to 116°46′E and 28°24′N to 29°46′N (Figure 1). The Poyang Lake wetland is an important stopover and breeding site for migratory birds and is typically classified as a flow-through and river-connected lake, whose natural hydrological linkage with the Yangtze River provides the key background for wetland formation and evolution. The region is situated in a subtropical monsoon climate zone, characterized by a mild, humid climate and abundant sunshine. The multi-year mean air temperature ranges between 16.5 °C and 17.8 °C, displaying an overall pattern of higher temperatures in the south and lower in the north, with a north–south temperature difference of about 1 °C [36]. The long-term mean annual precipitation over the lake area is 1542 mm; driven by the monsoon, precipitation exhibits pronounced spatiotemporal heterogeneity. Intra-annually, rainfall is highly unevenly distributed, with approximately 69.4% of the annual total occurring from April to September, while spatially it decreases gradually from the southeast toward the northwest [36].

2.2. Source of Data

All remote sensing data used in this study were obtained from the Google Earth Engine (GEE) platform. The study period spans from 1 January 2024 to 31 December 2024. High-temporal- and high-spatial-resolution optical data were jointly used to construct a high-frequency kNDVI time series for the Poyang Lake wetland to support phenology retrieval. As the high-temporal-resolution input, we used the MODIS Terra Surface Reflectance 8-Day Composite product MOD09A1 (Collection 6.1), which provides 500 m surface reflectance for MODIS bands 1–7 together with quality layers, thus offering representative observations for each 8-day compositing period and supporting pixel-level screening based on quality information. As the high-spatial-resolution input, we used the Harmonized Sentinel-2 MSI Level-2A surface reflectance dataset, which provides multispectral observations at 10–60 m spatial resolution with a nominal revisit interval of about 5 days, thereby supplying finer spatial structural constraints.

Cloud and snow/ice contamination were removed by jointly masking the Sentinel-2 QA60 band and the Scene Classification Layer (SCL), in order to reduce the impact of residual cloud noise on the index time series under cloudy conditions. To ensure temporal consistency between sensors, Sentinel-2 observations were composited at 8-day intervals to match the MOD09A1 period. The MODIS MOD09A1 images were resampled to the Sentinel-2 10 m grid only for spatial registration and pixel-wise implementation of the spatiotemporal fusion algorithm. This operation should not be interpreted as generating true 10 m spatial details from MODIS observations. Similar preprocessing procedures have been widely adopted in established spatiotemporal fusion models, including STARFM, ESTARFM, and FSDAF, where coarse-resolution images are first aligned with the fine-resolution grid before fusion. In the proposed framework, MODIS mainly provides temporally continuous variation information, whereas Sentinel-2 constrains the fine-scale spatial structure. Bilinear interpolation was selected to reduce blocky discontinuities associated with nearest-neighbor resampling while preserving the smooth temporal signal of coarse-resolution MODIS observations. Vegetation indices were calculated from red and near-infrared reflectance, using bands B4 and B8 for Sentinel-2 and bands 1 and 2 for MODIS. On this basis, kNDVI time series were generated and subsequently used for phenological parameter retrieval.
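
The masking and index steps described above can be illustrated with a minimal NumPy sketch. It is not the GEE implementation used in the study. It assumes the commonly used simplified form kNDVI = tanh(NDVI^2), that QA60 bits 10 and 11 flag opaque and cirrus clouds, and that SCL classes 3, 8, 9, 10, and 11 (cloud shadow, medium- and high-probability cloud, thin cirrus, snow/ice) are rejected. The function names and the exact class list are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def s2_clear_mask(qa60, scl):
    """Return True where a Sentinel-2 pixel passes both QA60 and SCL checks.

    Assumption: QA60 bit 10 = opaque cloud, bit 11 = cirrus; the rejected SCL
    classes (3, 8, 9, 10, 11) are an illustrative choice.
    """
    qa60 = np.asarray(qa60, dtype=np.int64)
    scl = np.asarray(scl)
    qa_clear = ((qa60 & (1 << 10)) == 0) & ((qa60 & (1 << 11)) == 0)
    scl_bad = np.isin(scl, [3, 8, 9, 10, 11])
    return qa_clear & ~scl_bad

def kndvi(red, nir, valid=None, eps=1e-10):
    """Simplified kNDVI = tanh(NDVI^2) from red and NIR reflectance.

    Pixels where `valid` is False are returned as NaN.
    """
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    ndvi = (nir - red) / (nir + red + eps)  # eps guards against 0/0
    out = np.tanh(ndvi ** 2)
    if valid is not None:
        out = np.where(np.asarray(valid, dtype=bool), out, np.nan)
    return out

# Usage: Sentinel-2 B4 (red) and B8 (NIR), or MODIS bands 1 and 2.
mask = s2_clear_mask([0, 1024, 0], [4, 4, 9])
series = kndvi([0.05, 0.05, 0.05], [0.45, 0.45, 0.45], valid=mask)
```

In this toy call only the first pixel survives the mask; the other two are set to NaN and would be left for the fusion step to fill.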

Because the input optical images were pre-screened before fusion, this study mainly evaluates the reconstruction performance under moderate residual cloud contamination rather than under the full range of cloudiness conditions. Sentinel-2 scenes with cloud coverage greater than 25% were excluded, and remaining cloud-affected pixels were further masked using QA60 and SCL. MODIS MOD09A1 observations were also screened using quality information and used as temporally continuous coarse-resolution reference data. Therefore, the proposed IESTARFM framework is primarily designed to reduce the influence of residual cloud noise and short-term observation gaps within pre-filtered optical time series, rather than to reconstruct fine-scale vegetation dynamics under persistent cloud cover where valid Sentinel-2 observations are almost completely absent.

2.3. Proposed Algorithm for Spatio-Temporal Fusion of Remote Sensing Images

To address the coexistence of long-term data gaps in medium- to high-resolution optical imagery and strong landscape fragmentation in cloudy–rainy wetlands—which leads to temporal discontinuities, difficulty in preserving sharp boundaries, and easy propagation of residual cloud noise—this study proposes a spatiotemporal fusion algorithm tailored for wetland vegetation phenology retrieval in such regions. The algorithm fuses high-temporal-resolution, coarse-scale images with low-temporal-resolution, fine-scale images to generate continuous reflectance and index series with both high spatial and temporal resolution.

The core innovations of the proposed IESTARFM scheme are as follows: (i) cloud probability and temporal proximity are explicitly incorporated into the weight construction to systematically down-weight unreliable pixels and mitigate the propagation of residual cloud noise; (ii) an adaptive spatial window combined with land-cover-based constraints is used to improve the homogeneity of similar-pixel selection in land–water transition zones and to enhance boundary preservation; and (iii) a quadratic correction term is introduced into the fusion formulation to account for the nonlinear response of kNDVI, thereby improving shape fidelity during rapidly changing phases of the growing season and stabilizing the detection of key phenological events.

2.3.1. Adaptive Matching Window

In the traditional ESTARFM framework, a fixed-size moving window is used to search for similar pixels. In regions with frequent cloud cover or highly fragmented land-cover types, such a fixed window may be too small to locate a sufficient number of cloud-free similar pixels. To address this, we adopt an adaptive window strategy that starts from a relatively small window. If the number of similar pixels does not reach a predefined threshold $N$, the window size is gradually enlarged until enough candidate pixels are found. As the window expands, only pixels that are spatially close and exhibit small spectral differences are admitted into the candidate set, in order to avoid introducing heterogeneous land-cover types from an overly large neighborhood. This adaptive design ensures that even under locally extensive cloud cover, clear pixels over similar land-cover can still be identified within a larger neighborhood to support prediction.

In this study, the local heterogeneity index $H$ was defined as the coefficient of variation of valid Sentinel-2 kNDVI values within the initial local window. For a target pixel $x$, $H$ was calculated as:

$$H(x) = \frac{\sigma_{\Omega}(x)}{\mu_{\Omega}(x) + \varepsilon}$$

where $\sigma_{\Omega}(x)$ and $\mu_{\Omega}(x)$ represent the standard deviation and mean value of valid kNDVI pixels within the initial local window $\Omega(x)$, respectively, and $\varepsilon$ is a small constant used to avoid division by zero. Only pixels that passed the cloud mask and land-cover consistency constraint were included in the calculation. A larger $H$ indicates stronger local spatial heterogeneity, such as fragmented land–water boundaries or mixed vegetation–mudflat pixels, whereas a smaller $H$ indicates a relatively homogeneous surface.
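
As a minimal sketch of this definition, the coefficient of variation can be computed over the valid pixels of one window as follows. The function name, the use of the population standard deviation, and the default value of the small constant are assumptions made for illustration; the paper does not specify them.

```python
import numpy as np

def heterogeneity_index(kndvi_window, valid_mask, eps=1e-6):
    """Coefficient of variation of valid kNDVI values in a local window.

    `valid_mask` marks pixels that passed the cloud mask and land-cover
    constraint. Returns NaN when no valid pixel is available.
    """
    vals = np.asarray(kndvi_window, dtype=float)
    keep = np.asarray(valid_mask, dtype=bool) & np.isfinite(vals)
    vals = vals[keep]
    if vals.size == 0:
        return float("nan")
    # Population standard deviation (ddof=0) is an assumption of this sketch.
    return float(vals.std() / (vals.mean() + eps))

# A homogeneous patch versus a patch straddling a land-water boundary.
h_uniform = heterogeneity_index(np.full((5, 5), 0.5), np.ones((5, 5), bool))
h_mixed = heterogeneity_index(np.array([[0.2, 0.8]]), np.ones((1, 2), bool))
```

The uniform patch gives a value of zero, while the two-valued patch gives a value close to 0.6, which is the behavior the text associates with boundary pixels.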

The values of $r_{\min}$, $r_{\max}$, and $\alpha$ were determined according to the 10 m spatial resolution of Sentinel-2, the fragmented land–water pattern of the Poyang Lake wetland, and the need to balance boundary preservation with the availability of sufficient similar pixels. Specifically, $r_{\min}$ was used to prevent the search window from crossing sharply contrasting land-cover boundaries in highly heterogeneous areas, while $r_{\max}$ was used to ensure that enough candidate pixels could be obtained in relatively homogeneous or locally cloud-affected regions. The parameter $\alpha$ controls the rate at which the window radius decreases with increasing heterogeneity.

The window radius $r$ is defined as a decreasing function of the local heterogeneity index $H$:

$$r = \max\left(r_{\min},\; r_{\max} \exp(-\alpha H)\right)$$

where $r_{\min}$ and $r_{\max}$ denote the minimum and maximum candidate window radii, $\alpha$ controls the strength of the adjustment, and $H$ is the local heterogeneity metric. When local heterogeneity is high, the window converges toward $r_{\min}$ and focuses on more homogeneous areas around the target pixel; when heterogeneity is low, the window approaches $r_{\max}$ and can be enlarged to gather more samples. In this way, the window automatically shrinks in land–water transition zones to avoid spanning sharply contrasting land-cover types, whereas over extensive grassland or mudflat areas it can expand to incorporate a sufficient number of pixels.
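
The behavior described here can be sketched in a few lines, reading the radius rule as r = max(r_min, r_max · exp(−αH)), which matches the stated limits (r_max when H is near zero, r_min when H is large). The parameter values in the usage lines are purely hypothetical and are not the settings used in the study.

```python
import math

def adaptive_radius(h, r_min, r_max, alpha):
    """Window radius that decays exponentially with heterogeneity h,
    bounded below by r_min and equal to r_max when h = 0."""
    return max(r_min, r_max * math.exp(-alpha * h))

# Hypothetical settings (radii in pixels): r_min = 3, r_max = 15, alpha = 2.
r_homogeneous = adaptive_radius(0.0, 3, 15, 2.0)   # open grassland or mudflat
r_boundary = adaptive_radius(10.0, 3, 15, 2.0)     # sharp land-water edge
```

In practice the result would be rounded to an integer number of pixels before the window is extracted.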

For each target pixel, at least N similar pixels are required for prediction. If fewer than N similar pixels are found within the initial window, the window is progressively expanded until the requirement is satisfied; if a large number of similar pixels are retrieved but they include multiple land-cover types, the window is reduced or the similarity threshold is tightened. This procedure effectively adjusts the window on demand, balancing prediction accuracy and statistical stability.
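
A minimal sketch of the expansion step is given below. It grows a square window one pixel at a time until at least N similar pixels are found or a radius limit is reached. The similarity test (absolute difference from the target value within a tolerance), the one-pixel growth step, and all names are illustrative assumptions; the shrinking and threshold-tightening branch mentioned above is not covered.

```python
import numpy as np

def find_similar_pixels(img, valid, row, col, n_required, r_start, r_limit, tol):
    """Grow a square window around (row, col) until enough similar pixels exist.

    Returns the final radius and the row/column indices of the similar pixels
    (the target pixel itself counts as similar when it is valid).
    """
    r = r_start
    while True:
        r0, r1 = max(0, row - r), min(img.shape[0], row + r + 1)
        c0, c1 = max(0, col - r), min(img.shape[1], col + r + 1)
        win = img[r0:r1, c0:c1]
        similar = valid[r0:r1, c0:c1] & (np.abs(win - img[row, col]) <= tol)
        rows, cols = np.nonzero(similar)
        if rows.size >= n_required or r >= r_limit:
            return r, rows + r0, cols + c0
        r += 1  # not enough candidates yet: enlarge the window

# Toy scene: vegetation (0.0) on the left, water (1.0) from column 6 onward.
scene = np.zeros((11, 11))
scene[:, 6:] = 1.0
clear = np.ones((11, 11), dtype=bool)
radius, rr, cc = find_similar_pixels(scene, clear, 5, 5, 20, 1, 5, 0.1)
```

For this toy scene the search stops at a radius of 3 pixels, and none of the selected pixels lies on the water side of the boundary.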

Let $R$ and $r_{0}$ denote the upper and lower bounds of the window radius, respectively, and let $H_{0}$ be a heterogeneity threshold. When the local heterogeneity index $H > H_{0}$, the minimum window $r = r_{0}$ is used; when $H$ is very small, the maximum window $r = R$ is adopted; for intermediate values, the window radius is computed by linear interpolation:

$$r = r_{0} + (R - r_{0}) \left(1 - \min\left(\frac{H}{H_{0}}, 1\right)\right).$$
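
The piecewise-linear rule can be checked with a short sketch. The function and argument names are hypothetical, and the numeric values are chosen only to make the three regimes visible.

```python
def interpolated_radius(h, r0, r_upper, h0):
    """Linear interpolation between r_upper (h = 0) and r0 (h >= h0)."""
    frac = min(h / h0, 1.0)
    return r0 + (r_upper - r0) * (1.0 - frac)

# Hypothetical bounds: r0 = 3, R = 15 pixels, threshold H0 = 0.4.
r_low = interpolated_radius(0.0, 3, 15, 0.4)   # very homogeneous
r_mid = interpolated_radius(0.2, 3, 15, 0.4)   # intermediate
r_high = interpolated_radius(0.9, 3, 15, 0.4)  # above the threshold
```

The three calls return the maximum radius, the midpoint between the two bounds, and the minimum radius, respectively.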

Through this adaptive matching-window strategy, unsuitable similar pixels are substantially excluded, thereby reducing prediction noise. In the Poyang Lake wetland, the dynamic window helps avoid mixing water and land pixels and enhances the robustness of the model in highly heterogeneous environments.

2.3.2. Cloud-Probability and Temporal-Distance Weighting Factors

The original ESTARFM weight function mainly considers spectral difference and spatial distance, and lacks explicit treatment of cloud cover and temporal distance. In cloudy regions, residual clouds or shadows may render some neighborhood pixels unreliable, yet the original algorithm may still assign them relatively high weights. In addition, ESTARFM finally fuses the two base date predictions by simple interpolation and does not fully exploit the temporal information contained in multiple observations. Since the presence of clouds introduces observation uncertainty, it should be suppressed through the fusion weights. Therefore, in the IESTARFM scheme, two additional factors are introduced into the weight calculation, namely a temporal distance weight and a cloud probability weight, so that cloudy pixels and pixels that are far in time from the prediction date receive lower weights and the influence of unreliable data on the fused results is reduced.

Using cloud masks or cloud probability information, each candidate similar pixel is assigned a coefficient that reflects its degree of clearness. For a pixel $(x_{ij}, t_{k})$, this coefficient is defined as

$$C_{ij,k} = 1 - P_{cloud}(x_{ij}, t_{k}),$$

where $P_{cloud}$ is the cloud probability, or a binary cloud-mask indicator that takes 1 for cloudy pixels and 0 for cloud-free pixels. Cloud-free pixels thus obtain a coefficient close to 1, whereas cloudy pixels receive 0 or a very small value. This coefficient is multiplied directly with the weight so that cloudy pixels are down-weighted. If a similar pixel is cloud-covered in the base image, its effective weight is greatly reduced, which prevents cloud-contaminated values from interfering with the prediction:

$$W_{ij}^{'} = C_{ij,k} \cdot W_{ij,k},$$

where $W_{ij,k}$ denotes the original ESTARFM weight function.

To account for the fact that the two base images may have different temporal distances to the prediction time, and that each observation has different representativeness for the prediction time when multiple dates are used, a temporal distance factor is further introduced into the weights. The temporal difference is defined as

$$\Delta t_{k} = |t_{p} - t_{k}|,$$

and the corresponding temporal distance weight factor is

$$T_{k} = \exp\left(-\left(\frac{\Delta t_{k}}{\sqrt{2}\,\tau}\right)^{2}\right),$$

where $\tau$ is a tuning constant. When only two base dates are available, this factor effectively acts on the final fusion weights, assigning a larger weight to the date closer to the prediction time. In this study, $\tau$ was set to 16 days, corresponding to two 8-day compositing intervals, so that observations close to the prediction date were given higher weights while nearby valid observations could still contribute to the fusion. When multiple images are involved in the fusion, the temporal factor can also be directly multiplied with the weight of each pixel pair:

$$W_{ij}^{''} = T_{k} \cdot W_{ij,k}^{'},$$

so that the influence of temporally distant data is reduced already at the neighborhood weighting stage.
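
To give a sense of how quickly the temporal factor decays with τ = 16 days, the following sketch evaluates it for a few offsets. The function name is an illustrative assumption; the formula is the Gaussian factor defined above.

```python
import math

def temporal_weight(t_pred, t_obs, tau=16.0):
    """Gaussian temporal factor T_k = exp(-(dt / (sqrt(2) * tau))^2), dt in days."""
    dt = abs(t_pred - t_obs)
    return math.exp(-(dt / (math.sqrt(2.0) * tau)) ** 2)

# Offsets of 0, 1, 2, and 4 compositing periods (8 days each).
weights = [temporal_weight(100, 100 + d) for d in (0, 8, 16, 32)]
# approximately [1.0, 0.882, 0.607, 0.135]
```

An observation one compositing period away keeps roughly 88% of its weight, whereas one that is four periods away keeps less than 14%.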

The original ESTARFM weight function can be written as

$$W_{ij,k} = \exp\left(-\left(\frac{D_{spec}}{\sigma_{s}}\right)^{2}\right) \cdot \exp\left(-\left(\frac{D_{spa}}{\sigma_{d}}\right)^{2}\right),$$

where $D_{spec}$ denotes the spectral difference between the similar pixel and the central pixel at the base date, $D_{spa}$ is the spatial distance, and $\sigma_{s}$, $\sigma_{d}$ are the tuning parameters. After introducing the two new factors, the complete improved weight function can be written as follows:

$$W_{ij,k}^{(new)} = \underbrace{\exp\left(-\frac{D_{spec}^{2}}{2\sigma_{s}^{2}}\right) \cdot \exp\left(-\frac{D_{spa}^{2}}{2\sigma_{d}^{2}}\right)}_{\text{original spectral-spatial weight}} \times \underbrace{(1 - P_{ij,k})}_{\text{cloud probability factor}} \times \underbrace{\exp\left(-\frac{\Delta t_{k}^{2}}{2\tau^{2}}\right)}_{\text{temporal factor}}$$

The weight is then normalized as follows:

$$W_{ij,k}^{norm} = \frac{W_{ij,k}^{(new)}}{\sum_{m,n} W_{mn,k}^{(new)}}.$$

With this modification, pixels with a high probability of cloud obstruction do not obtain large weights even if they are spectrally similar, while neighboring observations that are close to the prediction date are given higher weights. This optimization improves the noise robustness of the model under cloud-affected conditions and makes fuller use of temporally adjacent observations to increase prediction accuracy.
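
The complete weight and its normalization can be sketched with NumPy as follows. This is a minimal illustration of the combined formula (Gaussian spectral and spatial terms, cloud-probability factor, Gaussian temporal factor), not the authors' implementation. The function name and the zero-sum fallback are assumptions.

```python
import numpy as np

def combined_weights(d_spec, d_spa, p_cloud, dt, sigma_s, sigma_d, tau):
    """Normalized weights for a set of candidate similar pixels.

    d_spec, d_spa: spectral difference and spatial distance to the target pixel
    p_cloud: cloud probability in [0, 1]; dt: days from the prediction date
    """
    d_spec, d_spa, p_cloud, dt = (
        np.asarray(a, dtype=float) for a in (d_spec, d_spa, p_cloud, dt)
    )
    w = (
        np.exp(-d_spec ** 2 / (2.0 * sigma_s ** 2))
        * np.exp(-d_spa ** 2 / (2.0 * sigma_d ** 2))
        * (1.0 - p_cloud)
        * np.exp(-dt ** 2 / (2.0 * tau ** 2))
    )
    total = w.sum()
    if total <= 0.0:
        # Assumed fallback: every candidate is fully cloudy, so no weight is given.
        return np.zeros_like(w)
    return w / total

# Three otherwise identical candidates that differ only in cloud probability.
zeros = np.zeros(3)
w_cloud = combined_weights(zeros, zeros, [0.0, 0.5, 1.0], zeros, 0.1, 5.0, 16.0)
```

With identical spectral, spatial, and temporal terms, the three candidates receive weights of 2/3, 1/3, and 0, so a fully cloudy pixel contributes nothing regardless of its spectral similarity.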

2.3.3. Method for Constructing kNDVI Time Series

In this study, the NDVI value of each pixel is first transformed into kNDVI through a kernel function $\phi(\cdot)$. A linear fusion model is then applied in the kNDVI space to perform prediction, and the predicted results are finally mapped back to NDVI. Pan et al. [37] proposed a quadratic polynomial kernel that represents NDVI variation using NDVI and its squared term. Experimental results have shown that this approach can be effectively integrated into existing models.

To more accurately characterize the nonlinear response relationships among pixels, a quadratic term is introduced into the conversion coefficient model of IESTARFM, extending the original linear conversion formulation to a quadratic polynomial relationship, which can be expressed as follows:

$$L_{pred}(t_{p}) = L(t_{1}) + a\,[M(t_{p}) - M(t_{1})] + b\,[M(t_{p}) - M(t_{1})]^{2}$$

where $a$ and $b$ are coefficients to be determined, with $a$ representing the linear term and $b$ the quadratic term. Since this formulation includes the squared term of the coarse-resolution change, it allows the predicted value to accelerate or decelerate relative to the coarse-scale variation, and can fit curve shapes that cannot be captured by a simple linear model. For NDVI describing the growth of highly dynamic lake wetland vegetation, the quadratic term helps to capture the curvature of its temporal trajectory.
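
A minimal sketch of this prediction step is shown below, with hypothetical coefficient values chosen only to make the arithmetic easy to follow. The function name is an assumption.

```python
import numpy as np

def predict_fine(l_t1, m_t1, m_tp, a, b):
    """Fine-resolution prediction from the coarse-resolution change.

    Implements L_pred = L(t1) + a * dM + b * dM^2 with dM = M(tp) - M(t1).
    """
    dm = np.asarray(m_tp, dtype=float) - np.asarray(m_t1, dtype=float)
    return np.asarray(l_t1, dtype=float) + a * dm + b * dm ** 2

# Coarse value rises from 0.2 to 0.4; hypothetical coefficients a = 1.0, b = 0.5.
quadratic = predict_fine(0.3, 0.2, 0.4, a=1.0, b=0.5)  # 0.3 + 0.2 + 0.5 * 0.04
linear = predict_fine(0.3, 0.2, 0.4, a=1.0, b=0.0)     # 0.3 + 0.2
```

Setting b to zero recovers the linear conversion, so the difference between the two results (0.02 here) is the contribution of the quadratic correction.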

Since only two base images are available, it is difficult to directly determine the two parameters of the quadratic curve. In this study, the sample size is increased in the spatial dimension. Within a local window, if the NDVI of different similar pixels shows a similar quadratic relationship with MODIS NDVI, all pixel pairs $(L_{ij}(t_{1}), L_{ij}(t_{2}))$ and the corresponding coarse-resolution NDVI values $(M_{ij}(t_{1}), M_{ij}(t_{2}))$ at times $t_{1}$ and $t_{2}$ are collected, and a quadratic function of $L(t_{2}) - L(t_{1})$ with respect to $M(t_{2}) - M(t_{1})$ is fitted using least squares regression to estimate the coefficients $a$ and $b$ shared by the entire window. This approach is equivalent to assuming that all pixels within the window follow the same NDVI change curve, while individual pixels may occupy different positions along this curve, and the fitted curve can then be applied to the prediction of all pixels in the window.

The coefficients a and b were estimated locally using least-squares regression within the same adaptive matching window that had already been defined. Therefore, the fitting window was not an additional fixed-size window, but followed the adaptive radius r, which was constrained by r_{min} and r_{max}. For each target pixel, only cloud-free candidate pixels satisfying the land-cover consistency constraint and similar-pixel selection criteria were used for coefficient fitting.

Specifically, for each valid similar pixel i, the fine-resolution kNDVI change and the corresponding coarse-resolution kNDVI change between the two base dates were calculated as Δ L_{i} = L_{i}(t_{2}) - L_{i}(t_{1}) and Δ M_{i} = M_{i}(t_{2}) - M_{i}(t_{1}), respectively. The local quadratic relationship was then expressed as

Δ L_{i} = a Δ M_{i} + b (Δ M_{i})^{2} + ε_{i}

where a and b were solved by least-squares fitting using all valid similar pixels in the adaptive window. In matrix form, this can be written as

β = (a, b)^{T} = (X^{T} X)^{-1} X^{T} y

where the i-th row of X is [Δ M_{i}, (Δ M_{i})^{2}] and the i-th element of y is Δ L_{i}. The fitted coefficients were then applied to predict the fine-resolution kNDVI change at the target date.
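The window-level fit described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `fit_quadratic_coefficients` and `predict_fine` are hypothetical, the inputs are assumed to be one-dimensional arrays that already contain only the valid similar pixels of a single adaptive window, and `numpy.linalg.lstsq` is used in place of the explicit normal-equation inverse because it gives the same least-squares solution with better numerical behaviour.

```python
import numpy as np


def fit_quadratic_coefficients(delta_m, delta_l):
    """Fit delta_l ~ a*delta_m + b*delta_m**2 (no intercept) by least squares."""
    delta_m = np.asarray(delta_m, dtype=float)
    delta_l = np.asarray(delta_l, dtype=float)
    # Design matrix X: one row [dM_i, dM_i^2] per valid similar pixel.
    X = np.column_stack([delta_m, delta_m ** 2])
    beta, *_ = np.linalg.lstsq(X, delta_l, rcond=None)
    return float(beta[0]), float(beta[1])


def predict_fine(l_t1, m_t1, m_tp, a, b):
    """L_pred(t_p) = L(t_1) + a*dM + b*dM^2, with dM = M(t_p) - M(t_1)."""
    d = m_tp - m_t1
    return l_t1 + a * d + b * d ** 2


# Synthetic, noise-free window used only to show the call pattern.
rng = np.random.default_rng(0)
dm = rng.uniform(-0.3, 0.3, size=50)
dl = 1.2 * dm - 0.8 * dm ** 2
a_hat, b_hat = fit_quadratic_coefficients(dm, dl)
prediction = predict_fine(0.4, 0.3, 0.5, a_hat, b_hat)
```

Because the model has no intercept, a pixel whose coarse-resolution value does not change between the base date and the target date keeps its base-date fine-resolution value.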

To avoid unstable fitting, the quadratic correction was applied only when a sufficient number of valid similar pixels were available within the adaptive window and when the fitted relationship was numerically stable. If the number of valid samples was insufficient, if the coarse-resolution change was too small to support stable quadratic fitting, or if the fitted coefficients produced unreasonable kNDVI values, the model reverted to the linear correction form by setting the quadratic term to zero. The predicted kNDVI values were also constrained within the physically meaningful range of the vegetation index. These boundary conditions were used to prevent overcorrection in highly heterogeneous or poorly observed regions.
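The safeguards in this paragraph could be organised as in the sketch below. The text does not report the actual thresholds, so every constant here (`MIN_SAMPLES`, `MIN_DELTA_M`, `MAX_CONDITION`, `VALID_RANGE`) is a hypothetical placeholder, as is the function name `predict_with_fallback`. The condition-number test is one possible way to express "numerically stable" and is an assumption, not the criterion used in the study.

```python
import numpy as np

# All thresholds below are hypothetical placeholders, not values from the study.
MIN_SAMPLES = 10           # minimum number of valid similar pixels
MIN_DELTA_M = 0.02         # minimum |coarse change| needed for a quadratic fit
MAX_CONDITION = 1e6        # cap on the design-matrix condition number
VALID_RANGE = (0.0, 1.0)   # assumed admissible kNDVI range


def predict_with_fallback(l_t1, dm_target, delta_m, delta_l):
    """Return (prediction, a, b); b is 0.0 whenever the linear form is used."""
    delta_m = np.asarray(delta_m, dtype=float)
    delta_l = np.asarray(delta_l, dtype=float)
    lo, hi = VALID_RANGE

    # Linear (no-intercept) coefficient, used as the fallback.
    # If there is no coarse change at all, a = 1.0 is an arbitrary neutral choice.
    denom = float(delta_m @ delta_m)
    a_lin = float(delta_m @ delta_l) / denom if denom > 0 else 1.0

    a, b = a_lin, 0.0
    enough = delta_m.size >= MIN_SAMPLES
    if enough and np.max(np.abs(delta_m)) >= MIN_DELTA_M:
        X = np.column_stack([delta_m, delta_m ** 2])
        if np.linalg.cond(X) <= MAX_CONDITION:
            (a_q, b_q), *_ = np.linalg.lstsq(X, delta_l, rcond=None)
            candidate = l_t1 + a_q * dm_target + b_q * dm_target ** 2
            # Keep the quadratic fit only if it yields an admissible value.
            if lo <= candidate <= hi:
                a, b = float(a_q), float(b_q)

    pred = l_t1 + a * dm_target + b * dm_target ** 2
    # Final clamp to the assumed valid range.
    return float(np.clip(pred, lo, hi)), a, b
```

Returning the coefficients alongside the prediction makes it easy to map where the quadratic term was actually active, which can help when diagnosing heterogeneous or poorly observed regions.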

This method can partially correct the systematic bias of a linear model. When the actual NDVI change is smaller than the linear prediction, a negative quadratic coefficient (b < 0) bends the predicted curve downward (concave) and reduces the predicted values. Conversely, when the actual change exceeds the linear prediction, a positive coefficient (b > 0) bends the curve upward (convex) and increases the predicted values, so that the model better matches the true NDVI dynamics. After introducing the nonlinear term, the correlation between the generated high-resolution NDVI time series and the actual observations is significantly enhanced and the mean squared error is reduced. Therefore, in highly dynamic lake wetland regions, combining the kNDVI framework with quadratic correction effectively captures the nonlinear variation characteristics of vegetation indices, improves the realism and stability of time series reconstruction, and ultimately supports the construction of a kNDVI dataset for the Poyang Lake wetland for 2024 with an 8-day temporal resolution and a 10 m spatial resolution.

2.4. Method for Wetland Vegetation Phenology Retrieval

After obtaining the high spatiotemporal resolution kNDVI time series, key phenological parameters were extracted by combining harmonic analysis with a dynamic threshold method. Given that vegetation in the Poyang Lake wetland is strongly influenced by hydrological conditions and often exhibits complex growth trajectories, traditional smoothing approaches such as Savitzky–Golay filtering are generally insufficient to capture its rapid-growth features. Therefore, a phenology retrieval framework based on HANTS was constructed in this study.

2.4.1. Time Series Reconstruction Based on HANTS

HANTS is a classical signal processing method based on Fourier transform that decomposes a discrete time series into a superposition of sine and cosine waves at different frequencies [38]. Compared with other filtering algorithms, HANTS has clear advantages in filling data gaps and removing nonperiodic noise, and is particularly suitable for fitting wetland vegetation growth curves with pronounced seasonal patterns.

The basic formulation of the HANTS model is given by:

f (t) = c_{0} + \sum_{k = 1}^{N} [a_{k} \cos (2 π k t) + b_{k} \sin (2 π k t)] + ε_{t}

where f(t) is the fitted kNDVI value at time t, c_{0} is the mean of the time series, N is the number of harmonics, a_{k} and b_{k} are the cosine and sine amplitude coefficients of the k-th harmonic, respectively, and ε_{t} is the residual term.

The reconstruction accuracy is highly sensitive to the choice of the harmonic order N. If the order is too low, the model underfits and cannot effectively reproduce short-term details of vegetation growth. Conversely, an excessively high order amplifies observational noise and leads to overfitting. To determine the optimal harmonic order, this study adopts the Akaike Information Criterion (AIC) for adaptive selection. By balancing goodness of fit and model complexity, the AIC can automatically determine the best order for each pixel-wise time series, and is given by:

AIC = n \ln (\frac{SSE}{n}) + 2 (2N + 1)

where n is the number of observations, SSE is the sum of squared residuals, and 2N + 1 is the number of fitted coefficients (c_{0} plus one cosine and one sine coefficient per harmonic). By minimizing the AIC value, the model can adaptively adjust the fitted curve for different vegetation types, thereby generating smooth and faithful continuous kNDVI time series.
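The order selection can be sketched as follows. This is a simplified illustration, not full HANTS: it fits the harmonic series by ordinary least squares and omits the iterative outlier rejection that HANTS implementations normally include. The function names, the `max_order` cap of 4, and the convention that t is expressed as a fraction of one year (so that k = 1 is the annual cycle) are all assumptions made for this example.

```python
import numpy as np


def harmonic_design(t, n_harmonics):
    """Columns: 1, cos(2*pi*k*t), sin(2*pi*k*t) for k = 1..N (t in years)."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.cos(2 * np.pi * k * t))
        cols.append(np.sin(2 * np.pi * k * t))
    return np.column_stack(cols)


def fit_harmonics_aic(t, y, max_order=4):
    """Pick the harmonic order with the lowest AIC; return (order, fitted)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    valid = np.isfinite(y)                 # gaps are passed in as NaN
    t_v, y_v = t[valid], y[valid]
    n = y_v.size
    best = None
    for order in range(1, max_order + 1):
        if n <= 2 * order + 1:             # not enough observations to fit
            break
        X = harmonic_design(t_v, order)
        coef, *_ = np.linalg.lstsq(X, y_v, rcond=None)
        sse = float(np.sum((y_v - X @ coef) ** 2))
        # AIC = n ln(SSE/n) + 2(2N + 1); the floor avoids log(0).
        aic = n * np.log(max(sse, 1e-12) / n) + 2 * (2 * order + 1)
        if best is None or aic < best[0]:
            best = (aic, order, coef)
    if best is None:
        raise ValueError("too few valid observations for a harmonic fit")
    _, order, coef = best
    # Evaluate on the full time axis so that gaps are filled.
    return order, harmonic_design(t, order) @ coef
```

Evaluating the selected model on the full time axis is what fills the gaps; with an 8-day series there are 46 time steps per year, so even the highest order tried here remains well determined unless most observations are missing.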

2.4.2. Extraction of Wetland Vegetation Phenological Parameters

Based on the reconstructed kNDVI time series, this study uses the Dynamic Threshold Method to extract key phenological events. The dynamic threshold approach, also known as the seasonal amplitude (SA) method, was proposed by Song et al. [39]. Because wetland vegetation communities have complex structures and large regional differences in peak biomass, a fixed threshold method is difficult to apply. The dynamic threshold method determines phenological phases using relative amplitude, which effectively reduces the influence of background soil and differences in maximum biomass.

First, the HANTS-fitted curve is used to determine the maximum value kNDVI_{max} and the baseline value kNDVI_{min} of the annual growth cycle and to construct a relative change rate series. The calculation is given by:

kNDVI_{ratio} = \frac{kNDVI_{t} - kNDVI_{min}}{kNDVI_{max} - kNDVI_{min}}

where kNDVI_{t} is the value at the current time step, and kNDVI_{max} and kNDVI_{min} are the maximum and minimum values within the growing season, respectively.

According to the recommendations of the proponents of the Dynamic Threshold Method and previous studies [40], the decision thresholds for the start of season (SOS) and end of season (EOS) are set to 20 percent and 50 percent of the seasonal amplitude, respectively, which yields results that are consistent with observed conditions. In addition, to eliminate pseudo-phenological signals caused by short-term flooding or residual cloud contamination, a logical constraint based on the length of season (LOS) is introduced: anomalous pixels with LOS shorter than 30 days are removed. The procedure described in this section is applied to every pixel to extract pixel-scale phenological parameters. Finally, a kNDVI-based phenology map for the Poyang Lake wetland in 2024 is produced, as shown in Figure 2.
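For a single growth cycle, the threshold rules above can be sketched as below. The function name `extract_sos_eos` is hypothetical, and several details are assumptions because the text does not specify them: SOS is taken as the first date on the rising limb where the ratio reaches 0.2, EOS as the first date after the peak where the ratio falls to 0.5 or below, and the minimum is taken over the whole input series. For a species with two growing seasons, such as Cs, the series would first have to be split into one segment per season.

```python
import numpy as np


def extract_sos_eos(doy, fitted, sos_frac=0.2, eos_frac=0.5, min_los=30):
    """Return (SOS, EOS, LOS) in day-of-year units, or None if rejected."""
    doy = np.asarray(doy, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    k_min, k_max = fitted.min(), fitted.max()
    if k_max - k_min <= 0:
        return None                        # flat series: no seasonal amplitude
    ratio = (fitted - k_min) / (k_max - k_min)
    peak = int(np.argmax(fitted))
    # Rising limb: first date at or above the SOS threshold.
    rising = np.nonzero(ratio[: peak + 1] >= sos_frac)[0]
    # Falling limb: first date after the peak at or below the EOS threshold.
    falling = np.nonzero(ratio[peak:] <= eos_frac)[0]
    if rising.size == 0 or falling.size == 0:
        return None
    sos = float(doy[rising[0]])
    eos = float(doy[peak + falling[0]])
    los = eos - sos
    if los < min_los:
        return None                        # LOS constraint: likely a pseudo-season
    return sos, eos, los
```

Because dates are read off the 8-day grid, the result is quantised to that step; interpolating between the two samples that bracket each threshold would be a natural refinement.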

2.5. In-Situ Phenological Observations and Validation Sample Design

In-situ phenological observations were collected in the Poyang Lake wetland during the 2024 growing season to provide independent reference dates for validating remotely sensed phenological retrievals. The field dataset included 60 validation sites, consisting of 30 sites dominated by Carex spp. (Cs) and 30 sites dominated by P. australis (Pa). These sites were distributed across the main wetland vegetation zones and land–water transition areas in the study region. Due to space limitations, only representative field survey sites are presented in the manuscript. Specifically, Table 1 lists 10 representative validation sites for Cs, and Table 2 lists 10 representative validation sites for Pa, including their geographic coordinates, vegetation categories, and observation information. These representative sites are provided to illustrate the spatial coverage and field sampling design of the in-situ observations, while the complete set of 60 validation sites was used for accuracy assessment.

Field surveys were conducted during the key phenological periods of the two dominant wetland vegetation types, including the green-up, peak-growth, and senescence stages. For Cs, observations covered both the first and second growing seasons, whereas for Pa, observations focused on its single annual growing cycle. At each site, the dominant species, vegetation coverage, growth stage, and phenological status were recorded following a consistent field observation protocol. The start of season (SOS) was identified as the date when visible green-up and rapid leaf expansion became dominant within the plot, while the end of season (EOS) was identified as the date when widespread senescence or canopy browning was observed. When multiple observers participated in the field campaign, records were cross-checked after each survey to reduce observer-related uncertainty, and inconsistent records were resolved through joint inspection of field notes and photographs.

To compare point-based field observations with pixel-based remote sensing retrievals, a fixed spatial neighborhood with a radius of 15 m around each validation site was used to extract representative remotely sensed phenological values. This radius was selected because the field coordinates were accurately recorded and because a 15 m radius approximately covers the central Sentinel-2 pixel and its immediate neighboring pixels, thereby reducing single-pixel noise while minimizing the inclusion of heterogeneous land-cover types. Within each neighborhood, only pixels belonging to the same vegetation class as the field site were retained using an independent land-cover mask, and the median phenological date of the retained pixels was used for comparison. The land-cover mask used for neighborhood filtering was generated independently from the field phenological validation samples and was not used for model calibration, fusion training, or accuracy adjustment; it was used only to reduce class mismatch during pixel extraction.
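The neighborhood extraction can be illustrated with a short sketch. It assumes that pixel-centre coordinates are available in a projected coordinate system with metre units and that class labels come from the independent land-cover mask; the function name `neighborhood_median_doy` and the array-based interface are hypothetical and are not the GEE workflow used in the study.

```python
import numpy as np


def neighborhood_median_doy(site_xy, site_class, pixel_xy, pixel_class,
                            pixel_doy, radius_m=15.0):
    """Median phenological date of same-class pixels within radius_m of a site."""
    pixel_xy = np.asarray(pixel_xy, dtype=float)
    pixel_class = np.asarray(pixel_class)
    pixel_doy = np.asarray(pixel_doy, dtype=float)
    # Euclidean distance from the site to each pixel centre (metres).
    dist = np.hypot(pixel_xy[:, 0] - site_xy[0], pixel_xy[:, 1] - site_xy[1])
    keep = (dist <= radius_m) & (pixel_class == site_class) & np.isfinite(pixel_doy)
    if not keep.any():
        return None                        # no usable pixel around this site
    return float(np.median(pixel_doy[keep]))
```

On a 10 m grid with the site at a pixel centre, the diagonal neighbours lie about 14.1 m away, so a 15 m radius retains the full 3 × 3 block before class filtering.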

2.6. Accuracy Assessment

To evaluate the reliability of the remotely sensed phenology retrieval, independent reference phenological dates were used to test the temporal consistency of the retrieved start of season (SOS) and end of season (EOS), and day level error statistics were adopted as the main accuracy metrics. The reference phenological dates were derived from field observations. Both the reference dates and the remotely sensed results were converted to day of year (DOY) and paired according to sample location. Considering the scale mismatch between point observations and pixel based estimates, as well as the strong heterogeneity of wetland landscapes, a fixed spatial neighborhood was constructed around each validation sample to extract a representative remote sensing based phenology value. Land cover consistency constraints were further applied within this neighborhood in order to reduce the influence of mixed pixels on the error statistics.

Let P_{i}^{ref} and P_{i}^{rs} denote the reference and remotely sensed phenological dates of the i-th sample, respectively, where P can represent either SOS or EOS, and let N be the total number of samples. The day-scale error is defined as follows:

e_{i} = P_{i}^{rs} - P_{i}^{ref}

and the Mean Error or bias (ME), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE) are calculated as follows:

ME = \frac{1}{N} \sum_{i = 1}^{N} e_{i}

MAE = \frac{1}{N} \sum_{i = 1}^{N} | e_{i} |

RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} e_{i}^{2}}

In addition, to characterize the level of agreement, the correlation coefficient R and coefficient of determination R^{2} between the retrieved and reference dates are calculated as:

R = \frac{\sum_{i = 1}^{N} (P_{i}^{rs} - \bar{P}^{rs}) (P_{i}^{ref} - \bar{P}^{ref})}{\sqrt{\sum_{i = 1}^{N} (P_{i}^{rs} - \bar{P}^{rs})^{2}} \sqrt{\sum_{i = 1}^{N} (P_{i}^{ref} - \bar{P}^{ref})^{2}}}

R^{2} = 1 - \frac{\sum_{i = 1}^{N} (P_{i}^{rs} - P_{i}^{ref})^{2}}{\sum_{i = 1}^{N} (P_{i}^{ref} - \bar{P}^{ref})^{2}}
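The five statistics defined above translate directly into NumPy. The sketch below is illustrative only; the function name `phenology_accuracy` and the sample dates are hypothetical. Note that R^{2} as defined here (one minus the ratio of residual to total sum of squares about the 1:1 line) is not in general equal to the square of the correlation coefficient R, so both are returned separately.

```python
import numpy as np


def phenology_accuracy(p_rs, p_ref):
    """ME, MAE, RMSE, R and R^2 between retrieved and reference dates (DOY)."""
    p_rs = np.asarray(p_rs, dtype=float)
    p_ref = np.asarray(p_ref, dtype=float)
    e = p_rs - p_ref                                  # day-scale error e_i
    ss_res = float(np.sum(e ** 2))
    ss_tot = float(np.sum((p_ref - p_ref.mean()) ** 2))
    return {
        "ME": float(e.mean()),
        "MAE": float(np.abs(e).mean()),
        "RMSE": float(np.sqrt(np.mean(e ** 2))),
        "R": float(np.corrcoef(p_rs, p_ref)[0, 1]),   # Pearson correlation
        "R2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical dates, for illustration only.
stats = phenology_accuracy([62, 69, 83, 90], [60, 70, 80, 90])
```

A positive ME means the retrieved dates are on average later than the field dates, which is worth reporting alongside RMSE because the two can diverge when errors are systematic.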

3. Results

3.1. IESTARFM’s Capabilities in Removing Thin Clouds and Its Spatio-Temporal Fusion Reconstruction Performance

For the representative prediction date of 31 October 2024, the consistency between the kNDVI predicted by the four spatiotemporal fusion models and the kNDVI observed by Sentinel-2 for 20,000 co-located samples differs markedly (Figure 3). Overall, the scatter distribution of IESTARFM lies closest to the 1:1 reference line and exhibits the most compact high-density region, indicating a more stable characterization of pixel-scale amplitude variations. The corresponding coefficient of determination R^{2} is 0.875 and the RMSE is 0.066, which is the best performance among the four models. In contrast, although ESTARFM maintains a certain level of correlation, its scatter still shows a systematic deviation from the 1:1 line, implying substantial error accumulation in rapidly changing wetland areas and under mixed-pixel conditions. For STARFM and FSDAF, the scatter becomes much more dispersed, with the high-density region spreading out and deviating further from the reference line, which reflects their insufficient ability to handle spectral discontinuities and abrupt changes in highly dynamic wetland environments and their difficulty in meeting the accuracy requirements of high-fidelity time series reconstruction.

In addition to the overall scatter-based validation shown in Figure 3, two representative regions were selected to further evaluate local reconstruction performance under different wetland observation challenges. Region 1 was characterized by extensive valid-observation gaps caused by dense cloud masking, whereas Region 2 was mainly affected by thin cloud contamination and complex land–water mixtures. The four algorithms, namely IESTARFM, ESTARFM, STARFM, and FSDAF, were used to fuse Sentinel-2 and MODIS images acquired under different temporal conditions to generate the kNDVI image for 13 March 2024, which was then compared with the corresponding Sentinel-2 observation.

As shown in Figure 4, in the analysis of the fusion results for Region 1, Figure 4a displays the raw Sentinel-2 image following masking due to dense cloud cover; there are large areas of valid observation gaps in the lower-central region. Without reconstruction, this would result in breaks in the subsequent time-series curves during critical growth stages. Figure 4b displays the corresponding cloud-free MODIS observation for the same time period. Although this image has a lower spatial resolution, it provides stable temporal phase information and constraints on the magnitude of change, thereby laying the foundation for characterising the trends in missing time-series data. Based on this, Figure 4c presents the fusion results of the IESTARFM developed in this study. Its core mechanism lies in utilising MODIS’s high-temporal-resolution information to constrain the magnitude of change on the target date, whilst employing the high-spatial-structural details provided by Sentinel-2 on the reference date as the primary basis for spatial interpolation. This enables information to be supplemented in the missing areas whilst preserving spatial texture and boundary continuity at the 10-metre scale as far as possible. By comparison, the ESTARFM shown in Figure 4d still exhibits some structural discontinuity in edge connections under extensive data gaps, whilst the STARFM shown in Figure 4e, dominated by its neighbourhood weighting mechanism, is more prone to producing patchy effects and excessive boundary smoothing. Although the FSDAF shown in Figure 4f can retain the spatial framework to some extent, prediction errors may still occur in local areas, thereby introducing additional uncertainty into subsequent phenological parameter estimates.

As shown in Figure 5, the Sentinel-2 image of another area depicted in Figure 5a exhibits a more complex land cover pattern, and the left half is significantly affected by thin cloud cover. This results in severe greying of the remote sensing image, making it difficult to distinguish the boundaries between vegetated and non-vegetated areas. The residual thin cloud cover weakens the spectral contrast between the vegetation and the background mudflats, directly affecting temporal consistency. The cloud-free MODIS image in Figure 5b provides relatively stable temporal constraints, enabling a reasonable basis for phase determination to be maintained even under conditions of thin cloud contamination or degraded observation quality. The IESTARFM shown in Figure 5c minimises the impact of residual thin clouds on the reconstructed sequence by downweighting unreliable observations and applying temporal proximity constraints, whilst preserving the morphology of fragmented patches and boundaries in the land–water interface as far as possible. The ESTARFM model in Figure 5d has a weaker ability to distinguish residual cloud noise, resulting in the fused kNDVI sequence still containing fluctuations in low values. When processing such highly heterogeneous landscapes, the STARFM model in Figure 5e blurs the geometric contours of fragmented land cover types due to its inherent low-pass filtering characteristics. Meanwhile, the FSDAF model in Figure 5f generates local artefacts in areas sensitive to water level fluctuations; this spectral instability directly interferes with the smoothness of the reconstructed curves, thereby undermining the robustness of phenological node extraction.

To quantitatively support the above local visual comparisons, regional accuracy statistics were further calculated for Regions 1 and 2 using the valid Sentinel-2 kNDVI image as the reference. For each region, RMSE, bias, and structural similarity index measure (SSIM) were calculated after excluding cloud-masked or invalid pixels. Bias was defined as the mean difference between the fused and reference kNDVI values, with positive and negative values indicating overestimation and underestimation, respectively. As shown in Table 3, IESTARFM achieved the lowest RMSE and the highest SSIM in both regions, indicating that it better preserved local spatial structure and reduced regional prediction errors compared with ESTARFM, STARFM, and FSDAF.

To isolate the contribution of each methodological component, we further conducted a leave-one-component-out ablation experiment under the same validation framework. All variants used the same MODIS and Sentinel-2 inputs, validation samples, preprocessing procedures, and parameter settings; only one component was removed each time. Specifically, the tested variants included IESTARFM without the adaptive matching window, without cloud-probability weighting, without temporal-distance weighting, and without the quadratic correction term. As shown in Table 4, the full IESTARFM model achieved the best overall performance, whereas removing any of the four components resulted in decreased accuracy, confirming that each component contributes to the robustness and accuracy improvement of the proposed fusion framework.

3.2. Performance of the IESTARFM Algorithm in Wetland Vegetation Phenology Retrieval

In this study, HANTS combined with the Dynamic Threshold Method is applied to the reconstructed high spatiotemporal resolution kNDVI dataset over Poyang Lake to retrieve and validate the phenological characteristics of wetland vegetation, focusing on Cs and Pa. The kNDVI based phenology maps for Cs shown in Figure 6 clearly reveal the spatial distribution of SOS and EOS in terms of day of year (DOY) across the study area. Most Cs pixels exhibit a first growing season SOS between DOY 50 and 75, corresponding to late February to mid-March 2024, and a first season EOS between DOY 160 and 180, that is, from early June to late June 2024. The second growing season of Cs mainly starts between DOY 200 and 210, corresponding to late July, while the second season EOS is concentrated between DOY 330 and 365, that is from early November to late December.

Figure 7 shows the kNDVI based phenology retrieval results for Pa. Most Pa pixels exhibit a growing season starting between DOY 90 and 100, corresponding to late March to early April 2024, while the end of the growing season is mainly distributed between DOY 330 and 340, that is from early to mid-November 2024. This level of spatial detail highlights the advantage of the proposed algorithm in dealing with land cover heterogeneity. At the same time, the kNDVI time series reconstructed by the improved fusion framework successfully captures the growth dynamics of wetland vegetation and identifies key phenological transition points, indicating that the Dynamic Threshold Method provides good accuracy for phenology retrieval.

Figure 8 and Figure 9 provide scatter-plot validation between remotely sensed phenological dates and in-situ observations for Cs and Pa, respectively. Each panel reports the number of matched samples (N), R^{2}, and RMSE for the corresponding species and phenophase, and includes both the fitted regression line and the 1:1 reference line to illustrate agreement and potential bias. For Cs, 30 matched samples were used for SOS and EOS validation in both the first and second growing seasons. For Pa, 30 matched samples were used for SOS and EOS validation of the annual growing season. These panel-wise validation results provide a direct assessment of the temporal consistency of the 2024 annual phenology retrieval across spatially distributed field sites in the Poyang Lake wetland.

To quantitatively evaluate the reliability of the kNDVI vegetation index for wetland phenology retrieval, in situ observations collected in the Poyang Lake region in 2024 were used as references. The SOS and EOS derived from the kNDVI-based retrieval showed strong agreement with these field data: for Cs, the validation achieved a maximum R^{2} of 0.83, and for Pa, R^{2} reached 0.86, indicating that the kNDVI dataset reconstructed via the IESTARFM fusion method reliably captures temporal dynamics.

The observed differences in phenology retrieval accuracy between Cs and Pa are closely related to the hydrological characteristics of the Poyang Lake wetland. Specifically, Pa mainly grows in relatively stable waterside zones, whereas Cs expands toward the lake center as water levels recede and is more easily submerged during the flood season. Consequently, the phenological signals of Cs are more strongly affected by hydrological fluctuations and water-background mixing, which may increase the uncertainty of EOS retrieval across different growing seasons.

4. Discussion

4.1. Error Mechanisms and Improvements of Multi Source Remote Sensing Image Fusion in Wetland Environments

Lake wetlands are characterized by pronounced land–water interlacing, fragmented landscapes, and rapid state transitions, which make the same-type pixel assumption difficult to satisfy in space and increase the likelihood that abrupt changes and observation gaps occur simultaneously in time. As a result, weight-based spatiotemporal fusion tends to generate blocky artifacts, texture fragmentation, and spectral bias along boundaries and in change hotspots. Recent reviews of spatiotemporal fusion generally point out that, under heterogeneous landscapes and rapidly changing surfaces, the main bottlenecks of traditional methods lie in the reliability of similar-pixel matching, the limited representativeness of observations during change periods, and the lack of effective control over error propagation; the absence of unified benchmark datasets and standardized evaluation frameworks further limits method comparability and generalization [22,41]. Against this background, the reconstructed series in this study achieve R^{2} = 0.875 and RMSE = 0.066 on the validation set, while maintaining better spatial continuity and detail fidelity in cloud-covered and shadowed areas, indicating that the proposed improvements help suppress error accumulation under strongly heterogeneous and cloudy wetland conditions. Compared with weight-based frameworks represented by STARFM and ESTARFM, which are constrained in handling abrupt changes and mixed pixels, FSDAF enhances adaptability to heterogeneous landscapes and sudden changes through spectral unmixing and spatial interpolation. However, subsequent studies still highlight room for improvement in terms of handling complex changes and robustness, and methods such as FSDAF2.0 explicitly focus on improving change detection and stability [42,43]. These findings suggest that enhancing fusion performance in wetland scenarios generally requires concurrent strengthening of three aspects: sample availability, quality constraints, and change representation. In this study, an adaptive matching window is used to alleviate the shortage of valid samples under cloud cover and fragmented landscapes, and cloud probability and temporal distance are explicitly incorporated into the weight allocation to reduce the impact of unreliable observations. This is consistent with the existing emphasis on mitigating error propagation through quality control and spatiotemporal constraints [44]. In recent years, deep learning-based fusion and reconstruction methods have developed rapidly, especially optical–SAR fusion frameworks designed to cope with persistent cloud cover, which show considerable potential for vegetation index time-series reconstruction in cloudy regions. Compared with these state-of-the-art deep learning approaches, IESTARFM has the advantages of clearer physical interpretability, lower dependence on large training datasets, simpler parameterization, and easier implementation in operational monitoring workflows.
These advantages are particularly relevant in wetland regions where high-quality training samples and multi-year reference datasets are often limited. However, deep learning models are generally more powerful in learning complex nonlinear and cross-sensor relationships, and optical–SAR fusion models can make use of all-weather SAR observations to compensate for missing optical data under persistent cloud cover. Therefore, IESTARFM should be regarded as a robust and controllable fusion framework for pre-filtered optical time series, rather than a replacement for deep learning or optical–SAR fusion methods under all conditions. Incorporating Sentinel-1 SAR observations is a planned direction for future work, because SAR can provide complementary structural and hydrological information under cloudy conditions. Nevertheless, the integration of Sentinel-1 data requires careful treatment of optical–SAR radiometric differences, scattering mechanisms, speckle noise, and vegetation–water interaction signals in wetlands.

From the perspective of computational cost and scalability, IESTARFM was implemented on the Google Earth Engine platform using server-side preprocessing, cloud masking, 8-day compositing, kNDVI calculation, similar-pixel searching, pixel-wise weight calculation, and image export. The main computational cost comes from the adaptive-window search and pixel-wise weight calculation, whereas cloud masking, temporal compositing, and kNDVI calculation are relatively lightweight operations in GEE. In our implementation, reconstructing one target-date image generally required approximately 8–13 min under the current study-area extent and export settings. This runtime should be interpreted as an empirical reference rather than a fixed algorithmic constant, because GEE execution time can be affected by server load, export scale, region size, and task scheduling. In practical applications, the reconstruction can be organized by 8-day time steps and spatial tiles, making the method feasible for regional-scale wetland monitoring without requiring local GPU training or large local storage. For larger areas or multi-year applications, tiled processing and batch export are recommended to maintain scalability.

4.2. Limitations and Future Research

Although the fusion strategy improves temporal continuity and spatial detail, the remaining uncertainties can be broadly attributed to both algorithm-related and data-related limitations. From the algorithmic perspective, the resampling of MODIS observations from 500 m to the Sentinel-2 10 m grid was used only for spatial registration and pixel-wise fusion implementation, but it may still introduce local smoothing effects and mixed-pixel uncertainty, particularly along fragmented land–water boundaries. In addition, the quadratic correction term assumes that similar pixels within a local window share a comparable nonlinear relationship between coarse- and fine-resolution kNDVI changes; this assumption improves the representation of nonlinear vegetation growth but may be less effective under abrupt hydrological disturbances or rapid land-cover transitions. From the data perspective, residual radiometric inconsistencies between Sentinel-2 and MODIS, possible geometric co-registration errors, residual thin clouds or shadows, and the limited availability of valid Sentinel-2 observations may also contribute to reconstruction uncertainty. The overall validation result, with R^{2} = 0.875 and RMSE = 0.066, together with the regional comparison results in Table 3, suggests that these uncertainties were reduced but not completely eliminated. Therefore, the remaining errors should be interpreted as the combined effect of algorithmic assumptions and input data quality. The second growing season of Cs coincides with the dry-season hydrological transition period in the Poyang Lake wetland. During this stage, rapid environmental changes associated with seasonal water-level fluctuations and exposed mudflats may interfere with the kNDVI signal, thereby contributing to the observed differences in EOS retrieval accuracy.

IESTARFM improves robustness by incorporating cloud-probability and temporal-distance weighting, but its performance still depends on the availability of sufficient valid fine-resolution observations. Under extremely persistent cloud cover, where valid Sentinel-2 observations are insufficient to constrain fine-scale spatial structure, reconstruction uncertainty may increase. Future studies using longer multi-year time series or additional sensors could further quantify fusion accuracy across a wider cloudiness gradient.

Although the proposed IESTARFM framework was evaluated using the 2024 Poyang Lake wetland dataset, the methodological framework itself is not limited to a single year or site. The main processing logic, including cloud masking, spatiotemporal fusion, kNDVI reconstruction, and phenological parameter extraction, can be transferred to other years or similar wetland systems. However, some parameter settings may be affected by the quality of the original remote sensing data, such as cloud contamination, the number of valid Sentinel-2 observations, temporal gaps, and local landscape heterogeneity. Therefore, when applying the method to other years or wetland regions, minor parameter adjustment or local validation may still be necessary.
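As an illustration of one parameter that may need local adjustment, the sketch below extracts SOS and EOS from a smoothed single-season kNDVI curve with a dynamic (amplitude-ratio) threshold. The function name `dynamic_threshold_dates` and the default `ratio` of 0.5 are assumptions made for this example; it handles only one growing season per curve, so a double-season species such as Cs would need the curve split into seasons first.

```python
def dynamic_threshold_dates(doys, values, ratio=0.5):
    """Return (SOS, EOS) as sampled days of year for one growing season.

    doys:   day-of-year of each sample, in ascending order.
    values: smoothed kNDVI at each sample.
    ratio:  fraction of the seasonal amplitude used as threshold
            (hypothetical default; tune per site and species).
    """
    if len(doys) != len(values) or len(values) < 3:
        raise ValueError("need matching sequences with at least 3 samples")

    peak = max(range(len(values)), key=values.__getitem__)

    # Threshold on each limb = limb minimum + ratio * (peak - limb minimum).
    left_min = min(values[: peak + 1])
    right_min = min(values[peak:])
    sos_thr = left_min + ratio * (values[peak] - left_min)
    eos_thr = right_min + ratio * (values[peak] - right_min)

    # SOS: first sample on the rising limb at or above the threshold.
    sos = next(
        d for d, v in zip(doys[: peak + 1], values[: peak + 1]) if v >= sos_thr
    )
    # EOS: last sample on the falling limb at or above the threshold.
    eos = next(
        d
        for d, v in zip(reversed(doys[peak:]), reversed(values[peak:]))
        if v >= eos_thr
    )
    return sos, eos


if __name__ == "__main__":
    doys = [0, 10, 20, 30, 40, 50, 60]
    kndvi = [0.1, 0.2, 0.4, 0.8, 0.5, 0.3, 0.1]
    print(dynamic_threshold_dates(doys, kndvi))
```

Lowering `ratio` moves SOS earlier and EOS later, which is one reason local validation against field observations remains useful when the method is transferred.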

Future work will focus on optimizing the preprocessing steps prior to fusion, including more consistent cloud masking, stricter multi-sensor geometric co-registration, and enhanced radiometric normalization, in order to improve robustness across regions. In addition, the fusion framework will be extended by integrating complementary satellite observations, such as other optical sensors or radar data, to further mitigate the impact of persistent cloud cover and enhance the applicability of the method in cloudy and rainy regions.

5. Conclusions

In response to the challenges posed by scarce valid observations under cloudy and rainy conditions, highly fragmented wetland surfaces, and pronounced nonlinear responses of vegetation indices, this study proposes a spatiotemporal fusion algorithm for remote sensing imagery, IESTARFM, tailored to wetland vegetation phenology retrieval in cloudy and rainy regions. Using the Poyang Lake wetland in 2024 as a case study, the main conclusions are summarized as follows:

(1) At the model level, IESTARFM targets radiometric bias and noise propagation caused by mixed pixels in land–water ecotones and thin cloud contamination. An adaptive matching window and a joint weighting scheme based on temporal distance and cloud probability are constructed to improve the homogeneity of similar pixel selection and the reliability of observations. In addition, a quadratic polynomial correction is introduced into the fusion relationship to represent the nonlinear growth response of kNDVI. These modifications alleviate the systematic prediction bias of traditional linear fusion in highly dynamic periods and enhance both detail fidelity and temporal continuity of the reconstructed time series.

(2) At the application level, the kNDVI sequences generated by IESTARFM, which have high spatiotemporal continuity, can stably describe the growth processes of Poyang Lake wetland vegetation and support phenological parameter retrieval. The validation results show that the agreement between the fused images and reference observations is substantially improved, with R² = 0.875 and RMSE = 0.066 for the validation set. The retrieved SOS and EOS also exhibit high temporal consistency with in situ measurements, with R² values up to 0.86 and RMSE values reported for each species and phenophase in Figure 8 and Figure 9. These findings indicate that the proposed method is robust in suppressing thin-cloud noise, reducing water background interference, and mitigating index saturation in high-biomass areas, and can provide reliable technical support for high-accuracy, continuous monitoring of wetland vegetation phenology in cloudy and rainy regions.
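The quadratic correction summarised in conclusion (1) can be illustrated with a small sketch: within a local window, the fine-resolution kNDVI change of similar pixels is regressed on the coarse-resolution change with a second-order polynomial, and the fitted curve is then used to predict the fine-resolution change. The helper names `fit_quadratic` and `predict_fine_change`, the use of ordinary least squares, and the sample values are assumptions for illustration and may differ from the fitting procedure actually used in IESTARFM.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c via the normal equations."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least 3 paired samples")
    s = [sum(x ** k for x in xs) for k in range(5)]  # sums of x^0 .. x^4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented normal-equation matrix; unknowns ordered (c, b, a).
    m = [
        [s[0], s[1], s[2], t[0]],
        [s[1], s[2], s[3], t[1]],
        [s[2], s[3], s[4], t[2]],
    ]
    n = 3
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate sample: need 3 distinct x values")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    c, b, a = (m[i][3] / m[i][i] for i in range(n))
    return a, b, c


def predict_fine_change(coarse_change, coeffs):
    """Fine-resolution kNDVI change predicted from the coarse change."""
    a, b, c = coeffs
    return a * coarse_change ** 2 + b * coarse_change + c


if __name__ == "__main__":
    # Hypothetical coarse (MODIS-scale) and fine (Sentinel-2-scale) kNDVI
    # changes for similar pixels inside one matching window.
    coarse = [-0.2, -0.1, 0.0, 0.1, 0.2, 0.3]
    fine = [0.5 * x * x + 0.9 * x for x in coarse]
    coeffs = fit_quadratic(coarse, fine)
    print(coeffs, predict_fine_change(0.15, coeffs))
```

When the fitted quadratic coefficient is close to zero, the prediction reduces to the linear conversion used by conventional fusion models, so the extra term only matters during periods of rapid, nonlinear change.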

Author Contributions

Conceptualization, T.X. and J.A.; methodology, T.X. and N.X.; software, T.X. and M.Q.; validation, J.A.; formal analysis, T.X.; investigation, T.X.; resources, T.X.; data curation, T.X.; writing—original draft preparation, T.X.; writing—review and editing, J.A. and T.X.; visualization, T.X.; supervision, T.X.; project administration, T.X.; funding acquisition, J.A. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Jiangxi Provincial Natural Science Foundation (20242BAB25174) and the National Natural Science Foundation of China (42361059).

Data Availability Statement

The MODIS MOD09A1 8-day composite (Collection 6.1) and Harmonized Sentinel-2 Level-2A image data that support the findings of this study are available as follows: MODIS at https://developers.google.cn/earth-engine/datasets/catalog/MODIS_061_MOD09A1?hl=zh-cn and Sentinel-2 at https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2_SR_HARMONIZED (both accessed on 20 April 2026).

Acknowledgments

We would like to express our sincere gratitude to Yongbin Tan for their valuable suggestions during the writing of this paper. We also acknowledge the assistance provided by graduate students Xintao Tang, Xinxing Han, Jiangtao Zhu, and Chunmei Niu in collecting field samples.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Location of the study area and field survey sites. (a) Geographic location of the study area at the provincial scale. (b) Sentinel-2 L2A image showing the field survey region and sampling sites.

Figure 2. kNDVI phenological curves. (a) Carex spp. (Cs) phenological curve. (b) P. australis (Pa) phenological curve.

Figure 3. Scatter plots of predicted kNDVI versus Sentinel-2 observations for different fusion methods. (a) Proposed fusion method. (b) ESTARFM fusion method. (c) STARFM fusion method. (d) FSDAF fusion method.

Figure 4. Comparison of the original imagery with kNDVI fusion images generated using ESTARFM, STARFM, FSDAF, and the proposed method for Region 1. (a) Sentinel-2 reference image. (b) MODIS reference image. (c) Image generated by the proposed fusion algorithm. (d) Image generated by the ESTARFM fusion method. (e) Image generated by the STARFM fusion method. (f) Image generated by the FSDAF fusion method.

Figure 5. Comparison of the original imagery with kNDVI fusion images generated using ESTARFM, STARFM, FSDAF, and the proposed method for Region 2. (a) Sentinel-2 reference image. (b) MODIS reference image. (c) Image generated by the proposed fusion algorithm. (d) Image generated by the ESTARFM fusion method. (e) Image generated by the STARFM fusion method. (f) Image generated by the FSDAF fusion method.

Figure 6. kNDVI-based phenology retrieval results for Cs. (a) SOS of the first growing season. (b) EOS of the first growing season. (c) SOS of the second growing season. (d) EOS of the second growing season.

Figure 7. kNDVI-based phenology retrieval results for Pa. (a) Start of the growing season. (b) End of the growing season.

Figure 8. Accuracy of Cs phenology derived from kNDVI retrieval compared with in situ observations. The figure shows the coefficients of determination (R²) for specific phenological dates obtained by cross validation. The solid red line represents the fitted regression line, and the black dashed line represents the 1:1 reference line. (a) Validation of SOS for the first growing season. (b) Validation of EOS for the first growing season. (c) Validation of SOS for the second growing season. (d) Validation of EOS for the second growing season.

Figure 9. Accuracy of Pa phenology derived from kNDVI retrieval compared with in situ observations. The figure shows the coefficients of determination (R²) for specific phenological dates obtained by cross validation. The solid red line represents the fitted regression line, and the black dashed line represents the 1:1 reference line. (a) Validation of SOS. (b) Validation of EOS.

Table 1. Field observation data on the phenology of Carex spp. communities.

Observation Point	Longitude	Latitude	SOS-1 (DOY)	EOS-1 (DOY)	SOS-2 (DOY)	EOS-2 (DOY)
1	116°21′48.45″E	28°57′58.98″N	56	179	211	318
2	116°21′29.23″E	28°56′54.58″N	54	182	198	365
3	116°20′39.72″E	28°56′00.19″N	52	183	196	330
4	116°21′31.30″E	28°58′31.88″N	58	177	209	330
5	116°21′27.71″E	28°58′40.91″N	58	177	200	325
6	116°21′37.90″E	28°57′51.68″N	56	173	217	365
7	116°15′41.21″E	28°54′25.51″N	58	171	208	318
8	116°20′44.20″E	28°56′03.17″N	52	176	201	306
9	116°21′40.10″E	28°58′29.84″N	55	176	187	350
10	116°15′51.82″E	28°54′30.00″N	58	173	223	328

Table 2. Field observation data on the phenology of P. australis communities.

Observation Point	Longitude	Latitude	SOS (DOY)	EOS (DOY)
1	116°21′47.12″E	28°57′52.73″N	95	337
2	116°21′45.36″E	28°57′49.58″N	88	330
3	116°21′38.48″E	28°57′9.31″N	88	344
4	116°21′34.97″E	28°56′59.89″N	90	329
5	116°21′30.84″E	28°56′55.16″N	90	341
6	116°21′22.13″E	28°56′21.67″N	86	326
7	116°21′12.09″E	28°56′13.26″N	87	333
8	116°21′6.77″E	28°56′8.54″N	92	345
9	116°20′32.88″E	28°56′12.55″N	92	332
10	116°20′24.03″E	28°56′3.10″N	89	339

Table 3. Regional quantitative comparison of fusion accuracy for Regions 1 and 2.

Region	Method	RMSE	Bias	SSIM
Region 1	IESTARFM	0.058	−0.004	0.902
Region 1	ESTARFM	0.071	−0.012	0.835
Region 1	STARFM	0.084	−0.020	0.776
Region 1	FSDAF	0.077	0.015	0.811
Region 2	IESTARFM	0.066	−0.006	0.872
Region 2	ESTARFM	0.080	−0.017	0.791
Region 2	STARFM	0.092	−0.025	0.718
Region 2	FSDAF	0.086	0.019	0.754
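For readers who want to reproduce the error statistics of the kind reported in Table 3, the following sketch computes RMSE, bias, and a coefficient of determination from paired reference and fused kNDVI values. The sign convention for bias (fused minus reference) and the definition of R² as one minus the ratio of residual to total sum of squares are assumptions, since the paper may instead use the squared correlation of a fitted regression; SSIM is omitted because it needs a windowed image implementation. The function name `fusion_metrics` and the sample values are hypothetical.

```python
import math


def fusion_metrics(observed, predicted):
    """RMSE, bias and R^2 between reference and fused kNDVI values."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need matching sequences with at least 2 samples")
    n = len(observed)

    # Assumed convention: residual = fused value minus reference value.
    residuals = [p - o for o, p in zip(observed, predicted)]
    bias = sum(residuals) / n
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)

    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0.0:
        raise ValueError("reference values are constant; R^2 is undefined")
    r2 = 1.0 - ss_res / ss_tot

    return {"rmse": rmse, "bias": bias, "r2": r2}


if __name__ == "__main__":
    reference = [0.2, 0.4, 0.6]   # hypothetical Sentinel-2 kNDVI
    fused = [0.25, 0.35, 0.65]    # hypothetical fused kNDVI
    print(fusion_metrics(reference, fused))
```

Under this convention a negative bias, as in most rows of Table 3, would indicate that the fused kNDVI is on average slightly lower than the reference.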

Table 4. Ablation analysis of the main components of IESTARFM.

Model Variant	Adaptive Window	Cloud Weighting	Temporal Weighting	Quadratic Correction	R²	RMSE	Bias	SSIM
Full IESTARFM	✓	✓	✓	✓	0.875	0.066	0.003	0.889
w/o adaptive window	–	✓	✓	✓	0.852	0.069	−0.006	0.861
w/o cloud weighting	✓	–	✓	✓	0.839	0.070	−0.011	0.847
w/o temporal weighting	✓	✓	–	✓	0.863	0.068	−0.004	0.873
w/o quadratic correction	✓	✓	✓	–	0.849	0.069	−0.008	0.858

Note: ✓ indicates that the component is included in the model variant; – indicates that it is removed.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Xie, T.; Ai, J.; Xie, N.; Qiao, M. A Robust Spatiotemporal Fusion Algorithm for Wetland Vegetation Phenology Retrieval in Cloud-Prone Regions. Remote Sens. 2026, 18, 1832. https://doi.org/10.3390/rs18111832

