3.1. Effect of Input Feature Combinations on Model Performance
To evaluate the influence of different input feature combinations on the model performance, a systematic assessment was conducted based on the optimized DMNST model. The input features were derived from ten spectral bands (B2–B12) of Sentinel-2 imagery and grouped into four primary spectral categories: visible (VIS; B2–B4), red-edge (RE; B5–B7), near-infrared (NIR; B8, B8A), and shortwave infrared (SWIR; B11, B12).
In addition, two categories of vegetation indices were incorporated into the feature space: (1) two basic vegetation indices (2BVIs), including the NDVI and EVI; and (2) a comprehensive set of 23 commonly used vegetation indices (23CVIs), such as the TNDVI, GNDVI, and IRECI. In total, 48 distinct feature combinations were constructed for the model evaluation.
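As a minimal sketch of how such a feature vector might be assembled, the snippet below groups the ten bands into the four spectral domains and appends the two basic indices. The NDVI and EVI formulas are the standard definitions. The helper name `build_features`, the use of B8/B4/B2 as the NIR/red/blue inputs, and the reflectance values are illustrative assumptions, not details taken from this study.

```python
# Band groups as described in the text (Sentinel-2 band names).
BAND_GROUPS = {
    "VIS": ["B2", "B3", "B4"],
    "RE": ["B5", "B6", "B7"],
    "NIR": ["B8", "B8A"],
    "SWIR": ["B11", "B12"],
}


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from surface reflectance."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue):
    """Enhanced Vegetation Index with the standard coefficients (2.5, 6, 7.5, 1)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def build_features(pixel, groups, with_2bvis=False):
    """Assemble one feature vector for a single pixel and time step.

    `pixel` maps band names to reflectance in [0, 1]; `groups` lists the
    spectral domains to include. Hypothetical helper: using B8/B4/B2 for
    the indices is an assumption of this sketch.
    """
    features = [pixel[band] for group in groups for band in BAND_GROUPS[group]]
    if with_2bvis:
        features.append(ndvi(pixel["B8"], pixel["B4"]))
        features.append(evi(pixel["B8"], pixel["B4"], pixel["B2"]))
    return features


# Hypothetical reflectance values for one vegetated pixel.
pixel = {"B2": 0.03, "B3": 0.06, "B4": 0.04, "B5": 0.10, "B6": 0.25,
         "B7": 0.30, "B8": 0.36, "B8A": 0.38, "B11": 0.20, "B12": 0.10}

full_spectrum = build_features(pixel, ["VIS", "RE", "NIR", "SWIR"])
nir_vis_2bvis = build_features(pixel, ["NIR", "VIS"], with_2bvis=True)
```

Under these assumptions, the full-spectrum combination yields ten features per time step, while NIR + VIS + 2BVIs yields seven (five bands plus two indices).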
The predictive performance associated with each feature combination is quantitatively summarized in Table 3. Moreover, the coefficient of determination (R2) on the test set was employed to visualize and compare the model performance across all the combinations (Figure 5), providing an intuitive understanding of the relative predictive strength of each input feature set.
3.1.1. Overall Performance Analysis of Feature Combinations
The feature combination that integrated all four spectral domains (VIS, RE, NIR, and SWIR) achieved the best overall performance, yielding the highest R2 value of 0.770 on the test dataset (Figure 5). This result highlights the effectiveness of spectral fusion in capturing the comprehensive spectral responses of NWs, including the pigment absorption, canopy structural characteristics, and moisture content. The integration of diverse spectral information substantially enhanced the model’s estimation accuracy, underscoring the importance of multi-domain spectral inputs in improving the predictive performance for fractional cover regression tasks.
In contrast, the feature combinations derived from a single spectral domain exhibited substantially lower predictive performance. The models using only the VIS, RE, NIR, or SWIR bands produced test R2 values below 0.6, indicating that isolated spectral inputs are insufficient to fully capture the complex physiological and structural characteristics of NWs. In comparison, multi-band fusion provides a more holistic spectral representation, leading to improved model accuracy.
A limited subset of vegetation indices also yielded moderate performance. For instance, the feature combination of NIR + VIS + 2BVIs achieved an R2 of 0.613, which was comparable to the result obtained using the full set of 23CVIs (R2 = 0.615). This finding suggests that incorporating an excessive number of indices may introduce redundancy, increase the model complexity, and elevate the risk of overfitting.
Further analysis indicated that the vegetation indices primarily perform a complementary role. When used in isolation, the 2BVIs and 23CVIs achieved R2 values of 0.567 and 0.645, respectively. While these indices enhanced the model’s sensitivity to phenological variation, they were consistently outperformed by combinations that incorporated raw spectral bands. This highlights that the vegetation indices alone lack the spectral diversity required for high-precision regression tasks.
Notably, the SWIR bands contributed meaningfully to the performance gains. Combinations such as SWIR + VIS and SWIR + NIR consistently achieved R2 values above 0.62, underscoring the importance of the SWIR in capturing the vegetation water content, biophysical traits, and stress-related signals.
In summary, integrating the VIS, RE, NIR, and SWIR spectral information—supplemented with key vegetation indices—substantially improves the DMNST model’s performance in estimating the fractional cover of NWs. These findings underscore the value of spectral fusion strategies for spatiotemporal monitoring of NWs’ dynamics and their ecological implications.
3.1.2. Comparative Analysis of Representative Feature Combinations
To further investigate the influence of different input feature combinations on the model performance, four representative configurations were selected, each corresponding to a distinct feature engineering strategy: multispectral fusion, index-constrained selection, high-dimensional stacking, and partial-spectrum fusion. The predictive effectiveness of these configurations was assessed using scatter plots comparing the estimated and observed fractional coverage of NWs (Figure 6), providing a visual basis for the performance comparison.
The configuration that combined the VIS, RE, NIR, and SWIR bands demonstrated the highest accuracy, achieving an R2 of 0.770 and an RMSE of 0.096 on the test set. As illustrated in Figure 6a, the predicted values were well aligned with the 1:1 reference line, particularly within the low-to-moderate coverage range (0.2–0.5), indicating that integrating full-spectrum information effectively captured the spectral responses of NWs across multiple biophysical dimensions.
The second configuration combined the RE and SWIR bands with the two basic vegetation indices (2BVIs) while excluding the NIR domain. As shown in Figure 6b, this setup yielded slightly reduced predictive performance (R2 = 0.711, RMSE = 0.107) compared to the full-spectrum fusion strategy. The absence of NIR likely constrained the model’s ability to characterize key vegetation properties such as the canopy density and moisture content, attributes that are critical for accurately estimating the fractional cover of NWs. As a result, the model tended to slightly underestimate the NW coverage, especially in regions where the actual fractional coverage of NWs was relatively high.
The third configuration incorporated all ten spectral bands together with 23 commonly used vegetation indices (23CVIs), forming a high-dimensional input structure. Despite offering the most comprehensive information base, this configuration exhibited reduced performance (R2 = 0.665, RMSE = 0.115). As illustrated in Figure 6c, the scatter plot showed substantial dispersion, particularly within the mid-range coverage interval (0.3–0.6). These results suggest that excessive feature stacking may introduce redundancy and increase the risk of overfitting, ultimately impairing the model’s generalization capability.
The fourth configuration utilized the RE, NIR, and SWIR bands, excluding the VIS bands and all the vegetation indices. As shown in Figure 6d, this setup achieved an R2 of 0.730 and an RMSE of 0.104. Although the performance was reasonably strong, notable deviations were observed at both the low and high ends of the coverage distribution. These discrepancies suggest that the absence of visible-band information limited the model’s capacity to capture reflectance features associated with the leaf pigmentation and surface color, which are essential for distinguishing subtle variations in the NW coverage.
In summary, the model performance is strongly influenced by the structure and dimensionality of the input features. Full-spectrum fusion (VIS + RE + NIR + SWIR) emerged as the most effective strategy, enabling comprehensive spectral representation for high-accuracy estimation. The selective inclusion of vegetation indices can further enhance the performance when appropriately constrained. In contrast, excessive feature stacking or the exclusion of critical spectral domains may introduce redundancy and reduce the model accuracy. These findings highlight the importance of balancing spectral diversity and feature compactness when designing remote sensing models for accurate and scalable monitoring of NWs’ dynamics.
3.1.3. Comprehensive Evaluation of Feature Fusion Strategies
The experimental results presented above confirm that the strategy used to construct the input features plays a critical role in accurately modeling the fractional coverage of NWs. In particular, feature combinations that integrate multiple spectral domains—especially those incorporating the VIS, RE, NIR, and SWIR bands—consistently yielded the highest predictive performance across all the experiments. These findings highlight the superior effectiveness of multispectral fusion in capturing the vegetation heterogeneity and enhancing the model generalization.
Moreover, the inclusion of a small set of essential vegetation indices—particularly the two basic vegetation indices (2BVIs; NDVI and EVI)—further enhanced the model’s sensitivity to phenological dynamics. When combined with the spectral bands, these indices served a complementary role in improving the prediction accuracy. In contrast, the indiscriminate addition of a large number of high-dimensional indices, such as the 23 commonly used vegetation indices (23CVIs), did not lead to substantial performance gains. Instead, it frequently introduced feature redundancy and increased the model complexity, thereby elevating the risk of overfitting.
Additionally, the results indicate that models constructed with excessively high feature dimensionality—such as those integrating all 10 spectral bands together with the 23CVIs—were more susceptible to noise interference, leading to increased prediction errors and reduced model stability. Therefore, in the feature construction process, it is recommended to prioritize the inclusion of representative spectral bands and key vegetation indices. This should be complemented by appropriate feature selection or dimensionality reduction techniques to effectively control the model complexity and enhance both the robustness and generalizability.
Finally, the feature combinations that relied solely on a single spectral domain—such as NIR or SWIR—consistently exhibited inferior predictive performance compared to the multisource fusion strategies. This finding reinforces the notion that limited spectral information is inadequate for capturing the complex spectral signatures of NWs, thereby constraining the model’s ability to achieve high-precision estimation.
In summary, this section underscores the critical importance of multispectral feature integration and the selective inclusion of vegetation indices in enhancing the model performance. These findings offer both theoretical insight and practical guidance for the design of input features in deep learning-based remote sensing models for vegetation mapping.
3.2. Impact of Temporal Combination Strategies on Model Performance
To systematically evaluate the influence of temporal information on the regression performance, two temporal configuration dimensions were designed: sampling interval and observation month filtering. In this experiment, the input feature set was fixed to the optimal combination identified in Section 3.1.2, specifically the VIS, RE, NIR, and SWIR bands. This choice ensures a consistent feature foundation across the different temporal strategies.
Four temporal interval strategies (interval_1 to interval_4) were constructed by subsampling from the 72 Sentinel-2 observations acquired in 2019, as summarized in Table 4. These strategies simulate different temporal resolutions.
In parallel, five month-based filtering strategies were designed to exclude low-quality or off-season observations, focusing on the phenological relevance of the vegetation dynamics. These retention strategies are defined in Table 5.
By combining these dimensions, 20 composite configurations (e.g., interval_3 + retain_May_Sep) were constructed to form a comprehensive temporal input strategy set. The regression performance of each configuration is summarized in Table 6 and visualized in Figure 7.
For the sampling interval dimension, four temporal interval strategies were defined based on the 72 Sentinel-2 observations spanning the full calendar year of 2019 (Table 4). Specifically, interval_1 retains all the available time steps (i.e., no downsampling), while interval_2, interval_3, and interval_4 retain one observation every two, three, and four time steps, respectively. These configurations were designed to simulate varying temporal resolutions and to evaluate their impact on the model’s ability to learn the time-dependent vegetation dynamics.
For the month filtering dimension, five retention strategies were constructed to mitigate the influence of low-quality observations from the non-growing seasons, particularly winter, when cloud contamination, snow cover, and low solar angles frequently degrade the data quality (Table 5). Beginning with the full annual span (January to December), successive subsets were defined by progressively excluding marginal or low-activity months. For example, retain_May_Sep retains only the observations from May to September, thereby concentrating on the peak growing season while minimizing the temporal noise.
For example, the configuration interval_4 + retain_Mar_Oct refers to a setting in which one observation is retained every four time steps, and only those falling within the March to October period are utilized. This strategy balances temporal representativeness and data quality by capturing the major phenological phases of vegetation growth while excluding low-value observations typically associated with non-growing seasons.
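The two dimensions can be sketched as a subsampling step followed by a month filter, in the order the example above describes. This is only an illustration. The acquisition dates (72 dates at a 5-day spacing starting on 3 January 2019) and the helper name `select_observations` are assumptions, and only the four month windows named in this section are included because the fifth is not named here.

```python
from datetime import date, timedelta

# ASSUMPTION: 72 acquisitions spaced 5 days apart, starting 3 January 2019.
# The actual acquisition dates are not given in the text.
START = date(2019, 1, 3)
observations = [START + timedelta(days=5 * k) for k in range(72)]

# Month windows named in the text (inclusive first and last month).
RETAIN_WINDOWS = {
    "retain_Jan_Dec": (1, 12),
    "retain_Feb_Nov": (2, 11),
    "retain_Mar_Oct": (3, 10),
    "retain_May_Sep": (5, 9),
}


def select_observations(dates, interval, window):
    """Keep every `interval`-th date, then drop dates outside the month window."""
    first_month, last_month = RETAIN_WINDOWS[window]
    subsampled = dates[::interval]
    return [d for d in subsampled if first_month <= d.month <= last_month]


# interval_1 + retain_Jan_Dec keeps the full series.
full = select_observations(observations, 1, "retain_Jan_Dec")
# interval_4 + retain_Mar_Oct keeps every fourth date within March-October.
sparse = select_observations(observations, 4, "retain_Mar_Oct")
```

Filtering by month first and subsampling afterwards would be an equally plausible reading and can change the retained count by one or two observations.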
3.2.1. Overall Performance Analysis of Temporal Combinations
To quantitatively evaluate the impact of different temporal combination strategies on the model performance, this study assessed the regression metrics, including the coefficient of determination (R2), mean squared error (MSE), and root mean square error (RMSE), across 20 distinct temporal configurations on the test dataset. As presented in Table 6 and Figure 7, the predictive performance varied substantially among the different combinations. The highest R2 reached 0.770, while the lowest dropped to 0.665, resulting in a maximum difference exceeding 0.10. These results underscore the critical importance of the temporal feature design in modeling the fractional coverage of NWs.
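For reference, the three metrics follow the usual definitions and can be computed as in the sketch below. The function name `regression_metrics` and the sample values are hypothetical and serve only to show the calculation.

```python
import math


def regression_metrics(observed, predicted):
    """Return (R2, MSE, RMSE) for paired observed and predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    # Residual and total sums of squares.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    mse = ss_res / n
    return 1.0 - ss_res / ss_tot, mse, math.sqrt(mse)


# Hypothetical fractional-cover values, for illustration only.
observed = [0.20, 0.35, 0.40, 0.55, 0.70]
predicted = [0.25, 0.30, 0.45, 0.50, 0.60]
r2, mse, rmse = regression_metrics(observed, predicted)
```

Because RMSE is the square root of MSE, the two metrics always rank configurations identically; R2 additionally scales the error by the variance of the observed values.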
From an overall perspective, the configuration interval_1 + retain_Jan_Dec, which retained all the observations throughout the year and employed the highest sampling density, achieved the best performance, with a test R2 of 0.770 and the lowest RMSE of 0.096. These results indicate that complete, high-frequency temporal sequences are most effective in capturing the spectral dynamics of NWs across their full growth cycle, thereby enhancing the model’s ability to learn vegetation trends and improving the regression accuracy. In contrast, the interval_2 + retain_May_Sep configuration—characterized by both sparse sampling and a restricted observation window—yielded the weakest performance, with an R2 of only 0.665 and an RMSE of 0.115. This suggests that simultaneously reducing the temporal span and resolution may result in the loss of critical phenological information, thereby impairing the model’s capacity to represent the vegetation dynamics accurately.
Further analysis based on the retention strategies revealed systematic differences in the model performance. Among all the configurations, those using retain_Jan_Dec achieved the highest average R2 (0.745), followed by retain_Feb_Nov (0.735) and retain_Mar_Oct (0.725). These results suggest that moderately excluding non-growing-season observations, such as those from January and December, costs little accuracy (a drop of about 0.01 in the average R2) while reducing the input volume. The retain_May_Sep strategy, which focuses exclusively on the peak growing season, yielded a lower average R2 of 0.707 but still demonstrated reasonable predictive performance, highlighting the modeling value of high-quality seasonal data.
From the perspective of the sampling frequency, the temporal resolution also had a substantial impact under a fixed month filtering window. For instance, under the retain_Jan_Dec condition, the interval_1 configuration achieved the highest R2 (0.770), outperforming interval_2 (R2 = 0.745), interval_3 (R2 = 0.728), and interval_4 (R2 = 0.736). Although the decline is not strictly monotonic (interval_4 slightly exceeds interval_3), the overall pattern suggests that higher-frequency observations enhance the model’s sensitivity to short-term fluctuations in vegetation conditions, while sparser sampling may fail to capture critical phenological transitions, thereby reducing the prediction accuracy.
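As a quick consistency check, the four per-interval values listed for retain_Jan_Dec reproduce the average R2 of 0.745 reported for that retention strategy:

```latex
\[
\overline{R^2}_{\text{Jan--Dec}}
  = \frac{0.770 + 0.745 + 0.728 + 0.736}{4}
  = \frac{2.979}{4}
  \approx 0.745
\]
```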
In summary, the temporal combination strategies not only define the structure of the time-series inputs but also directly influence the model’s capacity to characterize the seasonality, periodicity, and phenological dynamics. The results indicate that full-year, high-frequency observations represent the optimal configuration for maximizing the predictive accuracy. Nevertheless, strategically reducing the sampling interval or limiting the observation period can still yield competitive performance while enhancing data efficiency. Therefore, temporal configurations should be flexibly tailored to specific task requirements and computational constraints to achieve an optimal balance between model performance and operational efficiency.
3.2.2. Comparative Analysis of Representative Temporal Combinations
To further evaluate the impact of the temporal combination strategies on the model fitting performance, four representative configurations were selected, each reflecting a distinct temporal design strategy: full-year high-density observation, growing-season-focused sampling, inactive-period exclusion, and sparse seasonal input. Their predictive accuracy was assessed using scatter plots comparing the estimated and observed fractional coverage of noxious weeds in the test dataset (Figure 8).
The first configuration, interval_1 + retain_Jan_Dec, retained all 72 Sentinel-2 observations across the full calendar year without subsampling. As shown in Figure 8a, this full-coverage, high-frequency strategy yielded the best performance, achieving an R2 of 0.770 and an RMSE of 0.096. The predicted values were tightly aligned with the 1:1 reference line across the entire coverage spectrum, particularly within the low-to-moderate range (0.2–0.5), indicating that a complete time series effectively captured the seasonal spectral dynamics and ensured robust model fitting.
The second configuration, interval_4 + retain_May_Sep, retained only one observation every four time steps, limited to the peak growing season (May to September). Despite its compressed temporal span and sparse sampling density, it still produced competitive results (R2 = 0.746, RMSE = 0.100), as illustrated in Figure 8b. The scatter plot revealed tightly clustered predictions within the mid-range interval (0.3–0.6), although slight underestimation occurred at higher coverage levels, likely due to insufficient representation of extreme cases in the training data.
The third configuration, interval_2 + retain_Feb_Nov, excluded January and December to avoid interference from extreme winter conditions while maintaining a moderate sampling frequency across February to November. As shown in Figure 8c, this setting achieved near-optimal performance (R2 = 0.759, RMSE = 0.098) while substantially reducing the number of input samples. The predictions were evenly distributed and exhibited low variance across all the coverage ranges, demonstrating that the exclusion of low-quality temporal segments can enhance the data efficiency without compromising the predictive accuracy.
The fourth configuration, interval_2 + retain_May_Sep, was the weakest performer. As depicted in Figure 8d, it combined a limited seasonal window with a reduced sampling frequency, resulting in the lowest R2 (0.665) and the highest RMSE (0.115). The scatter plot showed substantial deviations and outliers, particularly in high-coverage regions (>0.6), where the predictions were consistently underestimated. These results indicate that excessive compression in both the temporal resolution and observational span impairs the model’s capacity to capture critical phenological transitions.
In summary, the comparative results demonstrate that full-length, high-frequency temporal input remains the most effective strategy for maximizing the predictive performance. However, selectively excluding redundant months or applying moderate reductions in the sampling frequency during key phenological periods can still yield satisfactory accuracy while significantly reducing the data volume and computational cost. These findings underscore the value of the temporal configuration as a flexible and practical tool for balancing the model precision and resource efficiency in remote sensing applications.
3.2.3. Overall Evaluation of Temporal Combination Strategies
The experimental results collectively underscore the pivotal role of the temporal combination strategies in the regression-based modeling of the noxious weed fractional coverage. Variations in both the sampling interval and observation month selection significantly affect the model’s ability to capture the vegetation growth dynamics and ultimately determine the predictive accuracy. Across the 20 evaluated configurations, several consistent patterns emerged.
First, complete and densely sampled time-series inputs substantially enhance the model’s capacity to capture temporal variations in the NW coverage. For example, the interval_1 + retain_Jan_Dec configuration, which incorporates all the available time steps without omission, achieved the best performance (R2 = 0.770, RMSE = 0.096). This highlights the importance of full-year, high-frequency observations in delivering comprehensive temporal signals—particularly advantageous for applications requiring high accuracy and model stability.
Second, moderately excluding non-growing season observations—especially winter months such as January and December—can effectively reduce the spectral noise without compromising the prediction accuracy. For instance, the interval_2 + retain_Feb_Nov strategy yielded near-optimal results (R2 = 0.759, RMSE = 0.098), despite a substantial reduction in the input volume. This demonstrates the effectiveness of a quality-over-quantity approach in enhancing the model robustness and generalization.
Third, temporal strategies that focus exclusively on the growing season while employing lower sampling frequencies can still achieve satisfactory performance with reduced data requirements. For example, interval_4 + retain_May_Sep retained only one image every four time steps during the May–September period, representing approximately 11% of the total samples used in the optimal configuration. Despite this reduction, it attained strong results (R2 = 0.746, RMSE = 0.100), illustrating an effective trade-off between accuracy and computational efficiency—well suited for rapid-response or resource-constrained applications.
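The figure of approximately 11% can be reproduced with rough arithmetic, assuming (this is not stated in the text) that the 72 observations are spread about evenly over the year, i.e., roughly six per month:

```latex
\[
N_{\text{May--Sep}} \approx 72 \times \tfrac{5}{12} = 30, \qquad
N_{\text{retained}} \approx \left\lceil \tfrac{30}{4} \right\rceil = 8, \qquad
\frac{N_{\text{retained}}}{72} = \frac{8}{72} \approx 0.11
\]
```

Depending on where the subsampling starts, seven rather than eight observations may fall inside the window, which gives a share of about 10%.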
Finally, excessive temporal compression leads to significant performance degradation. The interval_2 + retain_May_Sep configuration, although covering the core growing season, combined sparse sampling with a limited temporal span. It yielded the poorest performance (R2 = 0.665, RMSE = 0.115), indicating that concurrently reducing both the temporal density and coverage diminishes the model’s ability to capture critical phenological transitions, thereby weakening its predictive capacity.
In summary, temporal information serves not only as a foundational component of remote sensing modeling but also as a critical driver of the model’s capacity to learn phenological dynamics. The findings indicate that, whenever feasible, full-year and high-frequency observations should be prioritized to maximize the predictive accuracy. Alternatively, carefully designed temporal strategies—such as excluding redundant months or applying sparse sampling during informative periods—can achieve high efficiency with a minimal loss in accuracy. These results provide both theoretical insight and practical guidance for designing scalable and effective time-series remote sensing models for noxious weed monitoring.
3.3. Analysis of Noxious Weed Coverage and Its Elevation-Dependent Distribution Patterns
Figure 9a illustrates the spatial distribution of the fractional noxious weed coverage across the study area. Overall, the weed coverage levels remain relatively low throughout most regions. To further examine the spatial heterogeneity and elevation-dependent trends in the NWs’ expansion, a series of analytical visualizations was developed, including the pixel-wise distribution of weed coverage (Figure 9b), the distribution of pixel counts across different elevation intervals (Figure 9c), and the variation in the mean coverage and standard deviation within each elevation band (Figure 9d). Together, these analyses reveal the spatial patterns and topographic dependencies that characterize the proliferation of noxious weeds in the region.
Figure 9b shows the distribution of the pixel counts across different coverage intervals. Overall, the NWs exhibited a clear tendency toward moderate levels of coverage, with the majority of pixels concentrated within the 30–50% range. Specifically, the 30–40% interval accounted for the highest proportion of pixels (32.7%), followed by the 40–50% range (25.3%). In contrast, pixels with coverage below 10% or above 70% were relatively rare, each comprising less than 5% of the total. This distribution pattern suggests that NWs are broadly distributed across the study area but typically occur at intermediate densities, indicating a moderate and widespread expansion trend rather than localized high-intensity infestations.
Figure 9c further illustrates the elevation-based distribution of the NW pixels. The results indicate that NWs are predominantly concentrated within the 4200–4800 m elevation range, with the peak pixel density observed between 4300 and 4500 m. This pattern suggests that mid-elevation zones—particularly those dominated by alpine meadows and alpine shrub lands—offer the most favorable ecological conditions for NW proliferation. In contrast, the pixel counts decline sharply at elevations below 4100 m and above 4900 m, likely due to a combination of topographic constraints, climatic limitations (e.g., low moisture and temperature), and land-use variability. Overall, the distribution of NWs exhibits a characteristic enrichment trend within mid- to high-elevation landscapes.
Figure 9d presents the mean and standard deviation of the NW coverage across different elevation intervals. Within the 4200–4700 m band, the mean coverage remains relatively high (approximately 39–40%) with a low standard deviation, indicating that NWs are not only abundant but also uniformly distributed at these elevations. In contrast, both the coverage levels and spatial consistency decline in the 4100–4200 m and 4800–5000 m bands, where the increased standard deviation reflects greater spatial heterogeneity and ecological uncertainty at the distributional margins. These results suggest that mid-elevation zones are not only hotspots of NW abundance but also areas of stable and consistent infestation, likely shaped by selective environmental pressures and relatively homogeneous habitat conditions.
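Per-band summaries of this kind (pixel count, mean coverage, and standard deviation per elevation interval) can be derived as in the following sketch. The 100 m band width, the helper name `band_statistics`, and the sample pixels are assumptions chosen for illustration and are not values from the study.

```python
from collections import defaultdict
from statistics import mean, pstdev


def band_statistics(elevations, coverages, band_width=100):
    """Group pixels into elevation bands.

    Returns {band_start: (pixel_count, mean_coverage, std_coverage)},
    ordered by increasing elevation. The population standard deviation is
    used so that bands holding a single pixel still return a value (0.0).
    """
    bands = defaultdict(list)
    for elevation, coverage in zip(elevations, coverages):
        band_start = int(elevation // band_width) * band_width
        bands[band_start].append(coverage)
    return {
        start: (len(values), mean(values), pstdev(values))
        for start, values in sorted(bands.items())
    }


# Hypothetical pixels: elevation in metres, fractional cover in [0, 1].
elevations = [4150, 4320, 4380, 4410, 4460, 4850]
coverages = [0.22, 0.38, 0.40, 0.41, 0.39, 0.15]
stats = band_statistics(elevations, coverages)
```

The pixel counts correspond to the kind of summary shown in Figure 9c, and the mean and standard deviation to Figure 9d.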
In summary, the spatial distribution of NWs within the study area exhibits a strong elevation-dependent pattern, with the 4200–4800 m range identified as the core expansion zone for these invasive species. Accordingly, future management and control efforts should prioritize this elevational belt by implementing enhanced monitoring and targeted interventions in high-coverage patches, thereby improving the effectiveness of strategies aimed at suppressing further NW proliferation.
3.4. Ablation Study
To quantitatively assess the individual contributions of the dynamic masking and non-stationary modeling components within the DMNST framework, an ablation study was conducted by comparing the full model with three simplified variants: Transformer (Baseline), Dynamic Masked Transformer, and Non-Stationary Transformer. All the models share the same encoder-only Transformer backbone and were trained under identical experimental conditions.
Importantly, each variant was evaluated using the optimal input configuration identified in Section 3.1 and Section 3.2: the VIS, RE, NIR, and SWIR spectral bands combined with the full-year, high-frequency time series (interval_1 + retain_Jan_Dec). This setup ensures that the performance differences are attributable to architectural changes rather than to variation in the input data quality or quantity.
The models were assessed using three standard regression metrics: MSE, RMSE, and R2. The results are summarized in Table 7.
The baseline Transformer model, which excludes both dynamic masking and target normalization, exhibited the weakest performance (R2 = 0.696), highlighting its limited ability to address spectral noise and seasonal variability in vegetation dynamics. Incorporating either enhancement independently led to modest improvements: the Dynamic Masked Transformer demonstrated improved robustness to cloud-contaminated inputs (R2 = 0.722), while the Non-Stationary Transformer better adapted to temporal non-stationarity (R2 = 0.729).
The DMNST model, integrating both modules, outperformed all the variants, achieving the lowest MSE (0.009), lowest RMSE (0.096), and highest R2 (0.770). Its gain in R2 over the baseline (0.074) exceeds the sum of the gains from the two single-module variants (0.026 + 0.033 = 0.059), which suggests a synergistic effect when dynamic masking and non-stationary modeling are applied in concert. Beyond the improved accuracy, the full model also exhibited enhanced robustness across the full spectrum of NW coverage values.
In summary, this ablation analysis underscores the effectiveness of each architectural enhancement and their combined contribution to the DMNST model. The dynamic masking module suppresses unreliable observations arising from atmospheric interference, while the non-stationary normalization facilitates more stable learning from temporally complex vegetation signals. Together, they provide a robust framework for accurate and scalable regression modeling of the NW fractional coverage using satellite time-series data.
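The two ideas can be illustrated generically as follows. This is not the DMNST implementation, whose internals are not described in this section. It is a simplified sketch, under assumed inputs, in which a validity mask excludes unreliable time steps and each series is standardized with its own statistics so that level and scale differences between series are removed. The function name `masked_normalize` and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def masked_normalize(series, valid, eps=1e-6):
    """Standardize one pixel's time series using only its valid time steps.

    `series` holds one value per time step; `valid` flags the steps that
    are considered reliable (for example, cloud-free). Masked steps are
    set to 0.0 after normalization so that they carry no signal. The
    (mean, std) pair is returned so that the scaling can be undone later.
    """
    kept = [x for x, ok in zip(series, valid) if ok]
    mu, sigma = mean(kept), pstdev(kept)
    normalized = [(x - mu) / (sigma + eps) if ok else 0.0
                  for x, ok in zip(series, valid)]
    return normalized, (mu, sigma)


# Hypothetical NIR reflectance with one cloud-contaminated step (index 2).
series = [0.30, 0.34, 0.90, 0.38, 0.42]
valid = [True, True, False, True, True]
normalized, (mu, sigma) = masked_normalize(series, valid)
```

In this toy example, the contaminated value of 0.90 affects neither the statistics nor the normalized input, which is the behavior that a masking step is meant to provide.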