1. Introduction
The Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6) revealed that the global average surface temperature was 1.09 °C higher than the average from 1850 to 1900 over the past decade (2011–2020). Moreover, each decade over the past 40 years experienced higher global average surface temperatures than any previous decade [
1,
2]. Vegetation is an essential component of terrestrial ecosystems and plays a critical role in maintaining ecosystem stability. It both responds to and influences climate change, serving as an “indicator” of regional environmental change [
3,
4]. The fractional vegetation cover (FVC) reflects the surface distribution of vegetation and varies significantly seasonally and across regions, influenced by climate, topography, and human activities. Satellite remote sensing data offers various advantages, such as low cost, high reliability, and numerous product options, making it an important data source for analyzing various vegetation indices. The advancement of remote sensing technology has created favorable conditions for accurate monitoring of FVC over large areas [
5]. In particular, the Earth Observing System Moderate Resolution Imaging Spectroradiometer (EOS-MODIS) normalized difference vegetation index (NDVI) data product offers several advantages compared with the Global Inventory Modelling and Mapping Studies (GIMMS) NDVI, the Advanced Very High Resolution Radiometer (AVHRR) NDVI, and the Systeme Probatoire d’Observation de la Terre (SPOT-VGT) NDVI. These advantages include simultaneously observing multiple channels, higher spatiotemporal resolution, large-scale observation capabilities, and high accuracy. Additionally, the dataset has undergone processes such as radiometric calibration, atmospheric correction, and cloud detection, ensuring a certain level of quality assurance. Consequently, it is widely used in the studies of the changes in vegetation cover [
6,
7,
8,
9,
10].
Domestic and international scholars have monitored and analyzed FVC changes in various regions. For example, Wu et al. (2014) [
11] used GIMMS NDVI data to estimate global FVC from 1982 to 2011, revealing significant seasonal variations. High-latitude regions showed a marked increase in FVC due to global warming. Shobairi et al. (2018) [
12] analyzed the changes in FVC and drivers in Guangdong Province, China, from 2000 to 2010 using MODIS-NDVI data. Their trend analysis indicated that FVC was higher in the less economically developed northern mountainous regions with minimal human disturbance. In contrast, industrialization and urbanization in the southern coastal regions led to lower FVC. They also detected a positive correlation between FVC and sunshine hours. Hill and Guerschman (2020) [
13] developed a global FVC product and investigated FVC levels and trends across grassland ecoregions globally. Their findings revealed significant positive and negative FVC trends in many ecoregions. East Africa, Patagonia, and Australia’s Mitchell Grasslands experienced substantial declines in non-photosynthetic vegetation and increases in bare soil, reflecting the interactions between prolonged drought, heavy livestock use, agricultural expansion, and other land use changes. Li et al. (2022) [
14] used trend analysis, the Google Earth Engine (GEE) platform, and a random forest classifier to examine vegetation dynamics during the growing seasons from 2001 to 2020 in the China–Myanmar Economic Corridor. They quantified the spatial distribution, change patterns, and driving factors of FVC, discovering a 0.68% decrease in the average FVC of forests and grasslands. However, cropland, which was concentrated in the south–central region, contributed 50.4% to the FVC increase, highlighting the offsetting effect of increased cultivated crops on natural vegetation loss. Wang et al. (2022) [
15] used the GEE platform to retrieve FVC in the Yellow River Basin from 1999 to 2019. They showed significant FVC improvements, particularly in the central basin, with precipitation, sunshine duration, and relative humidity being the most influential factors. Eisfelder et al. (2023) [
16] analyzed seasonal vegetation trends in Europe from 1981 to 2018 using AVHRR NDVI data. Their trend analysis identified distinct vegetation cover patterns for spring, summer, and autumn across different European regions. They found positive trends in vegetation cover over large areas of Europe when considering the entire growing season. Dastigerdi et al. (2024) [
17] examined the changes in vegetation cover in northeastern Iran from 2001 to 2020 using MODIS-NDVI time-series data (MOD13Q1). The trend analysis revealed significant increases in vegetation cover in 32% of the region and decreases in 26%. The increasing trends were mainly observed in highland areas.
Approaches for estimating FVC include field measurements and satellite-based observations. While field measurements provide higher accuracy and spatial resolution, they are constrained by high costs, intermittent observation times, and limited coverage areas. In contrast, remote sensing observations are more cost-effective, cover extensive areas, and allow for long-term FVC estimation. Methods based on remote sensing include empirical models, pixel unmixing models, physical methods, and machine learning (ML) techniques. Among these, pixel unmixing models are widely adopted due to their simplicity and practicality [
18,
19,
20,
21]. These models assume that each pixel in a remotely sensed image consists of two or more components, and FVC is determined by decomposing these mixed components. The pixel dichotomy model, a linear variant of the pixel unmixing model, assumes that each pixel comprises vegetation and non-vegetation components, and it estimates FVC based on this decomposition. This model does not rely on field-measured FVC data, making it well-suited for regional vegetation monitoring [
22].
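The pixel dichotomy model described above can be sketched in a few lines. The example below assumes the common linear form FVC = (NDVI − NDVI_soil) / (NDVI_veg − NDVI_soil). The function names and the use of the 5th and 95th NDVI percentiles as the bare-soil and full-vegetation endmembers are illustrative assumptions, not values taken from this study.

```python
def percentile(values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def pixel_dichotomy_fvc(ndvi, soil_q=5.0, veg_q=95.0):
    """Estimate FVC per pixel from NDVI with a two-endmember linear model.

    soil_q and veg_q are assumed percentile thresholds (hypothetical defaults)
    used to pick the bare-soil and full-vegetation NDVI endmembers.
    """
    ndvi_soil = percentile(ndvi, soil_q)
    ndvi_veg = percentile(ndvi, veg_q)
    if ndvi_veg <= ndvi_soil:
        raise ValueError("NDVI range is too narrow to separate the endmembers")
    span = ndvi_veg - ndvi_soil
    # Clamp to [0, 1] so pixels beyond the endmembers stay physically valid.
    return [min(1.0, max(0.0, (v - ndvi_soil) / span)) for v in ndvi]


if __name__ == "__main__":
    sample_ndvi = [i / 10 for i in range(11)]  # synthetic NDVI values 0.0 ... 1.0
    print(pixel_dichotomy_fvc(sample_ndvi))
```

Because the endmembers come from the image statistics themselves, no field-measured FVC is needed, which is the property the text highlights.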
In recent decades, several global-scale FVC products have been developed using remote sensing data, including the Global LAnd Surface Satellite (GLASS) FVC, VGT bioGEOphysical product Version 1 (GEOV1) FVC, VGT bioGEOphysical product Version 2 (GEOV2) FVC, PROBA-V bioGEOphysical product Version 3 (GEOV3) FVC, the Carbon CYcle and Change in Land Observational Products from an Ensemble of Satellites (CYCLOPES) FVC, and the Multi-source Synergized Quantitative (MuSyQ) FVC [
23,
24]. These products are generated based on different satellite sensors, resolutions, revisit intervals, spatial ranges, temporal scopes, and algorithms. Numerous comparative and evaluative studies have been conducted on these FVC products. For instance, Liu et al. (2019) [
25] performed a spatiotemporal comparison and validation of three global FVC products (GEOV2, GEOV3, GLASS). The results indicated general spatiotemporal consistency across most regions. The GLASS and GEOV2 FVC products demonstrated reliable spatiotemporal completeness, whereas the GEOV3 FVC product exhibited significant data gaps in high-latitude regions, particularly in winter. The GEOV3 product also showed higher FVC values compared to GEOV2 and GLASS products near the equator. Differences between GEOV2 and GLASS FVC products were most pronounced in deciduous forests, where GLASS reported slightly higher FVC values during winter. Temporal profiles of GEOV2 and GLASS were more consistent than those of GEOV3, with GLASS showing greater accuracy when compared to reference FVC datasets. Therefore, different FVC products demonstrate variations in FVC values with respect to different geographical regions, seasons, and types of underlying land surfaces.
FVC is a crucial indicator for assessing surface vegetation growth. Research on FVC changes informs the assessment of regional ecosystem health and stability, climate change monitoring and prediction, biodiversity conservation, land use planning, agricultural production and food security, water resource management, carbon cycling, global change, disaster risk assessment, and environmental policy and planning. Therefore, predicting FVC is of great practical significance. The prediction methods for FVC include traditional regression models [
26], trend extrapolation [
27], cellular automata (CA)–Markov [
9], rescaled range analysis (Hurst exponent) [
28], gray models [
29], and future multi-scenario simulations [
30,
31]. Kumar et al. (2014) [
26] used a logistic regression model (LRM) to predict the changes in forest cover in the Bhanupratappur forest division of Kanker district, Chhattisgarh State, India. The forest cover data from 1990 and 2000 were used to predict forest cover for 2010, with the LRM achieving reasonably high accuracy (receiver-operating characteristic = 87%). Cui et al. (2021) [
9] predicted vegetation coverage grades for 2025 using data from 2008, 2010, and 2013. Their results indicated an upward trend in vegetation coverage in the Qinling Mountains under policy guidance, especially in urban areas. Ahmad et al. (2023) [
28] tracked the spatiotemporal changes in vegetation in Pakistan from 2000 to 2020, using the Hurst exponent to estimate future trends. Values above 0.5 suggested consistent future vegetation trends in all four provinces. Wang et al. (2024) [
31] used MODIS-NDVI data and a pixel dichotomy model to estimate FVC. Then, they analyzed the spatiotemporal evolution of vegetation cover in Shenyang City, China, from 2000 to 2020 using trend and deviation analyses. They employed the patch-generating land use simulation model, based on land use data from 2010, 2015, and 2020, to simulate vegetation cover scenarios for 2030 in Shenyang City.
The rapid development of big data and artificial intelligence technologies in recent years has led to the increasing application of machine learning methods, such as k-nearest neighbor (KNN), random forest (RF), deep neural network (DNN), support vector regression (SVR), multiple linear regression (MLR), support vector machine (SVM), artificial neural network (ANN), and long short-term memory (LSTM), for estimating and predicting FVC [
32,
33,
34,
35,
36,
37,
38]. For instance, Jia et al. (2021) [
34] used rainfall and temperature as input variables. They employed MLR, ANN, and SVM models to predict the changes in vegetation cover in the tributaries of the Wei River Basin. The results indicated that the prediction accuracies of the three models ranked as SVM > ANN > MLR, revealing a complex nonlinear relationship between meteorological factors and FVC. Roy (2021) [
35] extracted NDVI and enhanced vegetation index (EVI) values from the MODIS dataset (2001–2018) to predict vegetation indices for 2019, testing four supervised ML algorithms (SVR, RF, linear, and polynomial regression). The models predicted NDVI with an error range of 1.51–5.73% and EVI with an error range of 4.33–6.99%. An upward linear trend was observed in the data, suggesting increasing vegetation cover. Ahmad et al. (2023) [
36] introduced a convolutional long short-term memory (ConvLSTM) model for more comprehensive and detailed NDVI forecasts. They compared the ConvLSTM network with the parametric crop growth model (PCGM) using the root mean square error (RMSE) metric on the same set of soybean crop field pixels. The ConvLSTM model, with its best training configuration, achieved an RMSE of 0.0782, outperforming the PCGM’s RMSE of 0.0989. Peng (2023) [37] developed aquatic FVC retrieval models using KNN, RF, and DNN algorithms based on Landsat-8 satellite images of Wuliangsuhai Lake. The results indicated that all three models performed well, with the DNN model showing the best performance. The DNN model achieved an R² score of 0.873, an RMSE of 0.118, and a slope of 0.856 in the univariate linear fit between estimated and actual values. ML is a method that simulates human learning using knowledge from probability theory and statistics to establish the relationships from existing data, extracting valuable information from large and complex datasets for forecasting, predicting change trends, and improving learning efficiency [
39,
40]. ML offers several advantages compared with other FVC prediction methods: It can handle complex relationships and nonlinear correlations, adapt to large-scale and multidimensional data, and has data-driven and automated feature learning capabilities. Additionally, it provides high flexibility, strong generalization ability, and high prediction accuracy when combining multi-source data. Thus, ML has become a popular strategy for FVC prediction with significant application potential.
Southwest China (SWC) is a crucial ecological security barrier and also an area characterized by ecological fragility and climate sensitivity. Zheng et al. (2016) [
41] used MODIS-NDVI data to estimate the annual maximum FVC at a 250 m resolution in SWC from 2000 to 2010, analyzing the changes in vegetation cover in forests, shrubs, and grasslands. Peng et al. (2017) [
42] employed the mean method, coefficient of variation, and correlation analysis to examine the changes in FVC in forest and grassland areas and their relationship with precipitation in the five southwestern provinces (Guizhou, Yunnan, Sichuan, Guangxi, and Chongqing) from 2009 to 2015, considering terrain and vegetation types. Their findings indicated that FVC in these regions decreased from 0.87 to 0.78, signifying obvious vegetation degradation, with precipitation effects showing significant spatial variability. Feng and Dong (2022) [
43] and He et al. (2021) [
44] used MODIS-NDVI data to investigate the spatiotemporal evolution of FVC in Yunnan Province and Chongqing municipality, respectively. The results showed that FVC in Yunnan Province exhibited a “single peak” distribution and an increasing trend from 2010 to 2020, whereas Chongqing municipality witnessed an overall improvement in FVC from 2000 to 2015. Huang et al. (2023) [
45] evaluated the differences between two global FVC products, GEOLAND2 Version 3 and Global Land Surface Satellite, in SWC. They discovered significant spatiotemporal and seasonal discrepancies between the products, with large variations in values across different land use types, slopes, and altitudes. The study concluded that, when selecting FVC product data, the influences of season, terrain, and surface type should be considered when choosing the most appropriate remote sensing data products based on specific research objectives.
The literature review shows that the current studies on FVC in SWC primarily focus on the overall analysis of spatiotemporal variation characteristics for areas excluding Tibet. These studies are not sufficiently specific or comprehensive regarding the spatiotemporal variation characteristics of FVC within the individual provincial-level units of SWC: the provinces of Sichuan, Yunnan, and Guizhou, the municipality of Chongqing, and the Tibet Autonomous Region. Furthermore, the study periods covered are relatively dated, and studies on FVC estimation and prediction based on ML methods in SWC are notably lacking. Given the complex topography, climate sensitivity, and diverse vegetation types of SWC, several questions arise in the context of global warming: What are the recent overall spatiotemporal variation trends of FVC in SWC? How do the spatiotemporal variation trends and differences in FVC manifest across the five provinces and municipalities, including Tibet? What are the prediction accuracy and effectiveness of different ML models based on MODIS-NDVI data for the growing-season FVC? These questions necessitate a more comprehensive and integrated analysis, particularly considering the high-altitude region of Tibet, which is significantly impacted by global warming. The present study addresses these issues, aiming to provide a scientific understanding of the changes in the ecological environment in SWC. The results offer critical theoretical and technical support for ecological environment protection and the construction of ecological civilization in SWC.
4. Discussion
This study analyzed the spatiotemporal variation characteristics of FVC in SWC from 2000 to 2020. The results were consistent with the findings of some scholars in previous studies within the corresponding timescales for the region. For example, Zheng et al. (2017) [
83] used MODIS-NDVI data to analyze the dynamic changes in FVC in SWC (Guangxi, Guizhou, Chongqing, Yunnan, Sichuan, and parts of western Qinghai and southeastern Tibet) from 2000 to 2014. The FVC in SWC showed a decreasing trend from southeast to northwest. Over the 15 years, the annual maximum FVC showed an overall increasing trend, with the largest increase from 2009 to 2014. During this period, FVC showed a fluctuating upward trend, and the overall vegetation cover improved. FVC increased in all seasons except summer, when it decreased. By contrast, this study found an increasing trend in all seasons, although the summer increase was not significant at the p = 0.05 level. The rates of increase also differed because of the differences in the study area scope and timescale. Xiong et al. (2019) [
84] investigated the spatiotemporal variation characteristics of vegetation cover in the growing season (April to September) in SWC from 2000 to 2016. They indicated that MODIS-NDVI showed an increasing trend over the 17 years, with the most significant increase in April. The area showing an increasing trend accounted for 71.49% of the total study area, mainly in the eastern and southeastern regions. Compared with the results of this study, the interannual increasing trend and the spatial distribution of the increasing trend were generally consistent. However, significant differences in the rate of increase over time and the spatial proportion of the increasing trend were observed due to the differences in the study area scope and the division of the growing-season timescale. Li (2019) [
85] analyzed the spatiotemporal variation characteristics of GIMMS FVC in SWC from 1982 to 2016 and found an increasing overall trend of FVC, with vegetation conditions showing positive development in different seasons and the growing season. Most areas had good vegetation cover, with an annual FVC average of 0.46. Despite the differences in the study area and timescale, these results are basically consistent with the findings of this study. Additionally, SWC generally displayed a moderately stable (34.1%) and a less stable (31.1%) development trend, which corresponded to the results of this study revealing a relatively large area of slightly significant increase and slightly significant decrease in spatial variation trends in different seasons, interannually, and in the growing season. Furthermore, Zhang et al. (2020) [
86], Yan et al. (2021) [
87], and Duan et al. (2022) [
88] also examined the vegetation cover in SWC (Sichuan, Yunnan, Guizhou, Chongqing, and Guangxi) using SPOT NDVI (2000–2015), GIMMS NDVI (1982–2015), and AVHRR and GIMMS NDVI (1982–2015), respectively, all revealing an overall increasing trend in vegetation. Compared with the aforementioned studies, which focused on timescales up to 2016, this study addressed the spatiotemporal variation characteristics in recent years with the intensification of global warming. Previous studies did not extensively cover the spatiotemporal variation trends and differences in FVC across the five provinces and municipalities, including Tibet, resulting in relatively less comprehensive analyses.
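Whether a 21-year FVC series shows a significant trend at p = 0.05 can be checked with a rank-based test. The sketch below uses the Mann–Kendall test with a normal approximation and tie correction. This is one common choice for vegetation time series and is an assumption here, since this section does not state which significance test the study applied. The function name and the synthetic series are illustrative.

```python
import math
from collections import Counter


def mann_kendall(series):
    """Return (S, Z, two-sided p) of the Mann-Kendall trend test."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least three observations")
    # S counts concordant minus discordant pairs over all i < j.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S with the standard correction for tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(series).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    if var_s == 0:
        return s, 0.0, 1.0  # constant series: no evidence of a trend
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return s, z, p


if __name__ == "__main__":
    # Synthetic annual FVC values for 2000-2020 (not data from this study).
    fvc = [0.40 + 0.005 * year for year in range(21)]
    s, z, p = mann_kendall(fvc)
    print(s, round(z, 2), p < 0.05)
```

A trend is reported as significant only when p falls below the chosen level, which is why a positive slope alone, as for the summer FVC here, does not imply a significant increase.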
ML methods are increasingly used to estimate and predict vegetation changes because they can handle complex and nonlinear relationships, adapt to large-scale and multidimensional data, and offer data-driven, automated feature learning with high flexibility and strong generalization ability. Among the four individual ML models selected in this study, the LightGBM model stood out due to its histogram-based optimization and flexible tree growth strategy, offering high efficiency and accuracy. It also excelled in handling complex interactions and nonlinear relationships between features, making it the best-performing model with the highest prediction accuracy. The RR model, which is a linear regression extension technique used to address multicollinearity issues, struggled with the nonlinear relationships between vegetation cover and terrain and climate factors, resulting in the poorest overall evaluation performance and the lowest prediction accuracy. A WAHEM was constructed in this study by weighting the selected individual ML models, aiming to leverage their strengths to better learn the sample features and achieve superior prediction results. The WAHEM performed best on all four evaluation metrics on the training set; however, it did not perform as well as the LightGBM and SVR models on the validation and test sets. Nonetheless, the differences in comprehensive evaluation metrics were not significant. This might be attributed to factors such as the random division of the dataset, the relative weight settings of different ML methods, and the choice of heterogeneous ensemble learning strategies.
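The weighting step behind an ensemble such as the WAHEM can be illustrated as follows. This is a minimal sketch, assuming the ensemble is a normalized weighted average of the individual model predictions. The function name, the weighting rule (weights proportional to the cross-validated R² scores reported in Section 5), and the prediction values are all illustrative assumptions; the study's actual weight settings are not given in this section.

```python
def weighted_average_ensemble(predictions, weights):
    """Combine per-model predictions into one prediction per sample.

    predictions: dict mapping model name -> list of predicted FVC values
    weights:     dict mapping model name -> non-negative weight (hypothetical)
    """
    names = list(predictions)
    if not names:
        raise ValueError("no model predictions supplied")
    n = len(predictions[names[0]])
    if any(len(predictions[name]) != n for name in names):
        raise ValueError("all models must predict the same number of samples")
    total = sum(weights[name] for name in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalizing by the total keeps the result inside the range of the inputs.
    return [
        sum(weights[name] * predictions[name][i] for name in names) / total
        for i in range(n)
    ]


if __name__ == "__main__":
    # Illustrative predictions for three pixels (not results from this study).
    preds = {
        "LightGBM": [0.52, 0.31, 0.77],
        "SVR": [0.50, 0.35, 0.74],
        "KNN": [0.47, 0.38, 0.70],
        "RR": [0.45, 0.40, 0.66],
    }
    # Assumed rule: weights proportional to the cross-validated R2 scores.
    r2_weights = {"LightGBM": 0.90, "SVR": 0.83, "KNN": 0.75, "RR": 0.74}
    print(weighted_average_ensemble(preds, r2_weights))
```

Because the weaker models still receive non-zero weight, such an average can fall behind the single best model on held-out data, which is in line with the validation and test results discussed above.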
The MOD13A3 NDVI data, derived from the MODIS sensor aboard the Terra satellite, are a crucial operational product for monitoring the changes in surface vegetation. They provide global monthly data with a spatial resolution of 1 km, generated primarily from the 16-day NDVI composites. This product features high clarity and minimal cloud impact. The MOD13A3 product also includes a pixel reliability band to assess pixel imaging quality, allowing the exclusion of NDVI pixels affected by ice/snow and cloud cover during preprocessing. The product is computed from atmospherically corrected bidirectional surface reflectance that has been masked for water, clouds, heavy aerosols, and cloud shadows, which helps ensure data accuracy. The FY3D satellite, launched in November 2017, carries the MERSI instrument, which is capable of obtaining seamless global true-color images at a 250 m resolution and images from two infrared split-window regions daily. It is considered one of the most advanced wide-swath imaging remote sensing instruments today, with performance similar to that of the MODIS sensor [
89,
90]. MERSI’s NDVI data, available from May 2019 onward, are an important product for deriving global vegetation parameters and monitoring vegetation changes [
91]. As the MODIS sensors approach retirement, FY3D MERSI is expected to succeed them, continuing to provide products with different temporal and spatial scales. However, its products still require further evaluation and validation.
The MODIS sensor achieves high calibration accuracy due to its advanced on-orbit calibration equipment and regularly updated calibration coefficients. The absolute radiometric calibration accuracy can reach 5%, and the relative radiometric calibration accuracy can reach 1%. This excellent performance has established MODIS as a reference sensor for calibration, and it is extensively used in cross-calibration studies of other global sensors. In contrast, the performance of the FY satellite series sensors degrades more significantly with longer on-orbit operation time [
92]. In this study, we compared the FVC predictions of four individual ML models and the WAHEM for the growing seasons of 2021–2023 with MODIS-MOD13A3-FVC and FY3D-MERSI-FVC. The spatial variation trends predicted using all ML models were consistent over the three years, showing lower values in the west, higher values in the east, and a decreasing trend from south to north in the central region. All model predictions were closer to MODIS-MOD13A3-FVC, with high consistency in spatial distribution trends, although the predicted FVC values were relatively higher. The spatial distribution was also consistent with FY3D-MERSI-FVC; however, the value differences were larger, with FY3D-MERSI-FVC values generally lower. MODIS-MOD13A3-FVC values were generally higher than FY3D-MERSI-FVC values. As the output variable for constructing prediction models using different ML methods was MODIS-MOD13A3-FVC, the predictions for 2021–2023 were expected to be closer to MODIS-MOD13A3-FVC values. This bias was determined by the learning capabilities of the ML models. The comparison results of the two data products aligned with the findings of Wang and Li (2022) and Zhang (2023) [
93,
94]. Wang and Li (2022) [
93] evaluated the quality and usability of FY3D-MERSI NDVI data by comparing it with MODIS-Terra NDVI data from May 2019 to December 2020, based on spatial patterns and time series. The results showed high consistency in spatial distribution and time-series characteristics. On a global average level, FY3D-MERSI NDVI was systematically lower than MODIS-Terra NDVI, with a tendency to underestimate high values and overestimate low values, resulting in a slightly narrower dynamic range. The linear regression model using MODIS-Terra NDVI as the independent variable and FY3D-MERSI NDVI as the dependent variable showed high accuracy (R² of 0.91–0.95, RMSE of 0.048–0.068), with regression coefficients showing some temporal variation (slope of 0.87–0.94, intercept of 0.02–0.04). Zhang (2023) [
94] compared FY3D-MERSI NDVI with MODIS-NDVI products from 2020 to 2023 in terms of spatial patterns, processing methods, time series, and site-scale comparisons. The results also showed a strong spatiotemporal correlation in NDVI, with consistent seasonal variation trends for different underlying surface types. The scatter plots were mostly distributed around the 1:1 line, with an average RMSE of less than 0.06. Overall, MODIS-NDVI values were higher than FY3D-MERSI NDVI values, with FY3D-MERSI NDVI being lower than MODIS-NDVI in high vegetation cover areas and higher in low vegetation cover areas, especially in desert sites. This discrepancy was attributed to the differences in their spectral response functions and atmospheric corrections. Although the spectral response functions of FY3D and MODIS for vegetation monitoring were extremely similar in terms of bandwidth and peak response, FY3D MERSI had a slightly wider lower limit in the red band and a wider upper limit in the near-infrared band compared with MODIS. Its spectral response functions were therefore relatively wider, and the central wavelengths of the red and near-infrared bands differed, with MODIS at 0.645 μm (red) and 0.858 μm (near-infrared), and FY3D at 0.650 μm (red) and 0.865 μm (near-infrared), leading to saturation in high vegetation cover areas. The differences in the original bands used for NDVI calculation between the two satellites could cause certain biases ([
95];
http://www.nsmc.org.cn/, accessed on 16 July 2024). Previous studies showed that the MERSI bands were more susceptible to atmospheric water vapor, and their signal-to-noise ratio was still slightly lower than that of MODIS, leading to some differences in monitoring values. Additionally, the interpolation methods used to unify the spatial resolutions also caused calculation biases due to the different spatial resolutions of the NDVI data from the two satellites.
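The cross-sensor comparison cited above rests on an ordinary least-squares fit with one NDVI product as the independent variable and the other as the dependent variable. The sketch below shows how the slope, intercept, R², and RMSE of such a fit can be computed. The function name and the synthetic NDVI pairs are illustrative assumptions; the synthetic slope (0.9) and intercept (0.03) are merely chosen to lie inside the ranges reported by Wang and Li (2022).

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r2, rmse) for paired samples x and y.
    """
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("x and y must be paired and contain at least two samples")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("x and y must each contain more than one distinct value")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    return slope, intercept, r2, rmse


if __name__ == "__main__":
    # Synthetic pairs: x plays the role of MODIS NDVI, y of FY3D-MERSI NDVI.
    modis = [0.1, 0.3, 0.5, 0.7, 0.9]
    mersi = [0.9 * v + 0.03 for v in modis]
    print(linear_fit(modis, mersi))
```

A slope below 1 combined with a positive intercept reproduces the narrower dynamic range described in the text: with these synthetic coefficients the fitted line crosses the 1:1 line at NDVI = 0.3, so lower values are overestimated and higher values underestimated.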
In the LightGBM model, DEM had the highest feature importance score, indicating that, among the six input features, elevation had the closest relationship with FVC. The topography of SWC is complex. In mountainous and hilly areas, the terrain becomes more rugged with increasing elevation, making soil and water conservation more difficult. Temperature also decreases gradually with elevation, so FVC in these areas declines as elevation increases. This underscores the close relationship between FVC and elevation in SWC [
84]. The authors previously found that the elevation and FVC at different timescales were significantly correlated, with a notable downward trend. FVC at different timescales increased significantly with the slope, but when the slope exceeded 25°, FVC gradually decreased, highlighting the impact of elevation and slope on FVC in SWC [
96]. SSWRC had the second-highest feature importance score, indicating the importance of SSWRC to FVC. Bao et al. (2023) [
97] found that, when the thermal conditions for plateau vegetation growth were sufficiently met, soil moisture surpassed temperature and other thermal factors to become the most crucial climatic factor affecting plateau vegetation growth. This is consistent with our results, in which SSWRC had the highest importance score among the five climatic factors selected in this study. In summary, the six factors selected in this study had varying degrees of importance to FVC and played a key role in maintaining ecosystem health and ecological balance.
5. Summary and Conclusions
FVC is a crucial indicator for assessing surface vegetation conditions and a key factor affecting soil erosion, water loss, and ecosystem health. The changes in FVC are essential for monitoring regional ecological environments and are important for research on climate, hydrology, ecology, and global change. SWC is characterized by complex topography, diverse climate types, and rich vegetation types. Numerous rivers originate in or flow through the region. The changes in vegetation have a significant impact on the environment, ecosystems, and socio-economic conditions in the region and related downstream areas. This study calculated the average annual, seasonal, and growing-season FVC for the entire SWC and its five constituent provinces/municipalities from 2000 to 2020 using the pixel dichotomy model and the NDVI data from the MODIS–MOD13A3 product. The study also analyzed the spatiotemporal variation characteristics of FVC at various timescales. Four ML models (LightGBM, SVR, KNN, and RR) were built by integrating five climate factors from ERA5 data, DEM, and FY3D-MERSI NDVI data. Additionally, a WAHEM optimized from these single ML models was constructed to predict growing-season FVC in SWC. The performance of the different ML models was comprehensively evaluated using tenfold cross-validation and multiple performance metrics. The models were then used to predict FVC for the growing seasons of 2021–2023. The predicted results were evaluated and compared with their actual values and FY3D-MERSI-FVC through result assessment and spatiotemporal cross-validation analysis.
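The tenfold cross-validation mentioned above partitions the samples into ten folds and holds each fold out once for validation. The following sketch shows only the index bookkeeping. The function name, the shuffling seed, and the sample count are illustrative assumptions and do not describe the study's actual data split.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    if k < 2 or n_samples < k:
        raise ValueError("need k >= 2 and at least k samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for a reproducible split
    # Striding assigns every index to exactly one fold of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        validation = folds[held_out]
        train = [idx for j, fold in enumerate(folds) if j != held_out for idx in fold]
        yield train, validation


if __name__ == "__main__":
    for fold_number, (train, validation) in enumerate(k_fold_indices(25, k=10), start=1):
        print(fold_number, len(train), len(validation))
```

Each sample is used for validation exactly once, and the per-fold scores are typically averaged to obtain a single cross-validated score such as R².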
The results indicated that the FVC calculated using the pixel dichotomy model in this study had an R² value close to 1, with MAE and RMSE values near 0 when compared to the GLASS FVC product. This demonstrated a high degree of correlation and consistency between the two datasets and provided indirect validation of the reliability and effectiveness of the FVC data calculated in this study. The overall FVC in SWC predominantly increased from 2000 to 2020. Among the five provinces/municipalities, all except Tibet showed an increasing trend annually, seasonally, and during the growing season, with Chongqing (0.54/100a) exhibiting the most significant increase. Over the 21 years, the FVC spatial distribution in SWC was generally high in the east and low in the west, with extremely low FVC in the western plateau of Tibet and higher FVC in parts of eastern Sichuan, Chongqing, Guizhou, and Yunnan. The annual average FVC value (0.46) was close to that of spring (0.46) and autumn (0.44), whereas the value of the growing season (0.51) was slightly lower than that of summer (0.53). The interannual spatial variation indicated a significant degradation of FVC in central and western Tibet; however, southeastern Tibet showed no significant trend. Vegetation improved in the eastern part of SWC, including the Sichuan Basin, Chongqing, Guizhou, and eastern Yunnan, whereas the Hengduan Mountains area showed an overall decreasing trend. Areas with very significant, significant, and slightly significant increasing trends accounted for 25.3% of the total area of SWC, primarily in Chongqing, Guizhou, southeastern Sichuan, eastern Yunnan, and southeastern and northern Tibet. Areas with very significant, significant, and slightly significant decreasing trends accounted for 25.8%, mainly in western Tibet, northern Yunnan, and western Sichuan.
Overall, the interannual FVC spatial variation trend in Tibet was decreasing, with some increases in southeastern and northern Tibet. The average FVC values for the five categories remained relatively stable from 2000 to 2020. However, due to human activities, vegetation coverage in eastern Tibet and western Sichuan decreased from high to low between 2000 and 2007. In contrast, following the implementation of reforestation and grassland restoration policies, the Sichuan Basin and the Yunnan–Guizhou Plateau experienced an increase in vegetation coverage from low to high between 2007 and 2014. From 2015 to 2020, while the area ratio of the highest FVC category significantly decreased, the other four categories exhibited a modest increase in their area ratios.
The R2 (coefficient of determination) scores from tenfold cross-validation for the four ML models were 0.9 (LightGBM), 0.83 (SVR), 0.75 (KNN), and 0.74 (RR). The LightGBM score of 0.9 indicated a strong fit between predicted and true values and strong predictive ability. In the training set, WAHEM performed best on all evaluation metrics, with RMSE, MAE, EVS, and CC of 0.0048, 0.0535, 0.960, and 0.9856, respectively, followed by the LightGBM, SVR, and KNN models; the RR model performed the worst. In the validation and test sets, LightGBM had the best performance and highest prediction accuracy, followed by the SVR, WAHEM, and KNN models. The RR model showed the poorest performance and lowest prediction accuracy across all three datasets. The spatial distribution maps of FVC predictions for the growing seasons of 2021–2023 from the four individual ML models and the WAHEM, compared with MODIS-MOD13A3-FVC and FY3D-MERSI-FVC, showed that the spatial patterns of FVC predicted by all models were consistent: lower values in the west, higher values in the east, and decreasing values from south to north in the central region. In the central region, the transition area from eastern Tibet to western Sichuan, FVC values changed significantly and inversely with altitude. Of the two reference datasets, the model predictions were closer to MODIS-MOD13A3-FVC (true values), with consistent spatial distribution trends, although the predicted FVC values were relatively higher. The predicted spatial distribution was also consistent with FY3D-MERSI-FVC; however, the value differences were larger, with FY3D-MERSI-FVC generally having lower values, especially in the eastern region.
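To make the ensemble idea concrete, the following Python sketch combines per-pixel predictions from several base models by a weighted average. It assumes that the WAHEM is a weighted average of the single-model predictions, and it uses weights proportional to each model's cross-validated R2 score purely as an illustrative choice; the weight optimization actually used in this study may differ. The function names are hypothetical.

```python
def normalize_weights(scores):
    """Turn positive per-model scores into weights that sum to 1."""
    if not scores or any(s <= 0 for s in scores.values()):
        raise ValueError("all scores must be positive")
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}


def weighted_average_ensemble(predictions, weights):
    """Combine per-pixel predictions from several models.

    predictions: dict mapping model name -> list of predicted FVC values
    weights:     dict mapping model name -> weight (expected to sum to 1)
    """
    names = list(weights)
    n = len(predictions[names[0]])
    if any(len(predictions[m]) != n for m in names):
        raise ValueError("all models must predict the same number of pixels")
    return [
        sum(weights[m] * predictions[m][i] for m in names)
        for i in range(n)
    ]


# ASSUMED weighting: proportional to the tenfold cross-validation R2 scores.
cv_r2 = {"LightGBM": 0.9, "SVR": 0.83, "KNN": 0.75, "RR": 0.74}
weights = normalize_weights(cv_r2)
```

A weighted average always lies between the smallest and largest base-model prediction for a pixel, so it can outperform every base model on one dataset yet fall behind the strongest single model on another.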
Comparing the FVC values predicted by the different ML models with the actual values over the 3 years, the predictions of the SVR model were the closest to the true values in the annual average predictions for each region, whereas the RR model showed the largest differences. The smallest deviations between model predictions and actual values occurred in 2021, followed by 2023, with the largest deviations in 2022. When the predicted values were compared with FY3D-MERSI-FVC data, the largest FVC deviation was observed in Sichuan in 2021, in Yunnan in 2022, and in Chongqing in 2023, whereas Tibet showed the smallest deviations in all 3 years. The pixel-wise RMSE and MAPE evaluation of the predicted values against the actual values for 2021–2023 indicated that the LightGBM model performed the best, followed by the WAHEM, whereas the RR model performed the worst. In the LightGBM model, the highest feature importance score was assigned to DEM, indicating a strong relationship with FVC. Among the five selected climatic factors, SSWRC had the greatest impact on FVC.
The results of this study might aid in understanding the spatiotemporal characteristics of vegetation cover change in SWC under global warming, particularly in the high-altitude Tibetan region. Validating moderate-spatial-resolution FVC datasets is challenging because surface heterogeneity makes it difficult to compare ground-based point measurements directly with these datasets. This study validated the FVC calculated using the pixel dichotomy model against the widely used and highly accurate GLASS FVC product, but this indirect validation remains limited. Despite the availability of many ML methods, this study examined only four. Future research should incorporate additional FVC validation methods and explore a broader range of ML techniques, especially deep learning and hybrid models that combine the advantages of different approaches. As ML input variables, this study considered only DEM and some climate factors; future studies should incorporate additional features, including both natural and human activity factors, to improve prediction accuracy. When constructing heterogeneous ensemble models, future studies should also explore stacking and blending ensemble methods. Additionally, more refined FVC predictions should be pursued by improving the overall spatiotemporal resolution of the data. In conclusion, the growing-season FVC prediction in this study provides a theoretical basis and data foundation for ML-based prediction methods and offers a reference for other similar studies.