Remote Sensing and Machine Learning Uncover Dominant Drivers of Carbon Sink Dynamics in Subtropical Mountain Ecosystems

Xia, Leyan; Tan, Hongjian; Zhang, Jialong; Yang, Kun; Teng, Chengkai; Huang, Kai; Yang, Jingwen; Cheng, Tao

doi:10.3390/rs17162843


by Leyan Xia 1,2,3,4; Hongjian Tan 5; Jialong Zhang 1,2,3,4,*; Kun Yang 1,2,3,4; Chengkai Teng 1,2,3,4; Kai Huang 1,2,3,4; Jingwen Yang 1,2,3,4; and Tao Cheng 6

1 The Key Laboratory of Forest Resources Conservation and Utilization in the Southwest Mountains of China Ministry of Education, Southwest Forestry University, Kunming 650224, China

2 Key Laboratory of National Forestry and Grassland Administration on Biodiversity Conservation in Southwest China, Southwest Forestry University, Kunming 650224, China

3 Yunnan Province Key Laboratory for Conservation and Utilization of In-Forest Resource, Southwest Forestry University, Kunming 650224, China

4 College of Forestry (College of Asia-Pacific Forestry), Southwest Forestry University, Kunming 650224, China

5 College of Big Data and Intelligent Engineering, Southwest Forestry University, Kunming 650224, China

6 National Geomatics Center of China, Beijing 100830, China

* Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(16), 2843; https://doi.org/10.3390/rs17162843

Submission received: 28 June 2025 / Revised: 4 August 2025 / Accepted: 14 August 2025 / Published: 15 August 2025

(This article belongs to the Section Ecological Remote Sensing)


Abstract

Net ecosystem productivity (NEP) serves as a key indicator for assessing regional carbon sink potential, and its dynamics are regulated by nonlinear interactions among multiple factors. However, its driving factors and their coupling processes remain insufficiently characterized. This study investigated terrestrial ecosystems in Yunnan Province, China, to elucidate the drivers of NEP using 14 environmental factors (including topography, meteorology, soil texture, and human activities) and 21 remote sensing features. We developed a research framework based on “Feature Selection–Machine Learning–Mechanism Interpretation.” The results demonstrated that the Variable Selection Using Random Forests (VSURF) feature selection method effectively reduced model complexity. The selected features achieved high estimation accuracy across three machine learning models, with the eXtreme Gradient Boosting Regression (XGBR) model performing best (R² = 0.94, RMSE = 76.82 gC/(m²·a), MAE = 55.11 gC/(m²·a)). Interpretation using the SHAP (SHapley Additive exPlanations) method revealed the following: (1) the Enhanced Vegetation Index (EVI), soil pH, solar radiation, air temperature, clay content, precipitation, sand content, and vegetation type were the primary drivers of NEP in Yunnan, and the importance of EVI exceeded that of the other factors by a factor of approximately 3 to 10; (2) soil texture and temperature interacted significantly: under low-temperature conditions (−5 °C to 12.15 °C), moderate clay content (13–25%) combined with high sand content (40–55%) suppressed NEP, whereas within the medium-to-high temperature range (5 °C to 23.79 °C), high clay content (25–40%) coupled with low sand content (25–43%) enhanced NEP. These findings elucidate the complex driving mechanisms of NEP in subtropical ecosystems, confirming the dominant role of EVI in carbon sequestration and revealing nonlinear regulatory patterns in soil–temperature interactions.
This study provides not only a robust “Feature Selection–Machine Learning–Mechanism Interpretation” modeling framework for assessing carbon budgets in mountainous regions but also a scientific basis for formulating regional carbon management policies.

Keywords:

driving force; remote sensing applications; feature selection; machine learning; SHapley Additive exPlanations

1. Introduction

Since the 20th century, global atmospheric CO₂ concentrations have continued to rise, driving gradual climate warming that has profoundly affected ecosystem processes and biodiversity [1,2]. Following the adoption of the United Nations Framework Convention on Climate Change (UNFCCC) in 1992, research on ecosystem carbon source–sink dynamics has become a central focus of global change science [3,4]. Terrestrial ecosystems are a critical carbon sink, absorbing approximately 28% of anthropogenic CO₂ emissions annually [5], and enhancing terrestrial carbon sequestration is widely recognized as one of the most viable strategies for curbing rising atmospheric CO₂ levels and addressing climate change [6]. China’s terrestrial ecosystems exhibit marked regional specificity in the global carbon cycle: they contribute 10–31% of the global terrestrial carbon sink [7] and play a pivotal role in global carbon dynamics [8]. Consistent with this understanding, China formally pledged at the 2020 UN General Assembly to reach peak carbon emissions by 2030 and achieve carbon neutrality by 2060, a commitment that has heightened the demand for accurate assessments of ecosystem carbon dynamics [9].

Net ecosystem productivity (NEP) is defined as the residual after subtracting soil heterotrophic (microbial) respiration (Rh) from net primary productivity (NPP) and represents the net carbon accumulation of an ecosystem [10]. In the absence of anthropogenic or natural disturbances, NEP serves as a robust quantitative indicator of terrestrial carbon sequestration capacity [11]. It reflects the net carbon balance of ecosystems and plays a pivotal role in regional carbon budget assessments. The spatiotemporal variability of NEP is co-regulated by multiple factors, including ecosystem degradation, extreme climate events, and anthropogenic activities, and it differs markedly across climatic zones and ecosystem types. Consequently, precise estimation of NEP and comprehensive analysis of its driving mechanisms are essential for addressing climate change and formulating scientifically sound carbon neutrality strategies.

Due to its high biodiversity and substantial carbon sink capacity, Yunnan Province serves as a key representative region in the study of regional carbon dynamics and ecosystem assessments. However, its complex topography, characterized by fragmented terrain and diverse vegetation types, leads to pronounced spatiotemporal heterogeneity in ecosystem processes. Combined with the high costs of field investigations and sparse ground observation networks, these factors severely constrain the application of traditional ground-based NEP estimation methods. Given these limitations, remote sensing technology—particularly satellite-based remote sensing—has gradually emerged as a crucial alternative approach due to its capability to acquire large-scale and temporally continuous ecological information [12,13].

As a core indicator of ecosystem carbon budgets, NEP has spatiotemporal dynamics that are jointly regulated by multiple environmental factors [14], as extensive research has confirmed. Chen et al. [15] demonstrated that NEP across China exhibits significant spatial heterogeneity and an increasing temporal trend. This variation is driven primarily by the combined effects of the precipitation change rate, the solar radiation change rate, and elevation, with the interaction between precipitation change rate and elevation having the greatest impact on NEP variation (q-value of 0.29). Qi et al. [16] further indicated that climatic factors (precipitation and solar radiation) and land use/cover change jointly influence the spatial differentiation and interannual fluctuations of NEP in China’s terrestrial ecosystems. At the regional scale, He and Yuan [17] found that solar radiation, precipitation, surface bulk density, temperature, and surface clay content significantly affect NEP in Northwest China. Moreover, a global analysis by Liu et al. showed that anthropogenic activities contribute positively to global NEP by 36.33%, with an effect intensity approximately 6.4 times that of climate change, and that climate change amplifies this promoting effect of anthropogenic activities in 44% of regions. A thorough understanding of the independent contributions of these environmental covariates to NEP and of their complex interactions is fundamental for revealing regional carbon sink mechanisms and predicting future changes.

Although the efficacy of satellite remote sensing for NEP estimation is well established, mountainous ecosystems such as Yunnan’s pose distinct challenges because of their extreme heterogeneity, complex terrain, and microclimates. Liang et al. [18] mapped China’s terrestrial carbon fluxes by integrating MODIS data into an enhanced CASA model. Mao et al. [19] applied the Integrated Terrestrial Ecosystem Carbon-budget (InTEC) model in Zhejiang, revealing growth in subtropical forest NEP and the contribution of climate to it. Sun et al. [20] improved the accuracy of BIOME-BGC for subtropical forests through calibration, highlighting the need for regional model adaptation. Studies in Guangxi and Guizhou used the MOD17A3HGF NPP product and the NPP − Rh approach to calculate NEP, confirming significant anthropogenic alterations to regional carbon sinks [21,22]. Collectively, these studies show that subtropical mountain NEP dynamics are shaped by multiple drivers (topography, climate, land use, and human activity) and vary significantly among regions. Despite this progress, critical knowledge gaps persist concerning NEP dynamics and their driving mechanisms within Yunnan’s complex subtropical mountain ecosystems, and systematic studies based on fused environmental and remote sensing features remain scarce. Traditional process-based models also face persistent challenges in such environments: their reliance on dense site calibration conflicts with Yunnan’s data scarcity, and their structure struggles to capture the pervasive nonlinear interactions and threshold effects among drivers [23]. Even advanced implementations, such as the improved CASA model, which incorporates satellite-derived parameters and sophisticated algorithms [24], remain limited in regional feasibility and in representing interactions among drivers.

Each machine learning (ML) method has distinct advantages and limitations. For instance, artificial neural network (ANN) algorithms are conceptually straightforward but demand substantial training data and computational resources [25], while Support Vector Regression (SVR) can produce high-quality outputs but poses challenges such as kernel function selection and computationally intensive training [26]. Moreover, ecosystem processes are strongly nonlinear and driven by multiple environmental factors and their complex interactions, so model selection must pay particular attention to the ability to capture complex nonlinear relationships and interaction effects. Tree-based models, such as Random Forest Regression (RFR), eXtreme Gradient Boosting Regression (XGBR), and Categorical Boosting (CatBoost), are inherently adept at capturing these nonlinearities and interactions through recursive data partitioning, and they capture spatial non-stationarity and heterogeneity without requiring a priori assumptions about the functional form [27]. Existing research demonstrates the efficacy of these models in analyzing ecosystem carbon flux dynamics. For example, Liu et al. [28] showed that both RFR and XGBR achieve high accuracy in predicting Net Ecosystem Exchange (NEE), and Liu et al. [29] found that CatBoost slightly outperformed XGBR in predicting national Gross Ecosystem Product (GEP). Li et al. [30] applied XGBR to uncover climate–productivity interactions in the Amazon, and Zheng et al. [31] used an RFR model to reveal the spatiotemporal patterns of grassland NEP and its driving factors from 1982 to 2018, providing methodological support for improving site-scale NEP estimation.
Leveraging their theoretical advantage in capturing the complex nonlinearities of ecosystems, these models offer a viable approach for accurate and interpretable estimation of NEP in complex terrains [32].

Recent research indicates that ML models processing complex datasets typically rely on high-dimensional input variables, particularly when they incorporate diverse environmental covariates that may influence NEP. However, excessive or irrelevant features can introduce redundancy and noise that impair model performance, which highlights the importance of effective feature selection for improving model generalization and accuracy [33]; minimizing the impact of redundant features can improve model performance [34]. Feature selection methods, such as RFR-based importance ranking and Variable Selection Using Random Forests (VSURF), have been extensively validated across multiple domains for identifying key environmental drivers and have proven effective in boosting model performance [35,36,37]. Li et al. [38] found that variable selection methods can identify the features that contribute substantially to estimation results while removing redundant variables with limited predictive relevance, thereby enhancing both model performance and generalization capacity. Although ML approaches enable precise NEP estimation and are used to investigate how environmental variables influence NEP, their inherent “black box” nature introduces a degree of uncertainty [39]. This limitation reduces the interpretability of model outputs and may obscure the mechanisms driving critical ecological processes. To overcome this constraint, Lundberg and Lee [39] proposed the SHAP (SHapley Additive exPlanations) algorithm, which offers a novel approach to opening the “black box” of ML algorithms. As a Shapley value-based interpretation tool, SHAP quantifies the relative contribution of individual predictor variables to model outputs and elucidates complex interaction effects between drivers and NEP [40].
Although SHAP shows promise for enhancing model interpretability, the integration of machine learning predictions with SHAP-based interpretation to explore the combined effects of carbon sink drivers and their threshold responses remains insufficiently explored.

Given the aforementioned limitations and the need for improved representation of carbon dynamics complexity in mountainous regions, the main objectives of this study are to (1) establish an optimal NEP estimation model for Yunnan Province based on ML techniques, (2) investigate the impact of variable selection methods on the accuracy of NEP estimation in Yunnan Province, and (3) analyze carbon sink driving mechanisms using model interpretation approaches and quantify the contributions and interaction effects of various factors.

2. Materials and Methods

2.1. Study Area

Yunnan Province is located at the convergence of East Asia, Southeast Asia, and South Asia and covers approximately 3.94 × 10⁵ km² (4.1% of China’s land area) [41]. Its climate is dominated by subtropical monsoon conditions, with mean annual precipitation of approximately 1100 mm (concentrated in May–October) and mean annual temperatures of 15–20 °C. The topography is predominantly mountainous (88.6% of the area), and the province contains six major river systems: Nujiang, Pearl, Yuanjiang, Lancang, Yangtze, and Daying. Pronounced altitudinal gradients (with elevation decreasing from northwest to southeast) and precipitation gradients have produced distinct vertical climate zones [42]. Land use is dominated by forest, alongside agricultural, urban, and unused land [43]. This complex natural environment provides ideal conditions for investigating the spatiotemporal variation of NEP and its driving factors. The study area is illustrated in Figure 1.

2.2. Materials

2.2.1. Satellite Imagery Acquisition

This study used the MOD09A1 8-day composite surface reflectance product at 500 m spatial resolution, obtained from the National Aeronautics and Space Administration (NASA) (https://ladsweb.modaps.eosdis.nasa.gov/, accessed 11 April 2025). The dataset spans 20 complete years (2003–2022) and fully covers Yunnan Province through the MODIS standard grid tiles h27v06 and h26v06, where “h” and “v” denote the horizontal and vertical tile numbers, respectively. Derived from Level 2G MOD09GHK data, the MOD09A1 product selects, for each pixel in each 8-day period, the observation with the highest quality score, minimal view angle, and no aerosol contamination, and it has undergone rigorous radiometric and geometric corrections [44]. Four spectral bands were selected as primary data sources: red (620–670 nm), green (545–565 nm), blue (459–479 nm), and near-infrared (841–876 nm). Cloud-contaminated pixels were identified and removed by applying bitmasks to the quality assessment (QA) band embedded in the product. The 8-day composites were then aggregated temporally into annual time series and resampled to 1000 m by bilinear interpolation [45] to ensure the spatiotemporal consistency of the final dataset.
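The QA-based cloud screening and annual aggregation described above can be sketched with NumPy as follows. This is an illustrative sketch rather than the authors' processing code: it assumes that the MOD09A1 500 m state QA word stores the cloud state in bits 0–1 (00 = clear) and a cloud-shadow flag in bit 2, that reflectance values carry a 0.0001 scale factor, and that the annual value is the mean of the clear-sky 8-day composites. The text specifies neither the exact QA bits used nor the aggregation statistic, and the function names are hypothetical.

```python
import warnings

import numpy as np

CLOUD_STATE_MASK = 0b11      # bits 0-1 of the state QA word (assumed layout)
CLOUD_SHADOW_BIT = 1 << 2    # bit 2: cloud shadow flag (assumed layout)


def clear_sky_mask(state_qa: np.ndarray) -> np.ndarray:
    """True where the cloud state is 00 (clear) and no cloud shadow is flagged."""
    qa = np.asarray(state_qa).astype(np.uint16)
    clear = (qa & CLOUD_STATE_MASK) == 0
    no_shadow = (qa & CLOUD_SHADOW_BIT) == 0
    return clear & no_shadow


def annual_mean_reflectance(reflectance: np.ndarray, state_qa: np.ndarray,
                            scale: float = 1e-4) -> np.ndarray:
    """Mean of the clear-sky 8-day composites of one band over one year.

    reflectance, state_qa: arrays of shape (n_composites, rows, cols) with the raw
    integer band values and the matching QA words.
    """
    values = np.asarray(reflectance, dtype=np.float64) * scale  # apply scale factor
    values[~clear_sky_mask(state_qa)] = np.nan                   # drop cloudy/shadowed obs
    with warnings.catch_warnings():
        # Pixels that are never clear would trigger "Mean of empty slice"; they stay NaN.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmean(values, axis=0)
```

Pixels with no clear observation in a given year remain NaN and would need gap filling or exclusion before modeling.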

2.2.2. Environmental Variable (EV) Acquisition

To systematically investigate the interaction mechanisms between NEP and environmental variables, this study integrated multi-source datasets covering the key environmental parameters for NEP estimation; detailed specifications are provided in Table 1.

Considering data availability and spatiotemporal resolution, we selected five categories of environmental factors: topography, climate, soil, vegetation, and human activities (Table 1). Topographic data (SRTM Digital Elevation Model) and soil property data (0–30 cm surface layer) were obtained from the Google Earth Engine (GEE) cloud platform (https://earthengine.google.com/, accessed on 15 April 2025). Meteorological data (temperature, precipitation, and potential evapotranspiration) for 2003–2022 were sourced from the National Tibetan Plateau Data Center (TPDC, https://data.tpdc.ac.cn/, accessed on 26 April 2025). Surface solar radiation and nighttime light data for 2003–2022, as well as population density data, were acquired from the Geographic Data Sharing Infrastructure, global resources data cloud (www.gis5g.com, accessed on 26 April 2025). Because the population density dataset covers only 2000–2020, the 2020 population density data were used to fill the gaps for 2021 and 2022. Annual land cover data for 2003–2022 were derived from the NASA MODIS land cover product (MCD12Q1, https://lpdaac.usgs.gov/products/mcd12q1v006/, accessed on 20 April 2025), which adopts the International Geosphere-Biosphere Programme (IGBP) 17-class classification scheme covering major land cover types such as forests, croplands, and urban areas (Table 2). The 16 land cover (vegetation) types other than water bodies were one-hot encoded, converting each category into a binary feature column (0 or 1) that indicates its presence or absence [46]. These 16 encoded features were then used as input variables to the machine learning models alongside the continuous environmental factors.
Due to model parameter constraints, water bodies were excluded from the land cover classification in the model calculations. To ensure spatial consistency, all input variables were resampled to a uniform 1000 m spatial resolution using bilinear interpolation [45].
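The one-hot encoding step can be illustrated with a short NumPy sketch. It assumes that the land cover raster holds IGBP codes 1–17 (as in the IGBP layer of MCD12Q1), with 17 denoting water bodies. Water pixels are dropped, mirroring the exclusion described above, and each remaining pixel becomes a row of 16 binary indicators. The function name is hypothetical.

```python
import numpy as np

WATER_CLASS = 17                 # IGBP class 17: water bodies (excluded from the model)
LAND_CLASSES = np.arange(1, 17)  # IGBP classes 1-16, one binary column each


def one_hot_land_cover(lc):
    """One-hot encode IGBP land cover codes, dropping water pixels.

    Returns (encoded, keep): encoded has shape (n_kept, 16) and dtype uint8, and
    keep is a boolean mask over the flattened input marking the retained pixels.
    Codes outside 1-17 (e.g. fill values) yield all-zero rows.
    """
    lc = np.asarray(lc).ravel()
    keep = lc != WATER_CLASS
    # Broadcast (n_kept, 1) against (1, 16): True exactly in the column of each pixel's class.
    encoded = (lc[keep, None] == LAND_CLASSES[None, :]).astype(np.uint8)
    return encoded, keep
```

For tabular data, `pandas.get_dummies` gives an equivalent result; comparing against a fixed class list, as here, guarantees that all 16 columns exist even when a class is absent in a particular year.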

2.2.3. NEP Data Acquisition

To systematically investigate the driving mechanisms of carbon sinks in Yunnan Province, this study utilized long-term (2003–2022) NPP data derived from NASA’s MOD17A3HGF dataset. This dataset provides annual cumulative NPP products at a 500 m spatial resolution, which were resampled to 1000 m using bilinear interpolation to match the spatial scale of other environmental factors.
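Bilinear resampling from 500 m to 1000 m can be sketched with `scipy.ndimage.map_coordinates`, which performs linear (order = 1) interpolation at arbitrary coordinates. The sketch assumes an idealized grid in which each 1000 m cell is exactly aligned with a 2 × 2 block of 500 m cells, and it ignores map projection details (MODIS “500 m” pixels are in fact about 463 m on the sinusoidal grid). Operational workflows would typically rely on GDAL or rasterio instead. The function name is hypothetical.

```python
import numpy as np
from scipy.ndimage import map_coordinates


def bilinear_downsample_2x(grid: np.ndarray) -> np.ndarray:
    """Resample a fine grid to half its resolution by bilinear interpolation.

    Assumes coarse cell (r, c) covers fine rows 2r..2r+1 and columns 2c..2c+1.
    """
    rows, cols = grid.shape[0] // 2, grid.shape[1] // 2
    # Coarse cell centres in fine-pixel index coordinates (fine pixel i is centred at i).
    r = 2 * np.arange(rows) + 0.5
    c = 2 * np.arange(cols) + 0.5
    rr, cc = np.meshgrid(r, c, indexing="ij")
    # order=1 selects linear interpolation along each axis, i.e. bilinear in 2-D.
    return map_coordinates(grid.astype(np.float64), [rr, cc], order=1, mode="nearest")
```

Under this alignment, each coarse-cell center falls on the shared corner of four fine cells, so bilinear interpolation reduces to the mean of that 2 × 2 block. NaN values propagate through the interpolation and should be masked beforehand.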

For Rh estimation, we adopted the temperature–precipitation nonlinear model developed by Pei et al. [47]. The applicability of this model has been validated at the national scale [48], in southwestern China [49], and in Guangxi [50], whose ecological characteristics resemble those of Yunnan. Because these validation areas partially overlap with, or are ecologically similar to, the current study area, the model is expected to be applicable within our research domain. NEP was calculated as follows:

\mathrm{NEP} = \mathrm{NPP} - R_{h}

(1)

R_{h}(x, t) = 0.22 \times \left[ \exp\{0.0913\, T(x, t)\} + \ln\{0.3145\, P(x, t) + 1\} \right] \times 30 \times 46.5\%

(2)

where R_h denotes soil heterotrophic respiration, and T(x, t) and P(x, t) represent the mean temperature (°C) and total precipitation (mm), respectively, for pixel x during month t. In Equation (1), NEP > 0 indicates that the ecosystem functions as a carbon sink, with carbon fixation exceeding carbon release, whereas NEP < 0 indicates a carbon source.
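Equations (1) and (2) translate directly into array code. The sketch below is a minimal illustration assuming that Eq. (2) yields R_h in gC/m² per month and that annual R_h is the sum of the twelve monthly values before it is subtracted from annual NPP; the text does not state the temporal aggregation explicitly, and the function names are hypothetical.

```python
import numpy as np


def monthly_rh(temp_c, precip_mm):
    """Monthly soil heterotrophic respiration following Eq. (2).

    temp_c: monthly mean air temperature (deg C); precip_mm: monthly total precipitation (mm).
    """
    temp_c = np.asarray(temp_c, dtype=np.float64)
    precip_mm = np.asarray(precip_mm, dtype=np.float64)
    return 0.22 * (np.exp(0.0913 * temp_c) + np.log(0.3145 * precip_mm + 1.0)) * 30 * 0.465


def annual_nep(npp_annual, temp_monthly, precip_monthly):
    """Annual NEP = NPP - Rh (Eq. 1), with Rh summed over the month axis (assumed).

    temp_monthly, precip_monthly: arrays whose first axis holds the 12 months.
    """
    rh_annual = monthly_rh(temp_monthly, precip_monthly).sum(axis=0)
    return np.asarray(npp_annual, dtype=np.float64) - rh_annual
```

A pixel is then classed as a carbon sink where the result is positive and as a carbon source where it is negative, as in Eq. (1).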

2.2.4. Remote Sensing Variable (RSV) Acquisition

This study extracted four spectral bands (red, green, blue, and near-infrared) and calculated 17 vegetation indices (Table 3) based on surface reflectance data from the MOD09A1 product.
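Table 3 is not reproduced here, so the sketch below computes only two widely used indices, NDVI and EVI, with their standard formulations; the remaining indices would follow the same pattern. It assumes the MOD09A1 band order in which band 1 is red (620–670 nm), band 2 is near-infrared (841–876 nm), and band 3 is blue (459–479 nm), matching the wavelength ranges listed in Section 2.2.1, together with a reflectance scale factor of 0.0001. The function names are hypothetical.

```python
import numpy as np

SCALE = 1e-4  # MOD09A1 surface reflectance scale factor (assumed)


def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue, G=2.5, C1=6.0, C2=7.5, L=1.0):
    """Enhanced Vegetation Index with the standard MODIS coefficients."""
    return G * (nir - red) / (nir + C1 * red - C2 * blue + L)


def indices_from_mod09a1(b1, b2, b3):
    """Compute NDVI and EVI from raw MOD09A1 bands (b1 = red, b2 = NIR, b3 = blue)."""
    red, nir, blue = (np.asarray(b, dtype=np.float64) * SCALE for b in (b1, b2, b3))
    return {"NDVI": ndvi(nir, red), "EVI": evi(nir, red, blue)}
```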

2.3. Methods

To construct a high-accuracy and interpretable regional-scale NEP estimation model, spectral bands and vegetation indices were first derived from MODIS products, while environmental variables were obtained from relevant geospatial datasets. Feature importance screening was conducted using RFR and VSURF. Three machine learning regression models—RFR, XGBR, and CatBoost—were then evaluated to identify the optimal NEP model. Finally, SHAP was applied to the optimal model to interpret the nonlinear effects and interactions among key influencing factors. The complete workflow is illustrated in Figure 2.

2.3.1. Variable Selection Based on Feature Importance

To identify the key driving factors influencing NEP, this study employed a feature selection approach combining the RFR and VSURF methods. The RFR-based importance screening method quantifies the contribution of each feature to the target variable by calculating its importance score within the model framework [64]. In contrast to conventional RFR approaches that require manual variable selection based on importance ranking, the VSURF algorithm implements a three-stage screening mechanism (noise removal–refinement–optimization) to automatically identify the optimal feature combination [65], thereby effectively reducing prediction errors and enhancing estimation accuracy.

Combining VSURF with RFR-based importance screening enables the identification of the variables most critical for NEP estimation from a large pool of input features, simultaneously reducing model complexity and improving predictive performance. The optimal feature subset output by this procedure serves as a uniform input to all subsequent machine learning models. The study evaluated the feature selection capabilities of RFR and VSURF on three datasets (21 RSV, 14 EV, and their combined set) to optimize the NEP estimation model for Yunnan Province.
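The importance-based screening can be sketched with scikit-learn's `RandomForestRegressor`, whose `feature_importances_` attribute gives impurity-based importances that sum to 1. The sketch ranks features by importance and keeps the smallest top-ranked set whose cumulative importance reaches 0.95, the threshold reported in Section 3.2.1; reading that threshold as "reaches" rather than "strictly exceeds" is an assumption, and the function name is hypothetical. VSURF itself is distributed as an R package, and its three-step procedure is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def select_by_cumulative_importance(X, y, names, threshold=0.95, random_state=42):
    """Return (selected feature names, importances) based on RF impurity importance."""
    rf = RandomForestRegressor(n_estimators=200, random_state=random_state)
    rf.fit(X, y)
    imp = rf.feature_importances_           # non-negative, sums to 1
    order = np.argsort(imp)[::-1]           # most important first
    cumulative = np.cumsum(imp[order])
    # First position at which the running total reaches the threshold (inclusive).
    n_keep = min(int(np.searchsorted(cumulative, threshold)) + 1, len(order))
    return [names[i] for i in order[:n_keep]], imp
```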

2.3.2. Random Forest Regression

Random Forest Regression (RFR) operates by generating and synthesizing predictions from multiple decision trees, achieving superior accuracy and robustness compared to single-tree approaches [64]. This approach demonstrates strong capability in handling high-dimensional data while maintaining considerable tolerance to noise and missing values [66]. In the present study, the RFR model was configured with 200 trees and a maximum depth of 64, with the random seed fixed at 42 to ensure reproducibility of the results.

2.3.3. Extreme Gradient Boosting Regression

Extreme Gradient Boosting Regression (XGBR) is a gradient boosting-based ensemble algorithm that minimizes the loss function through iterative optimization of decision trees [67]. The algorithm incorporates regularization techniques and second-order gradient approximation, effectively preventing overfitting while maintaining high predictive accuracy. In this study, the model parameters were configured with 1000 iterations, a learning rate of 0.1, and a maximum tree depth of 10.

2.3.4. Categorical Boosting Regression

Categorical Boosting (CatBoost) regression implements gradient boosting with built-in support for categorical features. Through additive modeling, in which each successive decision tree corrects the errors of the preceding ensemble, the algorithm constructs high-performance ensembles capable of capturing complex nonlinear relationships [68], making it well suited to complex regression problems. CatBoost can process categorical features directly without preprocessing, and its handling of them helps prevent overfitting and target leakage. CatBoost also applies regularization measures, such as shrinkage through the learning rate and L2 regularization of leaf values, to enhance generalization. In this study, the parameters were set as follows: number of iterations = 1000, learning rate = 0.1, and maximum tree depth = 10.
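For reproducibility, the hyperparameters reported in Sections 2.3.2–2.3.4 can be collected in one place. The sketch below maps them onto the scikit-learn, xgboost, and catboost Python APIs, assuming that "iterations" corresponds to the number of boosting rounds (`n_estimators` in xgboost, `iterations` in CatBoost) and that all other settings remain at library defaults; the text reports no random seeds for XGBR or CatBoost. The two boosting libraries are imported inside the hypothetical `build_models` function so that the sketch still runs where only scikit-learn is installed.

```python
from sklearn.ensemble import RandomForestRegressor

# Hyperparameters as reported in the text; everything else is left at library defaults.
RFR_PARAMS = {"n_estimators": 200, "max_depth": 64, "random_state": 42}
XGBR_PARAMS = {"n_estimators": 1000, "learning_rate": 0.1, "max_depth": 10}
CATBOOST_PARAMS = {"iterations": 1000, "learning_rate": 0.1, "depth": 10, "verbose": False}


def build_models():
    """Instantiate the three regressors; skip a boosting model if its library is missing."""
    models = {"RFR": RandomForestRegressor(**RFR_PARAMS)}
    try:
        from xgboost import XGBRegressor
        models["XGBR"] = XGBRegressor(**XGBR_PARAMS)
    except ImportError:
        pass  # xgboost not installed
    try:
        from catboost import CatBoostRegressor
        models["CatBoost"] = CatBoostRegressor(**CATBOOST_PARAMS)
    except ImportError:
        pass  # catboost not installed
    return models
```

Each returned model exposes the usual `fit(X, y)` and `predict(X)` methods, so the same VSURF-selected feature matrix can be passed to all three in a loop.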

2.4. SHapley Additive exPlanations (SHAP)

The SHAP algorithm, introduced by Lundberg and Lee [69], is a unified approach to interpreting ML model outputs that computes feature-specific attribution scores for individual predictions [70]. It expresses each model output as the sum of a base value (the expected model output) and the Shapley values, that is, the real-valued attributions assigned to each input feature, thereby revealing how input features influence model predictions. Such attributions are particularly valuable for understanding models in which the relationship between inputs and outputs is nonlinear or intricate. This study employs SHAP’s TreeExplainer, which exploits the internal structure of tree-based models to compute exact Shapley values efficiently. The core of SHAP analysis is the computation of the Shapley value for each feature:

\varphi_{i} = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!} \left[ f_{x}(S \cup \{i\}) - f_{x}(S) \right]

(3)

where φ_i is the Shapley value of the i-th feature, N denotes the set of all features, and S is a subset of N that excludes feature i. The terms f_x(S ∪ {i}) and f_x(S) correspond to the model’s predictions with and without feature i, respectively. The Shapley value thus quantifies the marginal contribution of feature i, averaged with the weights above over all possible subsets of the remaining features. A positive SHAP value indicates that the corresponding feature raises the estimated NEP above the base value, whereas a negative SHAP value indicates that it lowers the estimate.
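Equation (3) can be checked on a toy example by brute-force enumeration of feature subsets. The sketch is illustrative only: the toy model f(x) = 2·x0 + x1·x2 and the rule that features outside S are set to a baseline of 0 are assumptions made for the example (TreeExplainer instead defines f_x(S) through conditional expectations over the tree structure), and the function names are hypothetical.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value, n_features):
    """Exact Shapley values via Eq. (3): enumerate every subset S of the other features.

    value: callable mapping a frozenset of feature indices S to the payoff f_x(S).
    """
    phi = []
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        total = 0.0
        for size in range(len(others) + 1):
            # Shapley weight |S|! (|N| - |S| - 1)! / |N|!
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for subset in combinations(others, size):
                S = frozenset(subset)
                total += weight * (value(S | {i}) - value(S))
        phi.append(total)
    return phi


# Toy model f(x) = 2*x0 + x1*x2 explained at x = (1, 2, 3); absent features are set to 0.
X_POINT = (1.0, 2.0, 3.0)


def toy_value(S):
    z = [X_POINT[j] if j in S else 0.0 for j in range(3)]
    return 2.0 * z[0] + z[1] * z[2]


phi = exact_shapley(toy_value, 3)  # approximately [2.0, 3.0, 3.0]
```

The additive term is attributed entirely to x0, the interaction x1·x2 = 6 is split equally between x1 and x2, and the values sum to f(x) − f(baseline) = 8. This efficiency property is what allows SHAP to decompose each NEP prediction additively. The number of subsets grows exponentially with the number of features, which is why tree-specific algorithms such as `shap.TreeExplainer` are used for real models.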

2.5. Accuracy Analysis

To enhance the validity and robustness of the model estimates, this study collected a total of 182,040 sample points from the study area spanning 2003 to 2022. Stratified random sampling was used to split the dataset into training and testing sets at a 7:3 ratio. To ensure sufficient training samples and reliable validation [71], 10-fold cross-validation (k = 10) was applied to evaluate the model’s generalization performance [72]. All modeling procedures were implemented in Python 3.9, and model performance was assessed using three metrics: the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE), defined as follows:

R^{2} = 1 - \frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}}

(4)

\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}{n - 1}}

(5)

\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_{i} - y_{i} \right|

(6)

where y_i is the measured value, ŷ_i is the predicted value, ȳ is the sample mean, and n is the sample size.
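Equations (4)–(6) and the sampling design can be sketched as follows. Equation (5) divides by n − 1, whereas the more common RMSE definition divides by n; the sketch follows the equation as written. The stratification variable for the 7:3 split is not specified, so stratifying on NEP quantile bins is an assumption, as are running the 10-fold cross-validation on the training portion and fixing the seed at 42. The function names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split


def r2(y, y_hat):
    """Coefficient of determination, Eq. (4)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


def rmse(y, y_hat):
    """Root mean square error as written in Eq. (5), i.e. with an n - 1 denominator."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.sqrt(np.sum((y - y_hat) ** 2) / (len(y) - 1))


def mae(y, y_hat):
    """Mean absolute error, Eq. (6)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.mean(np.abs(y_hat - y))


def split_and_folds(X, y, n_bins=10, seed=42):
    """7:3 split stratified on NEP quantile bins, then 10-fold CV indices on the training set."""
    edges = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])  # interior bin edges
    strata = np.digitize(y, edges)                               # bin label per sample
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=strata, random_state=seed)
    folds = list(KFold(n_splits=10, shuffle=True, random_state=seed).split(X_tr))
    return X_tr, X_te, y_tr, y_te, folds
```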

3. Results

3.1. Machine Learning-Based NEP Estimation

This study evaluated the performance of three ML models (RFR, CatBoost, and XGBR) using three types of feature input: RSV, EV, and their combination (RSV-EV) (Table 4). Figure 3 compares the accuracy of the different input variable combinations across the ML models. Compared with single-source variables, the RSV-EV combination yielded better fitting performance in all three models. Among the ML models, XGBR achieved the best evaluation metrics (R² = 0.89, RMSE = 103.97 gC/(m²·a), MAE = 77.28 gC/(m²·a)), indicating its robustness and reliability in this study and identifying it as the optimal model.

3.2. NEP Estimation Model Based on Feature Selection

3.2.1. Accuracy Assessment of Variable-Selected Models Based on Importance Analysis

To mitigate noise caused by excessive variables, we applied RFR-based importance analysis for variable selection to the three variable sets (RSV, EV, and RSV-EV) (Figure 4). EVI from the RSV set exhibited the highest importance score (0.63). The top-ranked features whose cumulative importance reached 0.95 were retained as inputs for subsequent model development. Using these selected features measurably improved the predictive performance of some machine learning models, as shown by the evaluation metrics in Table 5. Figure 5 shows the relationships between predicted and measured values for the RSV-EV variable set across the different models. With the RSV-EV variable combination, the RFR model performed best, with R² increasing from 0.87 to 0.89, RMSE decreasing to 103.22 gC/(m²·a), and MAE decreasing to 73.62 gC/(m²·a).

3.2.2. Accuracy Evaluation of VSURF-Based Feature-Selected Models

To assess the VSURF-based variable selection approach, we applied VSURF to the three variable groups RSV, EV, and RSV-EV (Table 6) and built three ML-based NEP estimation models using the VSURF-filtered variables. As the evaluation metrics in Table 7 show, VSURF selection applied to the single-source sets (RSV and EV) did not significantly improve model performance, whereas selection from the combined RSV-EV set markedly enhanced the predictive performance of all three machine learning models. The XGBR model trained on the VSURF-selected RSV-EV features achieved the best NEP estimation performance, with an R² of 0.94, an RMSE of 76.82 gC/(m²·a), and an MAE of 55.11 gC/(m²·a), all substantially better than the baseline model (Figure 6).

3.3. SHAP Interpretation of XGBR Model

Figure 7 provides a SHAP-based interpretation of the relative importance and directional influence of each variable on NEP estimation. In Figure 7a, the mean absolute SHAP value quantifies each variable’s overall contribution to the model output; longer bars indicate greater importance. EVI has the highest mean absolute SHAP value, 3 to 10 times that of the other variables, underscoring its critical role in driving NEP variability. Figure 7b shows the direction of each feature’s effect, with dot color representing the feature value (red for high, blue for low). A positive relationship is indicated when high values (red) correspond to positive SHAP values and low values (blue) correspond to negative SHAP values. Among the variables that together account for 95% of the cumulative importance, EVI, SolRad, Temp, SC, Precip, and SS exert a predominantly positive influence on NEP, whereas soil pH shows a negative association.
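What the bar heights in Figure 7a represent can be illustrated on a toy model. The sketch below computes exact interventional Shapley values by enumerating every coalition S of the other features, φ_i = Σ_S |S|!(p − |S| − 1)!/p! [v(S ∪ {i}) − v(S)], and then averages |φ| over samples. It is an illustrative brute-force version, not the TreeSHAP algorithm that the shap library applies to XGBoost models; its cost grows as 2^p, and the function names are hypothetical.

```python
import itertools
import math

import numpy as np


def exact_shap(f, x, background):
    """Exact interventional Shapley values of model f at one sample x.

    v(S) sets the features in coalition S to x, keeps the background values
    for the rest, and averages f over the background rows.
    """
    p = len(x)

    def value(S):
        Z = np.array(background, dtype=float)  # copy so x can be written in
        if S:
            Z[:, list(S)] = x[list(S)]
        return float(np.mean(f(Z)))

    phi = np.zeros(p)
    for i in range(p):
        others = [j for j in range(p) if j != i]
        for size in range(p):  # coalition sizes 0 .. p-1
            w = math.factorial(size) * math.factorial(p - size - 1) / math.factorial(p)
            for S in itertools.combinations(others, size):
                phi[i] += w * (value(S + (i,)) - value(S))
    return phi


def mean_abs_shap(f, X, background):
    """Global importance as shown in a SHAP bar plot: mean |phi| over samples."""
    return np.mean([np.abs(exact_shap(f, x, background)) for x in X], axis=0)
```

For a linear model f(x) = w·x the result reduces to φ_i = w_i(x_i − E[b_i]), and the values always sum to f(x) minus the mean background prediction (the SHAP additivity property).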

3.3.1. SHAP-Based Single-Factor Impact Analysis

SHAP dependence plots were used to quantify the effect of each individual variable on NEP (Figure 8). EVI showed a pronounced positive relationship with NEP (Figure 8a): higher EVI values are associated with higher net ecosystem productivity. This effect has a distinct threshold: below an EVI of 0.3, its promoting effect on NEP is relatively weak; above 0.3, the positive effect becomes markedly stronger; and above 0.4, the enhancement gradually approaches saturation. Vegetation types differ markedly in their effects on NEP (Figure 8b). Forest-dominated types have predominantly positive SHAP values, indicating a substantial positive effect on NEP through efficient photosynthesis and complex ecosystem carbon cycling. In contrast, land cover types such as croplands, urban areas, and snow/ice-covered zones have SHAP values near zero or negative, reflecting a limited contribution to NEP because of low vegetation coverage and restricted ecological activity, or even an inhibitory effect resulting from inadequate carbon sequestration capacity.
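A threshold such as the EVI value of about 0.3 can be read off a dependence plot by binning the feature and finding where the mean SHAP value first becomes positive. The sketch below is one possible procedure, assuming equal-width bins; it is not necessarily how the thresholds in Figure 8 were determined, and `shap_sign_threshold` is a hypothetical helper.

```python
import numpy as np


def shap_sign_threshold(values, shap_vals, n_bins=20):
    """Lower edge of the first bin whose mean SHAP value is positive.

    Bins the feature into equal-width intervals and averages the SHAP values
    in each bin; returns None if no bin has a positive mean.
    """
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # Map each sample to a bin index; the maximum value goes to the last bin.
    idx = np.clip(np.digitize(values, edges) - 1, 0, n_bins - 1)
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any() and shap_vals[in_bin].mean() > 0:
            return float(edges[b])
    return None
```

Averaging within bins, rather than reading individual points, damps the vertical scatter that interactions add to a dependence plot; the bin count trades threshold resolution against the stability of each bin’s mean.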

The influence of soil physicochemical properties on NEP is markedly nonlinear and shows threshold effects. For pH, the results reveal a distinct acidity–alkalinity threshold: within the acidic range of 3.69–5.65, SHAP values are predominantly positive (Figure 8c). As pH increases into the moderately acidic to weakly alkaline range of 5.65–8.04, SHAP values alternate between positive and negative, and negative values become more frequent and larger in magnitude, suggesting that rising soil pH toward neutral and alkaline conditions can impair ecosystem carbon sequestration capacity. For soil texture, the particle size fractions influence NEP in markedly different ways. SC and SS content both show typical “dose–response” relationships (Figure 8d,e). For SC, in the lower content range (13–28.18%), SHAP values are scattered and mostly negative, suggesting an inhibitory effect on carbon cycling. In the medium–high range (28.18–44%), SHAP values are concentrated in the positive region with larger magnitudes. SS shows the opposite pattern: within 28.07–44.23%, its influence on NEP is primarily positive, but above 44.23%, SHAP values shift toward negative values with increasing absolute magnitudes.

The influence of climatic variables on NEP is highly complex. Three key climatic factors (Temp, SolRad, and Precip) show distinct critical thresholds in their effects on NEP (Figure 8f–h). SHAP values are predominantly negative when temperature falls in the low range of −6.52 to 6.56 °C or exceeds 23.79 °C, when solar radiation lies between 2196.02 and 3784.35 MJ/(m²·a), or when annual precipitation falls outside the optimal range of 1120.09–1834.96 mm. SHAP values become positive when Temp is in the moderate range of 6.56–23.79 °C, when SolRad exceeds 3784.35 MJ/(m²·a), or when Precip remains within the 1120.09–1834.96 mm range. In addition, SHAP values are widely dispersed at similar values of each climatic factor, which indicates interaction effects among climatic factors. This reflects the sensitivity of ecosystem carbon cycling to coupled multi-factor interactions and underscores the complexity of climate–vegetation–soil system dynamics.
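Optimal ranges, such as the 1120.09–1834.96 mm precipitation window, correspond to contiguous stretches of a binned dependence curve with positive mean SHAP. The sketch below extends the binning idea to return every such interval; the equal-width binning and the helper name `positive_shap_ranges` are assumptions for illustration.

```python
import numpy as np


def positive_shap_ranges(values, shap_vals, n_bins=20):
    """Return (low, high) feature intervals where the binned mean SHAP is positive.

    Adjacent positive bins are merged, so a single interior interval marks an
    'optimal range' bounded by negative responses on both sides.
    """
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    idx = np.clip(np.digitize(values, edges) - 1, 0, n_bins - 1)
    positive = [bool((idx == b).any() and shap_vals[idx == b].mean() > 0)
                for b in range(n_bins)]
    ranges, start = [], None
    for b, pos in enumerate(positive + [False]):  # sentinel closes a trailing run
        if pos and start is None:
            start = b
        elif not pos and start is not None:
            ranges.append((float(edges[start]), float(edges[b])))
            start = None
    return ranges
```

Because interval edges are quantized to the bin width, the reported bounds are only as precise as the binning; a one-sided response such as SolRad would appear as a single interval ending at the maximum observed value.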

3.3.2. SHAP-Based Analysis of Feature Interactions

In addition to single-factor effects, this study examined the interactive effects of paired factors on NEP. Figure 9 shows how interactions between EVI and other variables affect the model estimates; the color gradient encodes the value of the secondary variable, showing how the EVI–NEP relationship shifts across that variable’s range. The results indicate that interactions with other factors substantially modulated EVI’s contribution to NEP.
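Pairwise effects like those in Figure 9 are commonly quantified with SHAP interaction values. The sketch below computes the exact pairwise Shapley interaction index for a small model by enumeration, Φ_ij = Σ_S |S|!(p − |S| − 2)!/(2(p − 1)!) ∇_ij(S) over coalitions S that exclude i and j, with ∇_ij(S) = v(S ∪ {i, j}) − v(S ∪ {i}) − v(S ∪ {j}) + v(S) and the same interventional value function v as ordinary Shapley values. It is an illustrative brute-force version, not the fast tree-based algorithm in the shap library, and `shap_interaction` is a hypothetical name.

```python
import itertools
import math

import numpy as np


def shap_interaction(f, x, background, i, j):
    """Exact pairwise Shapley interaction value Phi_ij (i != j) at sample x.

    The joint effect of i and j is split equally between Phi_ij and Phi_ji,
    following the SHAP interaction-value convention.
    """
    p = len(x)

    def value(S):
        Z = np.array(background, dtype=float)
        if S:
            Z[:, list(S)] = x[list(S)]
        return float(np.mean(f(Z)))

    others = [k for k in range(p) if k not in (i, j)]
    total = 0.0
    for size in range(len(others) + 1):
        w = math.factorial(size) * math.factorial(p - size - 2) / (2 * math.factorial(p - 1))
        for S in itertools.combinations(others, size):
            delta = value(S + (i, j)) - value(S + (i,)) - value(S + (j,)) + value(S)
            total += w * delta
    return total
```

For f(x) = x0·x1 + x2 with an all-zero background and x = (2, 3, 1), Φ_01 = 3, half of the product x0·x1 = 6, while Φ_02 = 0 because x2 enters the model additively.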

The nonlinear interactive responses among factors showed distinct threshold effects (Figure 10a–g). Temp and Precip showed a strong interaction effect (Figure 10a), and Precip also interacted with soil factors in its influence on NEP (Figure 10c,d). Notably, Temp and soil texture formed two characteristic coupled-response intervals. Under low-Temp conditions (−5 to 12.15 °C), moderate SC content (13–25%) combined with high SS content (40–55%) was associated with markedly reduced carbon sequestration capacity. Conversely, in the moderate to high Temp range (5–23.79 °C), higher SC content (25–40%) combined with lower SS content (25–43%) produced enhanced positive interaction effects (Figure 10e,f). This temperature-dependent soil texture effect is consistent with the observed negative correlation between SC and SS content (Figure 10b), revealing the complex interplay between soil composition and temperature in regulating carbon cycling. Forest and shrub ecosystems showed strongly positive responses to temperature, suggesting that vegetation type (VegType) plays a key role in modulating climate sensitivity, whereas ecosystems with low vegetation cover showed markedly different response patterns (Figure 10g).
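The two coupled-response intervals can be summarized numerically by averaging SHAP values (or SHAP interaction values) over the samples that fall inside each temperature × texture window. The helper below is a hypothetical illustration of that bookkeeping; it assumes per-sample arrays of Temp, SC, SS, and one SHAP value per sample, and is not the procedure used to draw Figure 10.

```python
import numpy as np


def regime_mean_shap(temp, clay, sand, shap_vals, t_rng, c_rng, s_rng):
    """Mean SHAP value and sample count inside one temperature x texture window.

    Each *_rng is an inclusive (low, high) pair; returns (nan, 0) when the
    window contains no samples.
    """
    mask = ((temp >= t_rng[0]) & (temp <= t_rng[1])
            & (clay >= c_rng[0]) & (clay <= c_rng[1])
            & (sand >= s_rng[0]) & (sand <= s_rng[1]))
    n = int(mask.sum())
    return (float(shap_vals[mask].mean()) if n else float("nan")), n
```

With the bounds from the text, `(-5, 12.15), (13, 25), (40, 55)` would summarize the low-temperature window and `(5, 23.79), (25, 40), (25, 43)` the warmer one. Because the two temperature ranges overlap between 5 and 12.15 °C, a sample can fall in both windows, so the two summaries are not independent.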

4. Discussion

4.1. Performance of ML-Based Algorithms

This study evaluated three machine learning algorithms (RFR, XGBR, and CatBoost) for estimating NEP from multiple predictive factors. All three models achieved high predictive accuracy (R² > 0.87). XGBR achieved the highest accuracy, consistent with Xing et al. [73], who found that XGBR performed best in modeling gross primary productivity and evapotranspiration in a subtropical evergreen forest ecosystem in southern China. As an enhanced variant of gradient boosting, XGBR incorporates regularization terms and second-order gradient optimization, which improve its control of model complexity and reduce overfitting [74]. However, Schratz et al. [75] observed that, under specific conditions, random forest can outperform XGBR. In this study, RFR and CatBoost had slightly lower estimation accuracy than XGBR, which may be attributable to the pronounced spatial heterogeneity of Yunnan’s mountain ecosystems. Steep topography drives sharp variation in environmental factors over short distances, and the region spans diverse vegetation types, from low-altitude tropical monsoon rainforests to high-altitude cold-temperate coniferous forests. This diversity produces large differences in canopy structure and many transition zones and fragmented landscape patches. Through its gradient boosting framework and tree-based architecture, XGBR excels at capturing highly nonlinear relationships and complex interactions among features [76]. Crucially, its built-in regularization controls model complexity and reduces the risk of overfitting to noisy data or sparse regions, a capability particularly important for the spatially heterogeneous, noisy, and patchy landscape data characteristic of Yunnan’s mountains [77]. To explain why algorithm rankings differ among studies, Grêt-Regamey et al. [78] proposed the ecosystem specificity hypothesis, which posits that performance differences among algorithms are fundamentally associated with ecosystem-type heterogeneity and spatial-scale effects. Building on these findings, Li et al. [79] emphasized that, in practice, algorithm choice should be guided by the specific application requirements.
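For reference, the regularization terms and second-order optimization mentioned above correspond to the standard XGBoost formulation. At boosting round t, the loss is approximated to second order around the previous prediction, and penalties on the number of leaves T (weight γ) and on the leaf weights w_j (weight λ) yield closed-form optimal leaf weights and a split gain, where I_j is the set of samples in leaf j and L and R denote the two children of a candidate split. These are the generic equations, not hyperparameter values reported for the model in this study.

```latex
\begin{aligned}
\mathcal{L}^{(t)} &\simeq \sum_{i=1}^{n}\Big[g_i\, f_t(\mathbf{x}_i) + \tfrac{1}{2} h_i\, f_t^{2}(\mathbf{x}_i)\Big] + \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2},\\
g_i &= \frac{\partial\, l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial \hat{y}_i^{(t-1)}}, \qquad
h_i = \frac{\partial^{2} l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial \big(\hat{y}_i^{(t-1)}\big)^{2}},\\
w_j^{*} &= -\frac{G_j}{H_j + \lambda}, \qquad G_j = \sum_{i \in I_j} g_i, \quad H_j = \sum_{i \in I_j} h_i,\\
\text{Gain} &= \frac{1}{2}\left[\frac{G_L^{2}}{H_L + \lambda} + \frac{G_R^{2}}{H_R + \lambda} - \frac{(G_L + G_R)^{2}}{H_L + H_R + \lambda}\right] - \gamma .
\end{aligned}
```

For the squared-error loss ½(y − ŷ)², h_i = 1, so H_j is simply the number of samples in leaf j. λ therefore shrinks the weights of leaves containing few samples most strongly, which is one concrete sense in which regularization limits overfitting in sparse regions, and γ rejects splits whose gain does not exceed it.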

4.2. Variable Selection for NEP

Although features extracted from optical images can improve a model’s estimation accuracy, introducing too many variables may add noise and thereby reduce predictive performance [34,80]. Related studies have corroborated this. For instance, Li et al. [66] estimated forest aboveground biomass (AGB) from Landsat-8 OLI imagery and found that reducing the number of variables through variable selection improved the accuracy of Linear Regression (LR), RFR, and XGBR models. Notably, the improvement for XGBR was considerably greater than for RFR, in line with the differential effect of variable selection on model performance observed in this study. This study therefore used RFR-based and VSURF variable selection to screen RSV (21 variables), EV (14), and RSV-EV (35). Consistent with [81], VSURF retained the fewest key variables: 7 for RSV, 7 for EV, and 11 for RSV-EV. Broader research also supports the efficacy of VSURF in variable reduction. Using 311 public classification datasets (22 candidate variables each on average), Speiser et al. [36] found that VSURF reduced the variables to an average of 3.4 while maintaining good performance on holdout data (average out-of-bag (OOB) error rate of 15.6%, AUC of 0.881), highlighting its parsimony without sacrificing accuracy. Similarly, Jiang et al. [82] found that VSURF retained only 3 variables yet achieved better estimates than the 13 variables selected by RFR’s own variable importance measure. However, the optimality of a variable selection method can depend strongly on the research objectives, dataset characteristics, and machine learning model. The findings of Luo et al. [83] in forest AGB estimation differ from ours: comparing RFR and VSURF, they found that RFR-based selection of 5–25 key variables from 53 candidates, combined with the CatBoost model, achieved the best accuracy (minimum RMSE of 22.62 Mg/ha), significantly outperforming VSURF-based selection. This discrepancy may stem from differences in the dominant variables and inter-variable relationships arising from different research objectives, from the effect of model combination, and from inherent properties of the candidate variable sets (e.g., redundancy, noise level, and correlation structure with the target variable).

Furthermore, compared with models built on the unscreened RSV-EV set, feeding the 11 VSURF-screened RSV-EV variables into the three machine learning models improved estimation accuracy (R² increased by up to 5%). This gain contrasts sharply with the 1% decrease in accuracy reported by Tan et al. [33] after a similar procedure. Such differences may be attributed to ecosystem and data characteristics specific to each study region, as well as to differences in the types and parameter settings of the machine learning models used. The improvement in this study can be attributed, on the one hand, to VSURF’s ability to identify the most influential variables through its two-stage selection strategy and, on the other hand, to the optimized model’s enhanced capacity to capture complex nonlinear interactions among multiple factors, which refines NEP estimation [65].

4.3. The Driving Mechanisms of NEP and Their Ecological Implications as Revealed by the SHAP Framework

Our SHAP-based analysis of the factors influencing NEP showed that EVI made the largest contribution to NEP estimation, indicating its dominant role in modeling NEP across Yunnan Province. In Yunnan’s subtropical mountain environment, where tall, dense forests coexist with complex soil backgrounds, EVI outperforms indices such as NDVI because it is more resistant to saturation and suppresses background and atmospheric noise. This allows EVI to capture critical changes in canopy structure, phenology, and stress sensitively, providing a robust vegetation signal for accurate regional NEP estimation [52,84]. This finding is consistent with Shi et al. [85], who showed that annually integrated EVI offers a remarkably simple yet robust approach for estimating the spatial pattern of global annual GPP, with performance comparable to various light use efficiency and data-driven models. Our results reveal a distinct inflection point in the effect of EVI on NEP at 0.1, beyond which its suppressive effect progressively weakens and trends toward a positive influence, culminating in a transition from suppression to promotion at a critical threshold of approximately 0.3. This is similar to the findings of Wang et al. [86], who reported that at the regional scale the EVI transition threshold remains stable at 0.137 in areas of natural vegetation expansion, while the maximum threshold can reach 0.223 in quality improvement zones. Such a threshold may represent a critical tipping point for nonlinear transitions in ecosystem functional state and can serve as a biophysical indicator for identifying carbon source/sink transitions and early signs of stress. When EVI falls below this threshold, it typically indicates sparse vegetation cover, an underdeveloped canopy, or vegetation under stress (e.g., drought or pest infestation), leading to insufficient interception of photosynthetically active radiation and low photosynthetic efficiency, which suppresses NEP. Conversely, when EVI exceeds this threshold, it signifies high canopy closure, a large leaf area index, and vigorous vegetation growth, which markedly enhance NEP [87]. This finding quantitatively links vegetation index variation to significant shifts in core carbon cycle processes, providing a measurable basis for understanding how vegetation dynamics drive carbon sink function. It also offers spatially explicit targets for enhancing carbon sequestration through forest conservation, vegetation restoration, and other mitigation measures, and for optimizing carbon cycle models. Forest-dominated vegetation types exerted strong positive effects on NEP through efficient photosynthesis and complex ecosystem carbon cycling, whereas croplands, urban areas, and snow/ice-covered zones made weaker positive contributions or even had inhibitory effects because of limited vegetation coverage and constrained ecological activity [88]. This suggests that long-term NEP trends are governed primarily by ecosystem structural dynamics, particularly forest growth and degradation.
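The saturation argument can be made concrete with the standard index formulas, NDVI = (NIR − Red)/(NIR + Red) and EVI = G(NIR − Red)/(NIR + C1·Red − C2·Blue + L), using the MODIS coefficients G = 2.5, C1 = 6, C2 = 7.5, and L = 1. The sketch assumes surface reflectances scaled to 0–1; the reflectance values in the note below are hypothetical, chosen only to mimic a dense canopy, and are not data from this study.

```python
import numpy as np


def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue, G=2.5, C1=6.0, C2=7.5, L=1.0):
    """Enhanced Vegetation Index with the standard MODIS coefficients.

    The blue-band aerosol terms (C1, C2) and the canopy background term (L)
    distinguish it from the simple normalized ratio used by NDVI.
    """
    return G * (nir - red) / (nir + C1 * red - C2 * blue + L)
```

For example, raising NIR from 0.45 to 0.60 at Red = 0.03 and Blue = 0.02 moves NDVI only from about 0.875 to 0.905, but EVI from about 0.71 to 0.87, illustrating why EVI retains sensitivity where NDVI approaches saturation. Both functions also accept NumPy arrays, so they can be applied directly to image bands.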

The key variables showed significant bidirectional effects and distinct threshold responses on NEP. Both SC and SS exhibited characteristic “dose–response” relationships with NEP. Moderate SC levels promote NEP by improving soil water retention and nutrient preservation, creating favorable conditions for vegetation growth and carbon fixation [89]. This suggests an optimal synergy between soil physical properties and biogeochemical processes within this range, maximizing productivity and carbon sequestration potential [90]. Conversely, excessive SS content weakens the ecosystem carbon sink by reducing soil water-holding capacity and nutrient availability, ultimately constraining regional carbon sequestration potential [91]. Soil pH also shows a critical threshold. An acidic environment (pH 3.69–5.65) generally increases the availability of certain nutrients (e.g., iron and manganese) and microbial activity, thereby exerting a positive effect on NEP [92]. This pH threshold is consistent with the role of hydrogen ion concentration in shaping the net balance between soil carbon stabilization and mineralization through its regulation of element solubility, enzyme activity, and the balance of microbial functional groups [93].

Notable interaction effects were identified both among climatic factors and between climatic and soil factors, in line with the findings of Zhang et al. [94] for terrestrial ecosystems across China. In general, higher temperature, greater solar radiation, and precipitation within optimal ranges positively influenced NEP. This can be attributed mainly to favorable climatic conditions enhancing key ecological processes, including photosynthetic carbon fixation by vegetation, soil organic matter decomposition, and nutrient cycling. Below the temperature optimum for photosynthesis, warming stimulates ecosystem productivity by accelerating the release and bioavailability of soil nutrients, thereby promoting carbon sequestration [95]; this reflects the optimal efficiency of enzymatic reactions and physiological processes within specific temperature ranges. Beyond this range, higher temperatures intensify respiratory consumption, damage the photosynthetic apparatus, and suppress microbial activity, ultimately reducing NEP [96]. Insufficient solar radiation directly limits the energy supply for photosynthesis and thus suppresses carbon assimilation, whereas excessive radiation may, under specific conditions, induce photoinhibition [97]. Precipitation mediates NEP dynamics by altering cloud cover, and thus incoming solar radiation, and by affecting soil properties [98]. Precipitation outside the optimal range may induce water stress that limits photosynthesis and plant growth or, alternatively, waterlogging that reduces soil aeration, promotes nutrient loss, and enhances ecosystem respiration, in either case reducing NEP [99]. Furthermore, the divergent response patterns between Temp and soil texture factors may reflect ecosystem adaptation strategies to combined climate–soil stress [100,101]. These findings reveal nonlinear interactions and critical state-transition thresholds in ecosystem responses to multiple environmental drivers, which are crucial for predicting carbon sink vulnerability under the combined effects of climate change and soil degradation. The combination of low temperature, high SS, and moderate SC markedly suppresses carbon sink capacity, whereas the combination of optimal temperature, lower SS, and higher SC enhances carbon sequestration [102]. The negative correlation between SC and SS content is consistent with complex interactions between soil texture and temperature in regulating carbon cycle processes.

4.4. Limitations and Future Perspectives

The “Feature Selection–Machine Learning–Mechanism Interpretation” framework proposed in this study is effective for identifying the drivers of carbon sinks in Yunnan Province, but it remains limited in depicting the spatial detail of the driving variables and in comprehensively analyzing ecological processes. First, the spatial resolution of the driver variables may be too coarse to capture small-scale heterogeneity. Future studies could address this by integrating multi-scale remote sensing data (e.g., Sentinel-2 MSI and GF-7 PMS imagery with resolutions as fine as 0.8 m) with object-based image analysis (OBIA) techniques. Second, the analysis did not fully account for lagged climate effects, extreme climate events, or the combined impacts of anthropogenic management practices [49]. To address this, we propose developing a temporal convolutional network (TCN) module to capture legacy effects and establishing a disturbance database that incorporates historical management records. Third, the model framework does not yet fully integrate complex biogeochemical cycling processes within regional ecosystems, such as feedbacks from changes in soil microbial community structure on the carbon cycle and the dynamic responses of vegetation physiological processes to environmental stressors [103]. These processes play a crucial role in the ecosystem carbon cycle, but their complexity and nonlinearity make them difficult to capture fully [104]. Future research should explore hybrid frameworks that integrate process-based models (incorporating key environmental features) with higher-resolution remote sensing and machine learning to better resolve spatial heterogeneity and improve mechanistic understanding.

5. Conclusions

This study systematically investigated the driving mechanisms of NEP in Yunnan Province by integrating multi-source datasets with three machine learning regression algorithms (RFR, XGBR, and CatBoost), combined with variable selection methods and SHAP interpretability analysis. The selected feature subset yielded high estimation accuracy in all three machine learning models (coefficient of determination, R² > 0.90), with low prediction errors: RMSE values of 97.51, 76.82, and 89.42 gC/(m²·a) and MAE values of 69.61, 55.11, and 65.52 gC/(m²·a) for RFR, XGBR, and CatBoost, respectively. SHAP-based interpretability analysis further revealed that EVI, climatic factors, and soil parameters were the key drivers of NEP. Compared with conventional models, this approach not only identifies the key driving factors but also quantifies the nonlinear contributions and interaction effects of each variable. Moreover, optimized feature selection substantially improved the NEP estimation accuracy of the machine learning models, increasing R² by 3–5% relative to models using unfiltered variables. The integrated “Feature Selection–Machine Learning–Mechanism Interpretation” framework provides a transferable analytical paradigm for deciphering the driving mechanisms of complex ecosystem carbon cycle processes.

Author Contributions

Conceptualization, H.T., J.Z. and K.Y.; data curation, L.X.; investigation, L.X.; methodology, H.T.; software, C.T.; supervision, J.Z.; validation, K.H.; writing—original draft, L.X.; writing—review and editing, H.T., J.Z., J.Y. and T.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Yunnan Fundamental Research Projects (No. 202501AS070047), the National Natural Science Foundation of China (No. 32260390), and Forestry Innovation Programs of Southwest Forestry University (No. LXXK-2023Z06).

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Bellard, C.; Marino, C.; Courchamp, F. Ranking threats to biodiversity and why it doesn’t matter. Nat. Commun. 2022, 13, 2616.
Li, G.; Fang, C.; Li, Y.; Wang, Z.; Sun, S.; He, S.; Qi, W.; Bao, C.; Ma, H.; Fan, Y.; et al. Global impacts of future urban expansion on terrestrial vertebrate diversity. Nat. Commun. 2022, 13, 1628.
Kantzas, E.P.; Val Martin, M.; Lomas, M.R.; Eufrasio, R.M.; Renforth, P.; Lewis, A.L.; Taylor, L.L.; Mecure, J.-F.; Pollitt, H.; Vercoulen, P.V.; et al. Substantial carbon drawdown potential from enhanced rock weathering in the United Kingdom. Nat. Geosci. 2022, 15, 382–389.
Cao, M.; Prince, S.D.; Li, K.; Tao, B.; Small, J.; Shao, X. Response of terrestrial carbon uptake to climate interannual variability in China. Glob. Change Biol. 2003, 9, 536–546.
Canadell, J.G.; Pataki, D.E.; Gifford, R.; Houghton, R.A.; Luo, Y.; Raupach, M.R.; Smith, P.; Steffen, W. Saturation of the terrestrial carbon sink. In Terrestrial Ecosystems in a Changing World; Springer: Berlin/Heidelberg, Germany, 2007; pp. 59–78.
Pan, Y.; Birdsey, R.A.; Fang, J.; Houghton, R.; Kauppi, P.E.; Kurz, W.A.; Phillips, O.L.; Shvidenko, A.; Lewis, S.L.; Canadell, J.G. A large and persistent carbon sink in the world’s forests. Science 2011, 333, 988–993.
Piao, S.; He, Y.; Wang, X.; Chen, F. Estimation of China’s terrestrial ecosystem carbon sink: Methods, progress and prospects. Sci. China Earth Sci. 2022, 65, 641–651.
He, H.; Wang, S.; Zhang, L.; Wang, J.; Ren, X.; Zhou, L.; Piao, S.; Yan, H.; Ju, W.; Gu, F. Altered trends in carbon uptake in China’s terrestrial ecosystems under the enhanced summer monsoon and warming hiatus. Natl. Sci. Rev. 2019, 6, 505–514.
Wang, F.; Harindintwali, J.D.; Yuan, Z.; Wang, M.; Wang, F.; Li, S.; Yin, Z.; Huang, L.; Fu, Y.; Li, L. Technologies and perspectives for achieving carbon neutrality. Innovation 2021, 2, 100180.
Woodwell, G.M.; Whittaker, R.; Reiners, W.; Likens, G.E.; Delwiche, C.; Botkin, D. The Biota and the World Carbon Budget: The terrestrial biomass appears to be a net source of carbon dioxide for the atmosphere. Science 1978, 199, 141–146.
Yu, G.; Chen, Z.; Piao, S.; Peng, C.; Ciais, P.; Wang, Q.; Li, X.; Zhu, X. High carbon dioxide uptake by subtropical forest ecosystems in the East Asian monsoon region. Proc. Natl. Acad. Sci. USA 2014, 111, 4910–4915.
Balsamo, G.; Agustì-Parareda, A.; Albergel, C.; Arduini, G.; Beljaars, A.; Bidlot, J.; Blyth, E.; Bousserez, N.; Boussetta, S.; Brown, A. Satellite and in situ observations for advancing global Earth surface modelling: A review. Remote Sens. 2018, 10, 2038.
Reddy, G.O. Satellite remote sensing sensors: Principles and applications. In Geospatial Technologies in Land Resources Mapping, Monitoring and Management; Springer: Cham, Switzerland, 2018; Volume 21, pp. 21–43.
Chen, H.; Wang, B.; Zheng, L.; Shahtahmassebi, Z. Quantifying Impacts of Regional Multiple Factors on Spatiotemporal the Mechanisms for Spatio-temporal changes of Net Primary Vegetation Productivity and Net Ecosystem Productivity: An Example in the Jianghuai River Basin, China. Photogramm. Eng. Remote Sens. 2023, 89, 761–771.
Chen, Y.; Xu, Y.; Chen, T.; Zhang, F.; Zhu, S. Exploring the spatiotemporal dynamics and driving factors of net ecosystem productivity in China from 1982 to 2020. Remote Sens. 2023, 16, 60.
Qi, S.; Zhang, H.; Zhang, M. Evolutionary characteristics of carbon sources/sinks in Chinese terrestrial ecosystems regarding to temporal effects and geographical partitioning. Ecol. Indic. 2024, 160, 111923.
He, Z.; Yuan, W. Exploring the Influencing Factors of Net Ecosystem Productivity (NEP) Based on Random Forest and SHAP. Acad. J. Sci. Technol. 2024, 12, 242–248.
Liang, L.; Geng, D.; Yan, J.; Qiu, S.; Shi, Y.; Wang, S.; Wang, L.; Zhang, L.; Kang, J. Remote Sensing Estimation and Spatiotemporal Pattern Analysis of Terrestrial Net Ecosystem Productivity in China. Remote Sens. 2022, 14, 1902.
Mao, F.; Du, H.; Zhou, G.; Zheng, J.; Li, X.; Xu, Y.; Huang, Z.; Yin, S. Simulated net ecosystem productivity of subtropical forests and its response to climate change in Zhejiang Province, China. Sci. Total Environ. 2022, 838, 155993.
Sun, J.; Mao, F.; Du, H.; Li, X.; Xu, C.; Zheng, Z.; Teng, X.; Ye, F.; Yang, N.; Huang, Z. Improving the Simulation Accuracy of the Net Ecosystem Productivity of Subtropical Forests in China: Sensitivity Analysis and Parameter Calibration Based on the BIOME-BGC Model. Forests 2024, 15, 552.
Wang, L.; Wu, X.; Guo, J.; Zhou, J.; He, L. Spatial–temporal pattern of vegetation carbon sequestration and its response to rocky desertification control measures in a karst area, in Guangxi Province, China. Land Degrad. Dev. 2023, 34, 665–681.
Feng, Q.; Zhou, Z.; Chen, Q.; Zhu, C.; Zhu, M.; Luo, W.; Wang, J. Quantifying the extent of ecological impact from China’s poverty alleviation relocation program: A case study in Guizhou Province. J. Clean. Prod. 2024, 444, 141274.
Qiu, S.; Liang, L.; Wang, Q.; Geng, D.; Wu, J.; Wang, S.; Chen, B. Estimation of European terrestrial ecosystem NEP based on an improved CASA model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 16, 1244–1255.
Liang, L.; Wang, Q.; Qiu, S.; Geng, D.; Wang, S. NEP estimation of terrestrial ecosystems in China using an improved CASA model and soil respiration model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 10203–10215.
Tu, J.V. Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. J. Clin. Epidemiol. 1996, 49, 1225–1231.
Gholami, R.; Fakhari, N. Support vector machine: Principles, parameters, and applications. In Handbook of Neural Computation; Elsevier: Amsterdam, The Netherlands, 2017; pp. 515–535.
Kumar, S.; Bhatnagar, V. A review of regression models in machine learning. J. Intell. Syst. Comput. 2022, 3, 40–47.
Liu, J.; Zuo, Y.; Wang, N.; Yuan, F.; Zhu, X.; Zhang, L.; Zhang, J.; Sun, Y.; Guo, Z.; Guo, Y. Comparative analysis of two machine learning algorithms in predicting site-level net ecosystem exchange in major biomes. Remote Sens. 2021, 13, 2242.
Liu, Y.; Yang, T.; Tian, L.; Huang, B.; Yang, J.; Zeng, Z. Ada-xg-CatBoost: A combined forecasting model for gross ecosystem product (GEP) prediction. Sustainability 2024, 16, 7203.
Li, L.; Zeng, Z.; Zhang, G.; Duan, K.; Liu, B.; Cai, X. Exploring the Individualized Effect of Climatic Drivers on MODIS Net Primary Productivity through an Explainable Machine Learning Framework. Remote Sens. 2022, 14, 4401.
Zheng, J.; Zhang, Y.; Wang, X.; Zhu, J.; Zhao, G.; Zheng, Z.; Tao, J.; Zhang, Y.; Li, J. Estimation of net ecosystem productivity on the tibetan plateau grassland from 1982 to 2018 based on random forest model. Remote Sens. 2023, 15, 2375.
Grimmer, J.; Roberts, M.E.; Stewart, B.M. Machine learning for social science: An agnostic approach. Annu. Rev. Political Sci. 2021, 24, 395–419.
Tan, H.; Kou, W.; Xu, W.; Wang, L.; Wang, H.; Lu, N. Improved estimation of aboveground biomass in rubber plantations using deep learning on UAV multispectral imagery. Drones 2025, 9, 32.
Cai, J.; Luo, J.; Wang, S.; Yang, S. Feature selection in machine learning: A new perspective. Neurocomputing 2018, 300, 70–79.
Genuer, R.; Poggi, J.-M.; Tuleau-Malot, C. Variable selection using random forests. Pattern Recognit. Lett. 2010, 31, 2225–2236.
Speiser, J.L.; Miller, M.E.; Tooze, J.; Ip, E. A comparison of random forest variable selection methods for classification prediction modeling. Expert Syst. Appl. 2019, 134, 93–101.
Fernández-Martínez, M.; Sardans, J.; Chevallier, F.; Ciais, P.; Obersteiner, M.; Vicca, S.; Canadell, J.; Bastos, A.; Friedlingstein, P.; Sitch, S. Global trends in carbon sinks and their relationships with CO2 and temperature. Nat. Clim. Chang. 2019, 9, 73–79.
Li, G.; Wang, D.; Wang, K.; Lin, L. A two-dimensional sample screening method based on data quality and variable correlation. Anal. Chim. Acta 2022, 1203, 339700.
Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4768–4777.
Li, Z. Extracting spatial effects from machine learning model using local interpretation method: An example of SHAP and XGBoost. Comput. Environ. Urban Syst. 2022, 96, 101845.
Bi, C.; Yang, K.; Zhang, S.; Zeng, W.; Liu, J.; Rao, Y.; Ma, Y.; Yang, X. Simulation and analysis of afforestation potential areas under different development scenarios in Yunnan Province, China. Ecol. Indic. 2024, 167, 112695.
Meng, L.; Yang, R.; Sun, M.; Zhang, L.; Li, X. Regional Sustainable Strategy Based on the Coordination of Ecological Security and Economic Development in Yunnan Province, China. Sustainability 2023, 15, 7540.
Luo, J.; Zhan, J.; Lin, Y.; Zhao, C. An equilibrium analysis of the land use structure in the Yunnan Province, China. Front. Earth Sci. 2014, 8, 393–404.
Vermote, E.; Kotchenova, S.; Ray, J. MODIS Surface Reflectance User’s Guide; MODIS Land Surface Reflectance Science Computing Facility; Satellite Atmosphere correction & Land Surface Applications: Greenbelt, MD, USA, 2011; Volume 1, pp. 1–40. [Google Scholar]
Smith, P. Bilinear interpolation of digital images. Ultramicroscopy 1981, 6, 201–204. [Google Scholar] [CrossRef]
Karthiga, R.; Usha, G.; Raju, N.; Narasimhan, K. Transfer Learning Based Breast cancer Classification using One-Hot Encoding Technique. In Proceedings of the 2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS), Coimbatore, India, 25–27 March 2021; pp. 115–120. [Google Scholar]
Pei, Z.Y.; Ouyang, H.; Zhou, C.P.; Xu, X.L. Carbon balance in an alpine steppe in the Qinghai-Tibet Plateau. J. Integr. Plant Biol. 2009, 51, 521–526. [Google Scholar] [CrossRef] [PubMed]
Li, X.; Lin, G.; Jiang, D.; Fu, J.; Wang, Y. Spatiotemporal evolution characteristics and the climatic response of carbon sources and sinks in the Chinese grassland ecosystem from 2010 to 2020. Sustainability 2022, 14, 8461. [Google Scholar] [CrossRef]
Guo, H.; Cao, C.; Xu, M.; Yang, X.; Chen, Y.; Wang, K.; Duerler, R.S.; Li, J.; Gao, X. Spatiotemporal distribution pattern and driving factors analysis of GPP in Beijing-Tianjin-Hebei region by Long-term MODIS data. Remote Sens. 2023, 15, 622. [Google Scholar] [CrossRef]
Cui, G.; Wang, S.; Li, X.; Dong, L.; Zhu, J. Optimal agricultural structure allocation based on carbon source/sink accounting. Ecol. Indic. 2024, 166, 112349. [Google Scholar] [CrossRef]
Rouse, J., Jr.; Haas, R.; Schell, J.; Deering, D. Paper a 20. In Third Earth Resources Technology Satellite-1 Symposium, Proceedings of the Symposium, Washington, DC, USA, 10–14 December 1973; Goddard Space Flight Center: Washington, DC, USA, 1973; p. 309. [Google Scholar]
Huete, A.; Didan, K.; Miura, T.; Rodriguez, E.P.; Gao, X.; Ferreira, L.G. Overview of the radiometric and biophysical performance of the MODIS vegetation indices. Remote Sens. Environ. 2002, 83, 195–213. [Google Scholar] [CrossRef]
Gitelson, A.A.; Gritz, Y.; Merzlyak, M.N. Relationships between leaf chlorophyll content and spectral reflectance and algorithms for non-destructive chlorophyll assessment in higher plant leaves. J. Plant Physiol. 2003, 160, 271–282. [Google Scholar] [CrossRef]
Broge, N.H.; Leblanc, E. Comparing prediction power and stability of broadband and hyperspectral vegetation indices for estimation of green leaf area index and canopy chlorophyll density. Remote Sens. Environ. 2001, 76, 156–172. [Google Scholar] [CrossRef]
Wang, Q.; Moreno-Martínez, Á.; Muñoz-Marí, J.; Campos-Taberner, M.; Camps-Valls, G. Estimation of vegetation traits with kernel NDVI. ISPRS J. Photogramm. Remote Sens. 2023, 195, 408–417. [Google Scholar] [CrossRef]
Qi, J.; Chehbouni, A.; Huete, A.R.; Kerr, Y.H.; Sorooshian, S. A modified soil adjusted vegetation index. Remote Sens. Environ. 1994, 48, 119–126. [Google Scholar] [CrossRef]
Chen, J.M. Evaluation of vegetation indices and a modified simple ratio for boreal applications. Can. J. Remote Sens. 1996, 22, 229–242. [Google Scholar] [CrossRef]
Tucker, C.J. Red and photographic infrared linear combinations for monitoring vegetation. Remote Sens. Environ. 1979, 8, 127–150. [Google Scholar] [CrossRef]
Pinty, B.; Verstraete, M. GEMI: A non-linear index to monitor global vegetation from satellites. Vegetatio 1992, 101, 15–20. [Google Scholar] [CrossRef]
Rondeaux, G.; Steven, M.; Baret, F. Optimization of soil-adjusted vegetation indices. Remote Sens. Environ. 1996, 55, 95–107. [Google Scholar] [CrossRef]
Ren, H.; Zhou, G.; Zhang, F. Using negative soil adjustment factor in soil-adjusted vegetation index (SAVI) for aboveground living biomass estimation in arid grasslands. Remote Sens. Environ. 2018, 209, 439–445. [Google Scholar] [CrossRef]
Wang, L.a.; Zhou, X.; Zhu, X.; Dong, Z.; Guo, W. Estimation of biomass in wheat using random forest regression algorithm and remote sensing data. Crop J. 2016, 4, 212–219. [Google Scholar] [CrossRef]
Stow, D.; Niphadkar, M.; Kaiser, J. MODIS-derived visible atmospherically resistant index for monitoring chaparral moisture content. Int. J. Remote Sens. 2005, 26, 3867–3873. [Google Scholar] [CrossRef]
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
Genuer, R.; Poggi, J.-M.; Tuleau-Malot, C. VSURF: An R package for variable selection using random forests. R J. 2015, 7, 19–33. [Google Scholar] [CrossRef]
Li, Y.; Li, C.; Li, M.; Liu, Z. Influence of variable selection and forest type on forest aboveground biomass estimation using machine learning algorithms. Forests 2019, 10, 1073. [Google Scholar] [CrossRef]
Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd Acm Sigkdd International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
Hancock, J.T.; Khoshgoftaar, T.M. CatBoost for big data: An interdisciplinary review. J. Big Data 2020, 7, 94. [Google Scholar] [CrossRef]
Lundberg, S.; Lee, S.-I. An unexpected unity among methods for interpreting model predictions. arXiv 2016, arXiv:1611.07478. [Google Scholar] [CrossRef]
Shorrocks, A.F. Decomposition procedures for distributional analysis: A unified framework based on the Shapley value. J. Econ. Inequal. 2013, 11, 99–126. [Google Scholar] [CrossRef]
Liang, Y.; Kou, W.; Lai, H.; Wang, J.; Wang, Q.; Xu, W.; Wang, H.; Lu, N. Improved estimation of aboveground biomass in rubber plantations by fusing spectral and textural information from UAV-based RGB imagery. Ecol. Indic. 2022, 142, 109286. [Google Scholar] [CrossRef]
Virdi, J.S.; Peng, W.; Sata, A. Feature selection with LASSO and VSURF to model mechanical properties for investment casting. In Proceedings of the 2019 International Conference on Computational Intelligence in Data Science (ICCIDS), Chennai, India, 21–23 February 2019; pp. 1–6. [Google Scholar]
Xing, W.; Feng, Z.; Wei, J.; Xu, S.; Shao, Q.; Wang, W.; Shi, X. Impacts of climate extremes on variations in evergreen forest ecosystem carbon–water fluxes across Southern China. Glob. Planet. Chang. 2025, 252, 104867. [Google Scholar] [CrossRef]
Wang, J.; Zhou, S. Particle swarm optimization-XGBoost-based modeling of radio-frequency power amplifier under different temperatures. Int. J. Numer. Model. Electron. Netw. Devices Fields 2024, 37, 13. [Google Scholar] [CrossRef]
Schratz, P.; Muenchow, J.; Iturritxa, E.; Richter, J.; Brenning, A. Performance evaluation and hyperparameter tuning of statistical and machine-learning models using spatial data. arXiv 2018, arXiv:1803.11266. [Google Scholar] [CrossRef]
Wang, L.; Li, H.; Zhu, Z.; Xu, M.; Liu, D.; Baluch, S.M.; Zhao, Y. Nonlinear dynamics of ecosystem productivity and its driving mechanisms in arid regions: A case study of Ebinur Lake Basin. J. Environ. Manag. 2025, 386, 125770. [Google Scholar] [CrossRef]
Zhang, P.; Jia, Y.; Shang, Y. Research and application of XGBoost in imbalanced data. Int. J. Distrib. Sens. Netw. 2022, 18, 15501329221106935. [Google Scholar] [CrossRef]
Grêt-Regamey, A.; Weibel, B.; Bagstad, K.J.; Ferrari, M.; Tappeiner, U. On the Effects of Scale for Ecosystem Services Mapping. PLoS ONE 2014, 9, e112601. [Google Scholar] [CrossRef] [PubMed]
Li, H.; Zhang, K.; Li, J.; Gao, X.; Shi, G. ADCMT: An Augmentation-Free Dynamic Contrastive Multi-Task Transformer for UGC-VQA. IEEE Trans. Broadcast. 2025, 80, 102489. [Google Scholar] [CrossRef]
Han, D.; Cai, H.; Zhang, L.; Wen, Y. Multi-sensor high spatial resolution leaf area index estimation by combining surface reflectance with vegetation indices for highly heterogeneous regions: A case study of the Chishui River Basin in southwest China. Ecol. Inform. 2024, 80, 102489. [Google Scholar] [CrossRef]
Zhang, Y.; Liao, J.; Xu, C.; Du, M.; Zhang, X. Optimizing variables selection of random forest to predict radial growth of Larix gmelinii var. principis-rupprechtii in temperate regions. For. Ecol. Manag. 2024, 569, 122159. [Google Scholar] [CrossRef]
Jiang, F.; Kutia, M.; Sarkissian, A.J.; Lin, H.; Long, J.; Sun, H.; Wang, G. Estimating the Growing Stem Volume of Coniferous Plantations Based on Random Forest Using an Optimized Variable Selection Method. Sensors 2020, 20, 7248. [Google Scholar] [CrossRef]
Luo, M.; Wang, Y.; Xie, Y.; Zhou, L.; Qiao, J.; Qiu, S.; Sun, Y. Combination of feature selection and catboost for prediction: The first application to the estimation of aboveground biomass. Forests 2021, 12, 216. [Google Scholar] [CrossRef]
Qiu, J.; Yang, J.; Wang, Y.; Su, H. A comparison of NDVI and EVI in the DisTrad model for thermal sub-pixel mapping in densely vegetated areas: A case study in Southern China. Int. J. Remote Sens. 2018, 39, 2105–2118. [Google Scholar] [CrossRef]
Shi, H.; Li, L.; Eamus, D.; Huete, A.; Cleverly, J.; Tian, X.; Yu, Q.; Wang, S.; Montagnani, L.; Magliulo, V. Assessing the ability of MODIS EVI to estimate terrestrial ecosystem gross primary production of multiple land cover types. Ecol. Indic. 2017, 72, 153–164. [Google Scholar] [CrossRef]
Wang, W.; Han, F.; Kong, Z.; Ling, H.; Hao, X. The maximum threshold of vegetation restoration (EVI-Area) in typical watersheds of arid regions under water constraints. Ecol. Indic. 2024, 158, 111580. [Google Scholar] [CrossRef]
Wang, G.; Peng, W.; Zhang, L.; Zhang, J.; Xiang, J. Vegetation EVI changes and response to natural factors and human activities based on geographically and temporally weighted regression. Glob. Ecol. Conserv. 2023, 45, e02531. [Google Scholar] [CrossRef]
Zhu, X.-J.; Qu, F.-Y.; Fan, R.-X.; Chen, Z.; Wang, Q.-F.; Yu, G.-R. Effects of ecosystem types on the spatial variations in annual gross primary productivity over terrestrial ecosystems of China. Sci. Total Environ. 2022, 833, 155242. [Google Scholar] [CrossRef]
Hamoud, Y.A.; Shaghaleh, H.; Zhang, K.; Okla, M.K.; Alaraidh, I.A.; Sheteiwy, M.S.; AbdElgawad, H. Increasing soil clay content increases soil phosphorus availability and improves the growth, physiology, and phosphorus uptake of rice under alternative wetting and mild drying irrigation. Environ. Technol. Innov. 2024, 35, 103691. [Google Scholar] [CrossRef]
Churchman, G.J.; Singh, M.; Schapel, A.; Sarkar, B.; Bolan, N. Clay minerals as the key to the sequestration of carbon in soils. Clays Clay Miner. 2020, 68, 135–143. [Google Scholar] [CrossRef]
Ury, E.A.; Wright, J.P.; Ardon, M.; Bernhardt, E.S. Saltwater intrusion in context: Soil factors regulate impacts of salinity on soil carbon cycling. Biogeochemistry 2022, 157, 215–226. [Google Scholar] [CrossRef]
Ontman, R.; Groffman, P.M.; Driscoll, C.T.; Cheng, Z. Surprising relationships between soil pH and microbial biomass and activity in a northern hardwood forest. Biogeochemistry 2023, 163, 265–277. [Google Scholar] [CrossRef]
Stark, S.; Männistö, M.K.; Eskelinen, A. Nutrient availability and pH jointly constrain microbial extracellular enzyme activities in nutrient-poor tundra soils. Plant Soil 2014, 383, 373–385. [Google Scholar] [CrossRef]
Zhang, C.; Huang, N.; Wang, L.; Song, W.; Zhang, Y.; Niu, Z. Spatial and Temporal Pattern of Net Ecosystem Productivity in China and Its Response to Climate Change in the Past 40 Years. Int. J. Environ. Res. Public Health 2023, 20, 92. [Google Scholar] [CrossRef]
Michaletz, S.T.; Cheng, D.; Kerkhoff, A.J.; Enquist, B.J. Convergence of terrestrial plant production across global climate gradients. Nature 2014, 512, 39–43. [Google Scholar] [CrossRef]
Knapp, B.D.; Huang, K.C. The effects of temperature on cellular physiology. Annu. Rev. Biophys. 2022, 51, 499–526. [Google Scholar] [CrossRef]
Durand, M.; Murchie, E.H.; Lindfors, A.V.; Urban, O.; Aphalo, P.J.; Robson, T.M. Diffuse solar radiation and canopy photosynthesis in a changing environment. Agric. For. Meteorol. 2021, 311, 108684. [Google Scholar] [CrossRef]
Nijp, J.J.; Limpens, J.; Metselaar, K.; Peichl, M.; Nilsson, M.B.; van der Zee, S.E.; Berendse, F. Rain events decrease boreal peatland net CO₂ uptake through reduced light availability. Glob. Change Biol. 2015, 21, 2309–2320. [Google Scholar] [CrossRef]
Parton, W.; Morgan, J.; Smith, D.; Del Grosso, S.; Prihodko, L.; LeCain, D.; Kelly, R.; Lutz, S. Impact of precipitation dynamics on net ecosystem productivity. Glob. Change Biol. 2012, 18, 915–927. [Google Scholar] [CrossRef]
Gray, S.B.; Brady, S.M. Plant developmental responses to climate change. Dev. Biol. 2016, 419, 64–77. [Google Scholar] [CrossRef]
Bach, E.M.; Baer, S.G.; Meyer, C.K.; Six, J. Soil texture affects soil microbial and structural recovery during grassland restoration. Soil Biol. Biochem. 2010, 42, 2182–2191. [Google Scholar] [CrossRef]
Saimun, M.S.R.; Karim, M.R.; Sultana, F.; Arfin-Khan, M.A. Multiple drivers of tree and soil carbon stock in the tropical forest ecosystems of Bangladesh. Trees For. People 2021, 5, 100108. [Google Scholar] [CrossRef]
He, P.; Ma, X.; Meng, X.; Han, Z.; Liu, H.; Sun, Z. Spatiotemporal evolutionary and mechanism analysis of grassland GPP in China. Ecol. Indic. 2022, 143, 109323. [Google Scholar] [CrossRef]
Liu, Y.; Chen, J.M.; He, L.; Zhang, Z.; Wang, R.; Rogers, C.; Fan, W.; de Oliveira, G.; Xie, X. Non-linearity between gross primary productivity and far-red solar-induced chlorophyll fluorescence emitted from canopies of major biomes. Remote Sens. Environ. 2022, 271, 112896. [Google Scholar] [CrossRef]

Figure 1. The geographical location of Yunnan Province, China. (a) The geographical location of Yunnan Province, China. (b) Spatial distribution patterns of topographic features in Yunnan Province and location map of its neighboring provinces. (c) Land use map of Yunnan Province.

Figure 2. The workflow of NEP estimation and driving force analysis based on the Feature Selection–Machine Learning–Mechanism Interpretation framework.

Figure 3. The 1:1 relationship between predicted and measured NEP values using machine learning regression techniques based on multi-source feature inputs. (a) RFR, (b) XGBR, and (c) CatBoost. The red solid line is the regression fit line and the black dashed line is the 1:1 reference line; the same convention applies to Figures 5 and 6.
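Figures 3, 5, and 6 overlay a regression fit line on the 1:1 reference line. The captions do not state how the fit was obtained, so the sketch below assumes an ordinary least-squares line of predicted NEP on measured NEP; `fit_line` is a hypothetical helper, not code from the study.

```python
from statistics import fmean


def fit_line(measured, predicted):
    """Ordinary least-squares fit of predicted = slope * measured + intercept.

    Assumes measured NEP on the x-axis and predicted NEP on the y-axis.
    """
    mx, my = fmean(measured), fmean(predicted)
    sxx = sum((x - mx) ** 2 for x in measured)
    sxy = sum((x - mx) * (y - my) for x, y in zip(measured, predicted))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical NEP values in gC/(m²·a), for illustration only.
slope, intercept = fit_line([100, 200, 300, 400], [120, 190, 310, 380])
print(f"slope={slope:.2f}, intercept={intercept:.1f}")  # slope=0.90, intercept=25.0
```

A slope close to 1 together with an intercept close to 0 means the red fit line nearly coincides with the dashed 1:1 line.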

Figure 4. Features selected through importance analysis based on RSV, EV, and their combination. The inner circle displays feature importance scores numerically, while the outer ring annotates corresponding feature names. Connecting lines visually associate identical features across different groupings.

Figure 5. The 1:1 relationship between predicted and measured NEP values using machine learning regression techniques with features selected by RFR importance analysis. (a) RFR, (b) XGBR, and (c) CatBoost.

Figure 6. The 1:1 relationship between predicted and measured NEP values using machine learning regression techniques with VSURF-selected features. (a) RFR, (b) XGBR, and (c) CatBoost.

Figure 7. Importance ranking and SHAP plots of NEP driving factors. (a) The bar plot of feature importance, (b) SHAP summary plot.
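The SHAP values summarized in Figure 7 are Shapley values of individual features. For a handful of features they can be computed exactly from the definition $\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n-|S|-1)!}{n!}\,[v(S \cup \{i\}) - v(S)]$ by enumerating all feature subsets $S$. The sketch below is illustrative only: the toy model and inputs are hypothetical, features outside $S$ are replaced by a single baseline vector (an assumption; SHAP's tree explainer uses a background distribution and a far faster algorithm), and the bar-plot importance is taken as the mean absolute SHAP value per feature, a common convention assumed here.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to `baseline`.

    Cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)

    def value(subset):
        # Features in `subset` take their values from x; the rest from the baseline.
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return f(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                s = set(s)
                # Marginal contribution of feature i when added to coalition s.
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def mean_abs_importance(shap_rows):
    """Mean absolute SHAP value per feature across samples (bar-plot style)."""
    n = len(shap_rows[0])
    return [sum(abs(row[j]) for row in shap_rows) / len(shap_rows) for j in range(n)]


# Hypothetical toy model with an interaction between features 0 and 2.
def toy_model(z):
    return 2 * z[0] + 3 * z[1] + z[0] * z[2]


phis = shapley_values(toy_model, [1, 2, 3], [0, 0, 0])
print(phis)  # [3.5, 6.0, 1.5]
```

The toy prediction of 11 is split into 3.5, 6.0, and 1.5: the interaction term $x_0 x_2 = 3$ is shared equally between features 0 and 2, and the values sum to $f(x) - f(\text{baseline})$.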

Figure 8. SHAP dependence plots for key influencing factors: (a) EVI, (b) VegType, (c) pH, (d) SC, (e) SS, (f) Temp, (g) SolRad, and (h) Precip.

Figure 9. Dominant interaction effects between EVI and environmental factors on NEP estimation. In each subplot, point color encodes the value of the interacting feature named on the right side, and the color bar on the right of the subplot gives the corresponding value scale; the same convention applies to Figure 10. (a) Interaction between EVI and SolRad, (b) Interaction between EVI and Temp, (c) Interaction between EVI and Precip, (d) Interaction between EVI and PET, (e) Interaction between EVI and SC, and (f) Interaction between EVI and SS.

Figure 10. Interaction effects of factors with low SHAP importance on NEP estimation. (a) Interaction between Precip and Temp, (b) Interaction between SC and SS, (c) Interaction between Precip and SC, (d) Interaction between Precip and SS, (e) Interaction between Temp and SC, (f) Interaction between Temp and SS, and (g) Interaction between VegType and Temp.

Table 1. Environmental variables used in this study, with their abbreviations, spatial and temporal resolutions, and units.

Environmental Variable Name	Abbreviation	Spatial Resolution	Temporal Resolution	Unit
Elevation	Elev	90 m	-	meters
Aspect	Aspect	90 m	-	degrees
Slope	Slope	90 m	-	degrees
Soil pH in H₂O	pH	250 m	-	dimensionless
Soil Sand Content	SS	250 m	-	%
Soil Organic Carbon Content	SOC	250 m	-	%
Soil Clay Content	SC	250 m	-	%
Potential Evapotranspiration	PET	1 km	monthly	0.1 mm
Temperature	Temp	1 km	monthly	0.1 °C
Precipitation	Precip	1 km	monthly	0.1 mm
Solar Radiation	SolRad	1 km	monthly	kJ/m²
Nighttime Light	NTL	1 km	yearly	dimensionless (0–63)
Population Density	PopDens	1 km	yearly	persons/km²
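Several climate layers in Table 1 are stored as scaled values (0.1 mm for PET and precipitation, 0.1 °C for temperature). The sketch below converts raw values to physical units and aggregates 12 monthly values to an annual figure. Summing PET and precipitation but averaging temperature is a common convention assumed here, not a procedure documented in this section, and the helper names are hypothetical.

```python
# Scale factors implied by the units in Table 1; unlisted variables are left unscaled.
SCALE_FACTORS = {"PET": 0.1, "Precip": 0.1, "Temp": 0.1}
UNITS = {"PET": "mm", "Precip": "mm", "Temp": "°C"}


def to_physical(variable, raw):
    """Convert a raw stored value to physical units (e.g. 235 -> 23.5 °C)."""
    return raw * SCALE_FACTORS.get(variable, 1.0)


def annual_value(variable, monthly_raw):
    """Aggregate 12 raw monthly values to one annual value (assumed convention)."""
    if len(monthly_raw) != 12:
        raise ValueError("expected 12 monthly values")
    values = [to_physical(variable, v) for v in monthly_raw]
    if variable == "Temp":
        # Temperature is averaged over the year.
        return sum(values) / 12
    # Water fluxes (PET, Precip) accumulate over the year.
    return sum(values)


print(f"{annual_value('Precip', [100] * 12):.1f} {UNITS['Precip']}")  # 120.0 mm
```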
Vegetation Type	VegType	500 m	yearly	dimensionless (1–17)

Table 2. Classification of vegetation types of the International Geosphere-Biosphere Programme (IGBP).

Name	Value
Evergreen Needleleaf Forests	1
Evergreen Broadleaf Forests	2
Deciduous Needleleaf Forests	3
Deciduous Broadleaf Forests	4
Mixed Forests	5
Closed Shrublands	6
Open Shrublands	7
Woody Savannas	8
Savannas	9
Grasslands	10
Permanent Wetlands	11
Croplands	12
Urban and Built-up Lands	13
Cropland/Natural Vegetation Mosaics	14
Permanent Snow and Ice	15
Barren	16
Water Bodies	17
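Table 2 codes VegType as integers 1–17, but these codes are nominal labels rather than an ordered scale. If the class is to be passed to a model as a categorical variable, one option is one-hot encoding. The helper `one_hot_igbp` below is hypothetical, and whether the study encoded VegType this way is not stated in this section.

```python
# IGBP classes exactly as listed in Table 2 (code -> name).
IGBP_CLASSES = {
    1: "Evergreen Needleleaf Forests",
    2: "Evergreen Broadleaf Forests",
    3: "Deciduous Needleleaf Forests",
    4: "Deciduous Broadleaf Forests",
    5: "Mixed Forests",
    6: "Closed Shrublands",
    7: "Open Shrublands",
    8: "Woody Savannas",
    9: "Savannas",
    10: "Grasslands",
    11: "Permanent Wetlands",
    12: "Croplands",
    13: "Urban and Built-up Lands",
    14: "Cropland/Natural Vegetation Mosaics",
    15: "Permanent Snow and Ice",
    16: "Barren",
    17: "Water Bodies",
}


def one_hot_igbp(code):
    """Return a 17-element 0/1 vector with a single 1 at the position of `code`."""
    if code not in IGBP_CLASSES:
        raise ValueError(f"unknown IGBP code: {code}")
    return [1 if c == code else 0 for c in sorted(IGBP_CLASSES)]


print(IGBP_CLASSES[2], one_hot_igbp(2)[:3])  # Evergreen Broadleaf Forests [0, 1, 0]
```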

Table 3. Vegetation indices and their spectral band information derived from MODIS imagery for NEP estimation.

VIs	Name	Formula/Source	Reference
DVI	Difference vegetation index	$N I R - R$	[51]
EVI	Enhanced vegetation index	$2.5 \times \frac{(N I R - R)}{(N I R + 6 \times R - 7.5 \times B + 1)}$	[52]
GCVI	Green chlorophyll vegetation index	$N I R / G - 1$	[53]
GI	Green index	$G / R$	[54]
GNDVI	Green normalized difference vegetation index	$(N I R - G) / (N I R + G)$	[53]
KNDVI	Kernel normalized difference vegetation index	$\tanh\left({\left(\frac{N I R - R}{N I R + R}\right)}^{2}\right)$	[55]
MSAVI	Modified soil-adjusted vegetation index	$\frac{(2 \times N I R + 1 - \sqrt{{(2 \times N I R + 1)}^{2} - 8 \times (N I R - R)})}{2}$	[56]
MSR	Modified simple ratio	$(N I R / R - 1) / \sqrt{N I R / R + 1}$	[57]
NDVI	Normalized difference vegetation index	$(N I R - R) / (N I R + R)$	[58]
NGRDI	Normalized green–red difference index	$(G - R) / (G + R)$	[54]
NLI	Nonlinear index	$(N I R^{2} - R) / (N I R^{2} + R)$	[59]
OSAVI	Optimized soil-adjusted vegetation index	$1.16 (N I R - R) / (N I R + R + 0.16)$	[60]
RDVI	Renormalized difference vegetation index	$(N I R - R) / \sqrt{N I R + R}$	[57]
RVI	Ratio vegetation index	$N I R / R$	[51]
SAVI	Soil-adjusted vegetation index	$1.5 (N I R - R) / (N I R + R + 0.5)$	[61]
TVI	Triangular vegetation index	$60 (N I R - G) - 100 (R - G)$	[62]
VARI	Visible atmospherically resistant index	$\frac{(G - R)}{(G + R - B)}$	[63]
Red	Red band	MOD09A1	-
Green	Green band	MOD09A1	-
Blue	Blue band	MOD09A1	-
NIR	Near-infrared band	MOD09A1	-

Note: B, G, R, and NIR represent the reflectance values in the blue, green, red, and near-infrared bands, respectively.
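To make the formulas in Table 3 concrete, the minimal sketch below evaluates all 17 indices for a single pixel. It assumes the inputs are surface reflectances already rescaled to the 0–1 range (MOD09A1 stores scaled integers, conventionally multiplied by 0.0001) and uses the simplified kNDVI form tanh(NDVI²), which corresponds to a kernel length scale of σ = 0.5(NIR + R). The function name and example reflectances are illustrative.

```python
import math


def vegetation_indices(blue, green, red, nir):
    """Compute the Table 3 vegetation indices from band reflectances (0-1)."""
    ndvi = (nir - red) / (nir + red)
    return {
        "DVI": nir - red,
        "RVI": nir / red,
        "NDVI": ndvi,
        "KNDVI": math.tanh(ndvi ** 2),  # simplified kNDVI, sigma = 0.5 * (NIR + R)
        "EVI": 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1),
        "GNDVI": (nir - green) / (nir + green),
        "GCVI": nir / green - 1,
        "GI": green / red,
        "NGRDI": (green - red) / (green + red),
        "SAVI": 1.5 * (nir - red) / (nir + red + 0.5),
        "OSAVI": 1.16 * (nir - red) / (nir + red + 0.16),
        "MSAVI": (2 * nir + 1 - math.sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2,
        "MSR": (nir / red - 1) / math.sqrt(nir / red + 1),
        "NLI": (nir ** 2 - red) / (nir ** 2 + red),
        "RDVI": (nir - red) / math.sqrt(nir + red),
        "TVI": 60 * (nir - green) - 100 * (red - green),
        "VARI": (green - red) / (green + red - blue),
    }


# Hypothetical reflectances for a densely vegetated pixel.
vis = vegetation_indices(blue=0.04, green=0.08, red=0.05, nir=0.40)
print(f"NDVI={vis['NDVI']:.3f}, EVI={vis['EVI']:.3f}")  # NDVI=0.778, EVI=0.625
```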

Table 4. Performance comparison of three ML regression algorithms for NEP estimation using RSV, EV, and their combination.

Method	Features	R² (training)	R² (testing)	RMSE (training), gC/(m²·a)	RMSE (testing), gC/(m²·a)	MAE (training), gC/(m²·a)	MAE (testing), gC/(m²·a)
RFR	RSV	0.72	0.67	165.84	180.66	126.78	134.29
XGBR	RSV	0.77	0.72	151.09	163.92	113.90	123.06
CatBoost	RSV	0.73	0.72	163.48	164.95	123.58	124.52
RFR	EV	0.79	0.76	141.67	153.27	105.24	106.33
XGBR	EV	0.82	0.81	132.89	136.79	103.25	105.99
CatBoost	EV	0.80	0.80	139.15	139.56	106.72	106.94
RFR	RSV-EV	0.88	0.87	110.13	111.40	78.26	79.43
XGBR	RSV-EV	0.90	0.89	94.00	103.97	70.38	77.28
CatBoost	RSV-EV	0.88	0.87	109.77	111.48	81.85	82.97
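The three accuracy metrics reported in Tables 4, 5, and 7 can be written out explicitly. This minimal sketch assumes R² is the coefficient of determination, 1 − SS_res/SS_tot, computed against the measured values (some studies instead report the squared Pearson correlation, which can differ); `regression_metrics` is a hypothetical helper, not code from the study.

```python
import math


def regression_metrics(measured, predicted):
    """Return R², RMSE, and MAE of predicted against measured values."""
    n = len(measured)
    mean_obs = sum(measured) / n
    residuals = [p - o for o, p in zip(measured, predicted)]
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in measured)  # total sum of squares
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(r) for r in residuals) / n,
    }


# Hypothetical NEP values in gC/(m²·a).
m = regression_metrics([100, 200, 300, 400], [110, 190, 320, 380])
print({k: round(v, 2) for k, v in m.items()})  # {'R2': 0.98, 'RMSE': 15.81, 'MAE': 15.0}
```

RMSE and MAE share the units of NEP, which is why the tables report them in gC/(m²·a), whereas R² is dimensionless.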

Table 5. Accuracy assessment of NEP estimation models built with three ML algorithms on features selected by random forest regression (RFR) importance analysis.

Method	Features	R² (training)	R² (testing)	RMSE (training), gC/(m²·a)	RMSE (testing), gC/(m²·a)	MAE (training), gC/(m²·a)	MAE (testing), gC/(m²·a)
RFR	RSV	0.72	0.66	166.95	180.94	128.03	134.47
XGBR	RSV	0.76	0.72	151.37	163.88	114.10	123.02
CatBoost	RSV	0.73	0.72	163.41	164.78	123.46	124.34
RFR	EV	0.76	0.73	154.03	162.86	106.56	113.25
XGBR	EV	0.81	0.80	135.75	139.33	104.99	107.51
CatBoost	EV	0.80	0.79	140.67	141.42	107.56	108.01
RFR	RSV-EV	0.90	0.89	101.73	103.22	72.04	73.62
XGBR	RSV-EV	0.90	0.88	102.23	106.06	72.10	78.47
CatBoost	RSV-EV	0.87	0.87	111.98	113.81	83.02	84.23

Table 6. Feature selection results based on VSURF algorithm analysis.

Feature set	VSURF-selected features
RSV	EVI, VARI, DVI, TVI, Blue, MSAVI, GNDVI
EV	VegType, pH, Precip, Temp, SC, PET, SS
RSV-EV	EVI, VegType, pH, SC, Temp, SS, SolRad, Precip, GI, PET, DVI
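Table 6 can be read as three feature sets. Treating them as Python sets shows which VSURF-selected features survive when remote sensing and environmental variables are pooled; the candidate counts (21 remote sensing features and 14 environmental factors) are taken from the Abstract, and the variable names are illustrative.

```python
# VSURF selections copied from Table 6.
VSURF_SELECTED = {
    "RSV": {"EVI", "VARI", "DVI", "TVI", "Blue", "MSAVI", "GNDVI"},
    "EV": {"VegType", "pH", "Precip", "Temp", "SC", "PET", "SS"},
    "RSV-EV": {"EVI", "VegType", "pH", "SC", "Temp", "SS", "SolRad",
               "Precip", "GI", "PET", "DVI"},
}

combined = VSURF_SELECTED["RSV-EV"]
kept_from_rsv = combined & VSURF_SELECTED["RSV"]  # RSV picks retained after pooling
kept_from_ev = combined & VSURF_SELECTED["EV"]    # EV picks retained after pooling
# Features chosen only when both sources are pooled.
only_in_combined = combined - VSURF_SELECTED["RSV"] - VSURF_SELECTED["EV"]

N_CANDIDATES = 21 + 14  # remote sensing features + environmental factors
print(sorted(kept_from_rsv))                              # ['DVI', 'EVI']
print(sorted(only_in_combined))                           # ['GI', 'SolRad']
print(f"{len(combined)} of {N_CANDIDATES} retained")      # 11 of 35 retained
```

The pooled selection keeps only EVI and DVI from the remote sensing set, keeps all seven environmental variables, and adds SolRad and GI, which neither single-source selection retained.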

Table 7. Accuracy evaluation of NEP estimation models combining VSURF feature selection with different ML algorithms.

Method	Features	R² (training)	R² (testing)	RMSE (training), gC/(m²·a)	RMSE (testing), gC/(m²·a)	MAE (training), gC/(m²·a)	MAE (testing), gC/(m²·a)
RFR	RSV	0.74	0.71	163.23	168.00	123.12	125.34
XGBR	RSV	0.73	0.70	161.46	170.60	120.02	126.24
CatBoost	RSV	0.76	0.73	149.83	162.99	113.00	121.80
RFR	EV	0.80	0.79	138.65	141.50	107.79	97.28
XGBR	EV	0.80	0.79	139.89	143.08	108.06	110.33
CatBoost	EV	0.78	0.78	114.91	145.13	110.72	110.92
RFR	RSV-EV	0.93	0.90	78.78	97.51	61.24	69.61
XGBR	RSV-EV	0.98	0.94	22.32	76.82	16.35	55.11
CatBoost	RSV-EV	0.94	0.92	75.55	89.42	55.90	65.52
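The effect of VSURF selection on the best-performing model can be quantified directly from Tables 4 and 7. The short sketch below (helper name hypothetical) computes the relative reduction in testing RMSE and MAE for XGBR with the RSV-EV features, together with the ratio of testing to training RMSE for the VSURF-based model; all numbers are copied from the tables, and only the arithmetic is added.

```python
def relative_reduction(before, after):
    """Fractional decrease from `before` to `after` (positive = lower error)."""
    return (before - after) / before


# Testing-set errors, gC/(m²·a), of XGBR with all RSV-EV features (Table 4)
# and with the VSURF-selected RSV-EV features (Table 7).
XGBR_ALL_FEATURES = {"RMSE": 103.97, "MAE": 77.28}
XGBR_VSURF = {"RMSE": 76.82, "MAE": 55.11}

for metric in ("RMSE", "MAE"):
    change = relative_reduction(XGBR_ALL_FEATURES[metric], XGBR_VSURF[metric])
    print(f"{metric}: {change:.1%} lower")
# RMSE: 26.1% lower
# MAE: 28.7% lower

# Testing RMSE divided by training RMSE for the VSURF-based XGBR model (Table 7).
TRAIN_TEST_RMSE_RATIO = 76.82 / 22.32
print(f"testing/training RMSE ratio: {TRAIN_TEST_RMSE_RATIO:.2f}")  # 3.44
```

A ratio well above 1 simply indicates that the training error is much smaller than the testing error for this configuration.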

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Xia, L.; Tan, H.; Zhang, J.; Yang, K.; Teng, C.; Huang, K.; Yang, J.; Cheng, T. Remote Sensing and Machine Learning Uncover Dominant Drivers of Carbon Sink Dynamics in Subtropical Mountain Ecosystems. Remote Sens. 2025, 17, 2843. https://doi.org/10.3390/rs17162843


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
