Article

Exploring the Critical Thresholds of Environmental Factors on Net Primary Productivity in the Yellow River Basin

College of Geography and Environment, Shandong Normal University, Jinan 250358, China
*
Author to whom correspondence should be addressed.
Forests 2026, 17(6), 674; https://doi.org/10.3390/f17060674
Submission received: 27 April 2026 / Revised: 25 May 2026 / Accepted: 28 May 2026 / Published: 1 June 2026
(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)

Abstract

Net primary productivity (NPP) is an important indicator for assessing ecosystem productivity and carbon cycling. The Yellow River Basin (YRB), as an important ecological conservation zone and economic region in China, is highly sensitive to climate change, land use change, and ecological restoration. Understanding the spatiotemporal variation in NPP and its relationships with environmental factors is therefore important for regional ecological management. In this study, MODIS NPP data, ERA5-Land environmental variables, land use data, machine learning algorithms, and SHAP-based model interpretation were used to analyze the spatiotemporal patterns of NPP and the nonlinear responses of NPP to environmental factors in the YRB from 2001 to 2020. The results showed the following: (1) NPP exhibited a spatial pattern of higher values in the south and lower values in the north. The annual average NPP showed a fluctuating upward trend, and most pixels showed varying degrees of increase during the study period. (2) Moisture-related variables contributed more strongly to model-predicted NPP variations in the entire basin than thermal variables. (3) For different ecosystem types, surface solar radiation downwards (SSRD) made the largest contribution to model-predicted NPP variations in cropland and forest ecosystems and showed a negative relationship with NPP, whereas evapotranspiration (E) contributed most strongly to model-predicted NPP in grassland ecosystems and showed a positive relationship with NPP. (4) Most environmental factors showed nonlinear associations with model-predicted NPP, and SHAP-derived response thresholds differed among ecosystem types. These thresholds should be interpreted as model-based nonlinear response points rather than confirmed ecological tipping points or ecological regime shifts. This study provides a reference for understanding the heterogeneous responses of vegetation productivity to environmental factors in the YRB.

1. Introduction

Net primary productivity (NPP) is the difference between the organic matter that plants produce through photosynthesis and the organic matter they consume through autotrophic respiration, and it is a critical indicator for assessing carbon cycling [1,2,3]. Various methods exist for estimating NPP [4,5,6,7], each with its own advantages and limitations [8]. Field sampling provides relatively accurate NPP estimates, but the results depend heavily on the number and spatial distribution of sampling points, so large-scale spatial estimation and dynamic monitoring remain challenging [9,10]. Climate productivity models are simple in principle and suitable for analyzing long-term trends but fail to account for local ecological factors. Physio-ecological process models can simulate complex ecological processes but may introduce significant uncertainty through their assumptions and parameter settings [11]. Light use efficiency models, such as the CASA model, are simpler, require fewer parameters, and effectively capture spatiotemporal variations in NPP across different spatial scales [12,13,14]. Although the CASA model has been widely applied to global ecosystems [15,16,17,18,19], it was originally developed for North American climatic, soil, and vegetation conditions, so its accuracy in other regions requires further validation [20]. Remote sensing products provide large-scale, long-term NPP data, although their precision depends on the resolution of the satellite imagery. Among these products, the MODIS NPP dataset is widely accepted for its reliable accuracy and has been used extensively to study NPP dynamics [21,22,23].
Climate change and human activities are widely recognized as the primary drivers of NPP variations [24]. With respect to climate change, NPP fluctuations are closely linked to factors such as temperature, relative humidity, and solar radiation [25,26,27]. Previous studies of the factors influencing ecosystem NPP have mainly used correlation and partial correlation analyses to examine positive and negative linear relationships between NPP and climate variables [28]. However, the relationships between environmental factors and NPP are rarely simple positive or negative correlations [29]; they often exhibit complex, nonlinear dynamics. It is therefore important to elucidate the nonlinear responses of NPP to different factors. In recent years, the integration of remote sensing and machine learning algorithms has enabled the exploration of nonlinear responses of NPP to environmental factors [30,31], as well as the prediction of future ecosystem NPP dynamics based on these relationships [32,33,34,35].
The Yellow River Basin (YRB) represents a vital ecological conservation zone and economic region in China [36,37]. However, its complex topography, significant climatic variability, intensified global climate change, and escalating human activities have collectively imposed substantial pressure on its ecological environment [38]. To achieve the goals of ecological protection and high-quality development in the basin, a comprehensive understanding of its ecological status is imperative. In this context, investigating NPP trends and their drivers across different ecosystems in the YRB holds particular significance [39]. Existing studies on NPP influencing factors in the YRB predominantly focus on simple linear relationships between individual factors and NPP [40,41,42,43], largely overlooking nonlinear effects [44]. Moreover, this nonlinear effect often exhibits a threshold, beyond which the impact on NPP becomes reversed. However, research on the threshold effects of NPP responses to various factors in the YRB remains limited. To address these gaps, this study aims to: (1) use MODIS NPP data to analyze the spatiotemporal variations of NPP across the YRB and its ecosystem types from 2001 to 2020; (2) use machine learning algorithms to develop regression models linking NPP to climate and soil factors for the entire basin and specific ecosystem types; (3) employ the SHAP method to identify thresholds for the responses of NPP to various factors in different ecosystems.
In this study, the term “critical threshold” refers to a SHAP-derived response threshold within the machine learning framework, rather than an independently verified ecological tipping point or regime shift. These thresholds are therefore interpreted as model-based nonlinear response points and are discussed cautiously in the following sections.

2. Materials and Methods

2.1. Study Area

The Yellow River Basin refers to the catchment area of the Yellow River and its tributaries and is located in the central region of China (Figure 1). Its terrain is generally high in the west and low in the east. The YRB exhibits pronounced seasonal climatic variability. During summer, the warm and humid southeast monsoon brings elevated temperatures, frequent extreme heatwaves, and abundant precipitation. In contrast, winter is governed by Siberian cold air masses, which not only lower temperatures significantly but also suppress precipitation because of the cold, dry atmospheric conditions.

2.2. Data Source

(1) NPP data
Annual NPP data from 2001 to 2020 are from the MODIS annual NPP product (https://lpdaac.usgs.gov (accessed on 25 February 2026)), which provides annual net primary productivity at a spatial resolution of 500 m. The data were clipped to the boundary of the Yellow River Basin and aggregated to a spatial resolution of 0.1° to match the environmental variables used in the subsequent correlation and machine learning analyses.
(2) Land use type data
Land use data from 2001 to 2020 come from the 30 m annual land cover dataset of China released by Jie Yang and Xin Huang of Wuhan University (https://zenodo.org/ (accessed on 25 February 2026)). The dataset includes nine land use types: cropland, forest, shrub, grassland, water, snow/ice, bare land, impervious surface, and wetland.
To ensure consistency among datasets, the land use data were resampled to a spatial resolution of 0.1° using the nearest-neighbor method and then aligned with the environmental variables used for NPP analysis. The nearest-neighbor method was used because land use is a categorical variable, so interpolating between class codes would produce meaningless values. All datasets were converted to the WGS 1984 geographic coordinate system.
(3) Influencing factors data
Climate drivers (temperature, precipitation, surface solar radiation downwards, evapotranspiration, and air pressure) and soil physical factors (soil moisture content and soil temperature) from 2001 to 2020 were derived from the ERA5-Land monthly reanalysis dataset provided by the European Centre for Medium-Range Weather Forecasts (ECMWF). The reanalysis data have a spatial resolution of 0.1°. The abbreviations of the influencing factors are shown in Table 1.
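The two resampling steps described above for the NPP and land use data can be illustrated with a minimal sketch. It assumes that aggregation of the continuous NPP grid is a simple block average (the text does not state the aggregation rule) and that the coarse cells are an integer multiple of the fine cells; the function names, the `factor` argument, and the small grids are hypothetical.

```python
def block_mean(grid, factor):
    """Aggregate a continuous 2-D grid by averaging factor x factor blocks.

    None marks no-data cells, which are ignored in the average.
    """
    rows, cols = len(grid) // factor, len(grid[0]) // factor
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [grid[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            valid = [v for v in block if v is not None]
            row.append(sum(valid) / len(valid) if valid else None)
        out.append(row)
    return out


def nearest_neighbor(grid, factor):
    """Resample a categorical grid by copying the cell nearest each block centre."""
    rows, cols = len(grid) // factor, len(grid[0]) // factor
    half = factor // 2
    return [[grid[r * factor + half][c * factor + half] for c in range(cols)]
            for r in range(rows)]


# Hypothetical 4 x 4 inputs reduced to 2 x 2 outputs.
npp_fine = [[1, 3, 5, 7],
            [1, 3, 5, 7],
            [2, 2, None, 8],
            [2, 2, 8, 8]]
land_fine = [[1, 1, 2, 2],
             [1, 4, 2, 2],
             [3, 3, 5, 5],
             [3, 3, 5, 6]]

npp_coarse = block_mean(npp_fine, 2)
land_coarse = nearest_neighbor(land_fine, 2)
```

Averaging suits a continuous quantity such as NPP, whereas the nearest-neighbor rule always returns a class code that exists in the input, which is why it is the usual choice for land use classes.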

2.3. Methods

2.3.1. Trend Analysis and Significance Test

This paper combines the Sen slope estimation and the Mann–Kendall (MK) significance test to analyze the changing trend and significance of vegetation NPP at the pixel scale in the YRB from 2001 to 2020 [45,46,47,48,49]. The formulas are as follows:
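$$\mathrm{Slope} = \mathrm{median}\left(\frac{NPP_j - NPP_i}{j - i}\right), \quad j > i$$

$$\mathrm{sgn}(x_j - x_i) = \begin{cases} 1, & x_j - x_i > 0 \\ 0, & x_j - x_i = 0 \\ -1, & x_j - x_i < 0 \end{cases}$$

$$S = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \mathrm{sgn}(x_j - x_i)$$

$$Z_{MK} = \begin{cases} \dfrac{S - 1}{\sqrt{\mathrm{Var}(S)}}, & S > 0 \\ 0, & S = 0 \\ \dfrac{S + 1}{\sqrt{\mathrm{Var}(S)}}, & S < 0 \end{cases}$$

$$\mathrm{Var}(S) = \frac{n(n-1)(2n+5)}{18}$$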
where $x_i$ and $x_j$ are the NPP values of the pixel in the $i$-th and $j$-th years (written $NPP_i$ and $NPP_j$ in the slope formula) and $n$ is the length of the time series in years. $Z_{MK}$ is the standardized test statistic. At a given significance level α, $|Z_{MK}| > \mu_{1-\alpha/2}$ indicates a significant change in the series at the α level, where $\mu_{1-\alpha/2}$ is the corresponding quantile of the standard normal distribution. This paper divides the change trends into five categories by setting different significance levels (Table 2).
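For a single pixel, the Sen slope and the Mann–Kendall statistic defined above can be computed as in the following sketch. It follows the formulas as written, so it applies no correction for tied values; the function names and the five-year example series are hypothetical.

```python
import math
from statistics import median


def sen_slope(series):
    """Median of all pairwise slopes (x_j - x_i) / (j - i) for j > i."""
    n = len(series)
    slopes = [(series[j] - series[i]) / (j - i)
              for i in range(n - 1) for j in range(i + 1, n)]
    return median(slopes)


def mann_kendall_z(series):
    """Standardized Mann-Kendall statistic Z_MK (no correction for ties)."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sgn(x_j - x_i)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


# Hypothetical annual NPP values for one pixel.
npp_series = [200, 210, 205, 230, 240]
slope = sen_slope(npp_series)
z_mk = mann_kendall_z(npp_series)
```

In a basin-wide analysis the same two functions would be applied to the 20-year series of every pixel, and the resulting slope and $Z_{MK}$ pair would then be assigned to one of the trend categories.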

2.3.2. Fitting NPP Based on Machine Learning Algorithms

To analyze the impact of the influencing factors on NPP in the YRB, 8171 samples of MODIS NPP data were selected uniformly at a resolution of 0.1° for each year to establish regression models between the factors and NPP. Vegetation growth and production processes respond differently to climate and soil factors in different ecosystems [50]. Grassland, forest, and cropland are the most widely distributed land use types in the YRB; therefore, separate regression models between NPP and the influencing factors were established for the grassland, forest, and cropland ecosystems. Because machine learning algorithms perform differently on different tasks, the following three algorithms were used for modeling, and the optimal algorithm was selected by comparing the R² and RMSE of each.
(1) AdaBoost (adaptive boosting) is an effective ensemble learning algorithm that improves overall prediction performance by combining multiple weak regression models [51]. Its basic principle is to train the models sequentially so that each new model focuses on the samples that the previous model predicted poorly, and to enhance the learning ability of the ensemble by adjusting the sample weights. The predictions of the weak models are weighted according to their accuracy and combined to form a strong regression model.
In this study, the AdaBoost (adaptive boosting) model was configured with 100 estimators (n_estimators = 100) and a learning rate of 0.1 (learning_rate = 0.1). A linear loss function (loss = ‘linear’) was adopted for the regression task. The algorithm sequentially trains weak learners, with each subsequent learner assigning higher weights to samples that were poorly predicted by the previous learner, thereby progressively improving overall prediction accuracy.
(2) Random forest is an ensemble learning method that improves the accuracy of classification and regression by constructing multiple decision trees and combining their results [52]. Its basic principles include the following four steps: ① bootstrap sampling, in which multiple subsample sets are drawn at random from the training set and each tree is trained on a different subsample set; ② feature selection, in which a random subset of the features, rather than all features, is considered when each node is split, which increases the diversity of the model and reduces the risk of overfitting; ③ decision tree construction, in which a decision tree is built for each subsample set, usually without pruning, to retain more information; and ④ ensemble prediction, in which the final result is determined by voting for classification tasks and by averaging the predictions of all trees for regression tasks [53]. As a flexible and efficient tool, random forest plays an important role in machine learning and data mining.
The random forest model was constructed using 200 decision trees (n_estimators = 200). The maximum depth of each tree was set to 15 (max_depth = 15) to control model complexity. The minimum number of samples required to split an internal node was set to 5 (min_samples_split = 5), and the minimum number of samples required at a leaf node was set to 2 (min_samples_leaf = 2). The number of features considered for each split was set to the square root of the total number of features (max_features = ‘sqrt’). Bootstrap sampling was used to randomly draw subsamples from the training set for each tree, and the final prediction was obtained by averaging the outputs of all trees.
(3) XGBoost (extreme gradient boosting) is an efficient gradient boosting algorithm that optimizes the model by building decision trees sequentially [54]. Its core principle is to approximate the loss function with a second-order Taylor expansion, using the gradient and second-order derivative information to achieve fast model training. XGBoost also introduces regularization to control the complexity of the model, thereby effectively preventing overfitting. In addition, XGBoost supports parallel computing, which significantly improves the training speed. Studies have shown that this method performs well in a variety of tasks, especially when processing large-scale data [55]. These features make XGBoost an important tool in the field of machine learning.
The XGBoost model was configured with 200 boosting rounds (n_estimators = 200) and a learning rate of 0.05 (learning_rate = 0.05). The maximum depth of each tree was limited to 8 (max_depth = 8) to prevent overfitting. Subsample ratios of 0.8 were applied to both the training samples (subsample = 0.8) and the features (colsample_bytree = 0.8) at each boosting iteration, introducing randomness to enhance model generalization. The objective function was set to squared error regression (objective = ‘reg:squarederror’). The algorithm iteratively constructs decision trees using the gradient and second-order derivative information of the loss function, with L1 and L2 regularization terms incorporated to control model complexity.
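The second-order approximation and the regularization terms mentioned above can be written out in the standard formulation of regularized gradient tree boosting. The notation ($g_i$, $h_i$, $f_t$, $T$, $w_j$, $\gamma$, $\lambda$, $\alpha$) is introduced here for illustration and does not appear in the original text. At boosting round $t$, the objective for the new tree $f_t$ is approximately

$$\mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t), \qquad \Omega(f_t) = \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 + \alpha \sum_{j=1}^{T} |w_j|$$

where $g_i$ and $h_i$ are the first and second derivatives of the loss for sample $i$ with respect to the previous prediction $\hat{y}_i^{(t-1)}$, $T$ is the number of leaves of the tree, and $w_j$ is the weight of leaf $j$. For a squared-error loss $l = \tfrac{1}{2}(y_i - \hat{y}_i)^2$, these derivatives reduce to $g_i = \hat{y}_i^{(t-1)} - y_i$ and $h_i = 1$, so each new tree is effectively fitted to the residuals of the current ensemble.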
To ensure reproducibility, the machine learning workflow was defined explicitly. First, all samples were randomly split into training and testing subsets at a ratio of 80:20 before model fitting. Model performance was then evaluated using the coefficient of determination (R²) and the root mean square error (RMSE). The main hyperparameters were predefined before model fitting and kept consistent during model comparison; the final settings are listed in Table 3. Considering model performance, computational efficiency, and compatibility with SHAP-based interpretation, XGBoost was selected for the subsequent explanatory analysis.
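The workflow above can be summarized in a short sketch that uses only the Python standard library. The hyperparameter values are those reported in the text; the dictionary layout, the function names, the random seed, and the toy observations are assumptions made for illustration. The parameter names resemble the arguments of `AdaBoostRegressor` and `RandomForestRegressor` in scikit-learn and of `XGBRegressor` in the xgboost package, although the text does not name the software that was used.

```python
import math
import random

# Hyperparameter values as reported in the text; the layout is illustrative.
HYPERPARAMS = {
    "AdaBoost": {"n_estimators": 100, "learning_rate": 0.1, "loss": "linear"},
    "RandomForest": {"n_estimators": 200, "max_depth": 15,
                     "min_samples_split": 5, "min_samples_leaf": 2,
                     "max_features": "sqrt"},
    "XGBoost": {"n_estimators": 200, "learning_rate": 0.05, "max_depth": 8,
                "subsample": 0.8, "colsample_bytree": 0.8,
                "objective": "reg:squarederror"},
}


def train_test_split_indices(n_samples, test_ratio=0.2, seed=42):
    """Shuffle sample indices and split them 80:20 (the seed is an assumption)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_ratio)
    return indices[n_test:], indices[:n_test]


def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error, in the units of the observations."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


train_idx, test_idx = train_test_split_indices(100)

# Toy observed and predicted NPP values (hypothetical).
obs = [250.0, 300.0, 350.0, 400.0]
pred = [260.0, 290.0, 360.0, 390.0]
print(r2_score(obs, pred), rmse(obs, pred))
```

Both metrics would be computed on the testing subset only, so that the comparison among the three algorithms reflects performance on samples not seen during fitting.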
Because environmental variables may be correlated in large-scale ecological datasets, the interpretation of model outputs was treated with caution. Tree-based machine learning algorithms can capture nonlinear relationships and are less constrained by multicollinearity than ordinary linear regression models. However, correlated predictors may still affect the interpretation of individual feature contributions. Therefore, SHAP values were interpreted as model-based contributions rather than direct ecological causality.

2.3.3. Response of NPP to Various Factors

The SHAP (SHapley Additive exPlanation) method was used to interpret the prediction results of the trained machine learning model. Its core idea is to calculate the marginal contribution of each feature to the model output, thereby explaining the “black-box” model from both global and local perspectives [56]. For each sample, the model generates a predicted NPP value. A negative SHAP value indicates that the feature contributes negatively to the model-predicted NPP, whereas a positive SHAP value indicates a positive contribution to the model prediction. The absolute magnitude of the SHAP value represents the relative contribution of the feature to the model output [57]. Therefore, SHAP values were used in this study to explain the trained machine learning model and should not be interpreted as direct evidence of ecological causality. By calculating the SHAP value, the response of the dependent variable to single factors and factor interactions can be analyzed [58]. The calculation formula for the Shapley value of feature i is as follows:
$$\varphi_i(f) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!} \left[ f(S \cup \{i\}) - f(S) \right]$$
where $\varphi_i(f)$ represents the contribution of feature $i$, $N$ represents the set of $n$ features, $S$ is a subset of features that excludes feature $i$, and $f(S \cup \{i\})$ and $f(S)$ represent the model results with and without feature $i$, respectively.
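The Shapley formula can be evaluated exactly for a small number of features, which makes its behavior easy to check. The sketch below is a brute-force illustration, not the algorithm used by SHAP for tree models; the toy model, the sample, the baseline, and the rule that absent features take their baseline values are all assumptions.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a coalition value function f(S)."""
    features = range(n_features)
    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(len(others) + 1):
            # Weight |S|! (|N| - |S| - 1)! / |N|! for subsets of this size.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {i}) - value_fn(s))
        phi.append(total)
    return phi


# Toy model with an interaction term; NOT the fitted NPP model.
x = [2.0, 3.0, 1.0]          # sample to explain (hypothetical)
baseline = [0.0, 0.0, 0.0]   # values used for absent features (hypothetical)


def toy_model(v):
    return 10 + 4 * v[0] + 2 * v[1] * v[2]


def value_fn(subset):
    """f(S): features in S take their values from x, the rest from baseline."""
    return toy_model([x[j] if j in subset else baseline[j]
                      for j in range(len(x))])


phi = shapley_values(value_fn, 3)
```

In this toy case the additive term is credited entirely to the first feature, the interaction term is shared equally between the other two, and the contributions sum to the difference between the prediction for the sample and the prediction for the baseline.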

2.3.4. Definition and Identification of Critical Thresholds

In this study, the term “critical threshold” refers to a SHAP-derived response threshold within the machine learning framework rather than a confirmed ecological tipping point or ecological regime shift. Specifically, it represents an approximate value or range of an environmental factor at which its contribution to model-predicted NPP changes markedly in direction or magnitude. Therefore, the thresholds identified in this study should be interpreted as model-based nonlinear response points rather than fixed ecological constants.
The thresholds were identified from SHAP dependence plots. For each environmental factor, the relationship between the factor value and its corresponding SHAP value was examined. A response threshold was identified when the SHAP contribution changed from negative to positive or from positive to negative. In cases where no clear zero-crossing point was observed, the threshold was identified as the approximate value at which the SHAP response curve showed an evident change in trend or a clear weakening of marginal contribution. These thresholds indicate changes in the contribution of environmental factors to model-predicted NPP rather than direct evidence of causal ecological transitions.
To avoid overinterpreting local fluctuations in the SHAP scatter plots, threshold identification was based on the overall response pattern of the SHAP dependence curve rather than isolated points. The thresholds reported in this study are therefore empirical and data-dependent. Their ecological meaning should be interpreted cautiously, and this requires further validation using field observations, long-term monitoring data, or process-based ecosystem models.
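One way to make the zero-crossing rule less sensitive to isolated points is to average the SHAP values within bins of the factor and to locate sign changes on the binned curve. The sketch below illustrates this idea only; the binning scheme, the number of bins, the function names, and the synthetic data are assumptions, and the text does not state that thresholds were extracted this way.

```python
def binned_curve(x, shap, n_bins):
    """Mean factor value and mean SHAP value within equal-width bins of x."""
    lo, hi = min(x), max(x)
    if hi == lo:
        raise ValueError("factor values must span a non-zero range")
    width = (hi - lo) / n_bins
    bins = [[0.0, 0.0, 0] for _ in range(n_bins)]  # sum of x, sum of SHAP, count
    for xi, si in zip(x, shap):
        k = min(int((xi - lo) / width), n_bins - 1)
        bins[k][0] += xi
        bins[k][1] += si
        bins[k][2] += 1
    return [(sx / n, ss / n) for sx, ss, n in bins if n > 0]


def zero_crossings(curve):
    """Interpolate the factor values at which the binned SHAP curve changes sign."""
    crossings = []
    for (x0, s0), (x1, s1) in zip(curve, curve[1:]):
        if s0 * s1 < 0:  # strict sign change between neighbouring bins
            crossings.append(x0 + (x1 - x0) * (0 - s0) / (s1 - s0))
    return crossings


# Synthetic dependence data: the contribution changes sign at a factor value of 11.
factor_values = [float(v) for v in range(21)]
shap_values = [0.5 * (v - 11) for v in factor_values]

curve = binned_curve(factor_values, shap_values, n_bins=5)
thresholds = zero_crossings(curve)
```

Because the result depends on the bin width and on the sample distribution, a value obtained in this way is an approximate, data-dependent response point, consistent with the cautious interpretation given above.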

3. Results

3.1. Spatial and Temporal Changes in NPP in the YRB

The NPP of different ecosystem types in the YRB differed from 2001 to 2020. The annual average NPP of forests was about 307.8–549.6 gC·m−2·a−1, which was considerably higher than that of cropland and grassland (Figure 2a). The annual average NPP of the entire basin showed a fluctuating upward trend, increasing from 206.1 gC·m−2·a−1 in 2001 to 327.9 gC·m−2·a−1 in 2020. The policies of wasteland reclamation and of returning cropland to forest and grassland have played a role in improving NPP in the YRB (Figure 2b–e). Some low-productivity cropland has been converted into grassland, which not only raises the average productivity of the remaining cropland ecosystem but also increases the total productivity of the grassland ecosystem.
To further quantify land use changes during the study period, the changes in land use area and percentage in the YRB from 2001 to 2020 are summarized in Table 4.
As shown in Table 4, grassland was the dominant land use type in the YRB, accounting for 57.49% and 57.95% of the total area in 2001 and 2020, respectively. From 2001 to 2020, cropland decreased by 16,998.69 km², while forest, grassland, water, impervious surface, and wetland increased by 13,264.45 km², 3627.82 km², 1557.02 km², 9438.09 km², and 275.92 km², respectively. Shrub, snow/ice, and bare land also decreased during the same period. These results indicate that obvious land use conversion occurred in the YRB, especially the decrease in cropland and bare land and the increase in forest, grassland, and impervious surface. Such land use changes provide important background information for understanding the spatial and temporal variation in NPP in the basin. The increase in forest and grassland areas may be related to ecological restoration measures, while the expansion of impervious surface reflects the influence of urbanization. These changes may partly explain the spatial heterogeneity of NPP trends shown in Figure 3.
From 2001 to 2020, the annual average NPP of the YRB was generally high in the south and low in the north (Figure 3a), with a maximum of 985.26 gC·m−2·a−1 and a minimum of 16.75 gC·m−2·a−1. High values occurred in the forest areas of the southern Loess Plateau and the northern Guanzhong Plain, and low values occurred in the northern Loess Plateau and the eastern Qinghai–Tibet Plateau. Figure 3b shows the NPP trends in the YRB during the study period; the NPP of most pixels showed significant or slight increasing trends. Only 2.1% of pixels showed a degradation trend, mostly in urban areas, and 4.6% of pixels showed a stable trend, which may be related to missing remote sensing data.

3.2. The Spatial Distribution of Factors and Correlations

NPP is affected by multiple environmental factors, and the spatial heterogeneity of climate and soil conditions partly determines the spatial distribution of NPP in the YRB. Figure 4 shows the spatial distributions of the selected environmental variables. TEM, STL, SP, and SSRD exhibited pronounced east–west differences and were closely associated with topographic gradients. In contrast, TP and SWVL showed more evident north–south differences, reflecting spatial heterogeneity in moisture availability. E generally decreased from the southeastern part of the basin toward the northwestern region. These spatial patterns suggest that NPP responses to environmental factors are likely to vary along elevation, moisture, and climatic gradients rather than following a uniform basin-wide pattern.
The spatial distributions of the pixel-wise correlations between NPP and the environmental factors further indicated that the relationships between vegetation productivity and environmental conditions were spatially heterogeneous (Figure 5). In high-altitude regions of the western basin, increases in temperature and soil temperature were generally associated with higher NPP, whereas in the northern semi-arid regions, higher temperature may intensify water stress and thus show negative correlations with NPP. The relationship between precipitation and NPP also varied spatially, suggesting that the effect of water supply may depend on local moisture background and vegetation type. SSRD showed positive correlations in parts of the alpine meadow region but negative correlations in many other regions, possibly because stronger radiation can increase evapotranspiration demand under water-limited conditions.
To further support the spatial correlation analysis shown in Figure 5, Pearson correlation coefficients between NPP and the selected environmental factors were calculated for the entire YRB and for different ecosystem types, including cropland, forest, and grassland (Table 5).
As shown in Table 5, E showed the strongest positive correlation with NPP in the entire YRB and grassland ecosystems, with correlation coefficients of 0.41 and 0.48, respectively. SSRD showed negative correlations with NPP across all sample groups, especially in cropland and forest ecosystems. These results provide quantitative support for the spatial correlation patterns shown in Figure 5 and indicate that the relationships between NPP and environmental factors varied among different ecosystem types.
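The correlation coefficients discussed above follow the usual Pearson definition, which can be sketched as follows. For the pixel-wise maps, the same function would be applied to the annual series of NPP and of one environmental factor at each pixel; the function name and the short example series are hypothetical.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical annual series for one pixel.
npp = [210.0, 220.0, 235.0, 230.0, 250.0]
evapotranspiration = [400.0, 410.0, 430.0, 425.0, 450.0]
radiation = [6000.0, 5950.0, 5900.0, 5920.0, 5850.0]

r_e = pearson_r(npp, evapotranspiration)
r_ssrd = pearson_r(npp, radiation)
```

A coefficient of this kind measures only the linear part of a relationship, which is one reason the nonlinear responses are examined separately with SHAP.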

3.3. NPP Fitting Based on Machine Learning

In the task of fitting the NPP of the YRB, all three algorithms showed acceptable performance (Table 6). Considering model performance, computational efficiency, and compatibility with SHAP analysis, the XGBoost algorithm was selected for subsequent interpretation of environmental factor contributions. It is worth noting that the three algorithms performed less well for forest NPP than for the total, grassland, and cropland samples, suggesting that additional factors not included in the current predictor set may affect forest productivity.
For the importance of influencing factors, the conclusions obtained by the three algorithms were relatively consistent. Overall, moisture-related variables contributed more strongly to the model prediction of NPP in the YRB than thermal variables (Figure 6a–c). In grassland ecosystems, E was the most important factor in the regression models (Figure 6d–f), whereas in forest and cropland ecosystems, SSRD contributed most strongly to model prediction differences (Figure 6g–l). Appropriate solar radiation may support plant growth, whereas excessive or insufficient radiation may reduce productivity. Because machine learning algorithms are black-box models, feature importance alone cannot reveal the direction or shape of each factor’s effect on NPP. Therefore, the SHAP interpretability method was subsequently applied to analyze the fitted regression model.

3.4. Explanation of the Model Based on the SHAP Method

SSRD was the most important predictor of cropland NPP in the fitted model (Figure 7a). As SSRD increased, its positive SHAP contribution gradually weakened, and the contribution became negative when SSRD exceeded approximately 5924 MJ·m−2 (Figure 7b). This pattern suggests a nonlinear modeled association between solar radiation and cropland NPP rather than direct evidence of a causal ecological threshold. For TEM, low temperature showed negative SHAP contributions to the predicted NPP, but this negative contribution weakened when TEM rose above 5 °C (Figure 7c). When TEM exceeded 11.1 °C, its contribution shifted from negative to positive. E and SWVL exhibited similar nonlinear patterns (Figure 7d,g): as E and SWVL increased, their negative contributions gradually weakened and eventually became positive. For STL, lower STL values (<12.4 °C) were associated with higher model contributions to crop NPP (Figure 7f). In addition, TP showed a weak negative SHAP contribution to predicted cropland NPP (Figure 7h), but this result should be interpreted cautiously because SHAP values represent model contributions rather than independent causal effects.
SSRD was also the most important predictor for forest NPP in the fitted model (Figure 8a). Strong SSRD showed negative SHAP contributions to predicted forest NPP (Figure 8b). With increasing TEM, its SHAP contribution shifted from negative to positive at approximately 9.67 °C (Figure 8c). For SP, values above a certain threshold showed negative contributions to predicted NPP (Figure 8d). When the STL was below 11 °C, soil warming showed positive contributions to predicted NPP; however, when the STL exceeded 11 °C, further increases in soil temperature were associated with negative contributions (Figure 8e). SWVL and E generally showed positive contributions to predicted forest NPP (Figure 8f,g). The weak negative contribution of TP (Figure 8h) may reflect the adaptation of forest ecosystems in the YRB to local semi-arid or arid climatic conditions, but this interpretation should be treated as a model-based association rather than direct evidence of causality.
For the grassland system, which is the most widely distributed ecosystem type in the YRB, E was the most important predictor in the fitted model (Figure 9a). Higher E values were associated with positive SHAP contributions to predicted NPP, and this positive contribution became more evident when E exceeded approximately 477.42 mm (Figure 9b). Similar to forest and cropland ecosystems, stronger SSRD showed negative SHAP contributions to grassland NPP (Figure 9c). Grasslands are mostly distributed in regions with relatively low temperatures; therefore, low TEM values contributed little to predicted NPP, whereas a TEM above 10 °C showed stronger positive contributions (Figure 9e). SWVL, TP, and STL had relatively small SHAP values, suggesting that their direct model contributions were weaker than those of E, SSRD, and TEM in grassland ecosystems (Figure 9f–h).

4. Discussion

4.1. Distribution and Trend of NPP in the YRB from 2001 to 2020

The high-value areas of average NPP in the YRB from 2001 to 2020 were concentrated in forest areas, while the low-value areas were concentrated in the alpine meadow areas in the east of the Qinghai–Tibet Plateau and the bare land in the north of the Loess Plateau. The annual average NPP in the YRB showed a fluctuating upward trend, with an increasing trend in most areas and a decreasing trend in a very small part of the urban expansion area in the southern part of the YRB. In addition to the contribution of climate change to NPP, the impact of human activities on NPP is also obvious. The natural forest protection project implemented around 2000 led to a rapid increase in forest NPP in the YRB from 2000 to 2003 [59] (Figure 2a). A series of subsequent projects of returning cropland to forest and grassland and wasteland development projects have significantly increased the grassland area in the YRB (Figure 2c,e) [60]. The conversion of low-quality cropland to grassland has promoted the increase in NPP in the YRB.

4.2. Feasibility of Machine Learning Models for NPP Simulation

Machine learning algorithms are suitable for modeling the relationships between NPP and environmental factors because they can capture nonlinear associations that may not be fully represented by traditional linear methods. In this study, AdaBoost, random forest, and XGBoost were used to fit NPP for the entire YRB and for different ecosystem types. As shown in Table 6, all three algorithms showed acceptable performance in fitting NPP. For the entire basin, the R² values of the three models ranged from 0.835 to 0.844, and the RMSE values ranged from 43.1 to 44.4 gC·m−2·a−1. Similar performance was observed for grassland and cropland samples, indicating that the selected climatic and soil variables can explain a substantial proportion of NPP variation in these ecosystems.
However, the model performance differed among ecosystem types. The fitting performance for forest NPP was lower than that for the entire basin, grassland, and cropland samples, with R2 values ranging from 0.598 to 0.615. This result suggests that forest NPP may be affected by additional factors that were not fully represented by the current predictor set, such as forest structure, species composition, stand age, management practices, disturbance history, and local topographic conditions. Therefore, the relatively lower performance for forest ecosystems indicates the need to incorporate more ecosystem-specific variables in future studies.
Among the three algorithms, the differences in model performance were relatively small. Although AdaBoost showed slightly higher R2 values in several sample groups, XGBoost provided comparable fitting performance and has strong compatibility with SHAP-based model interpretation. Therefore, XGBoost was selected for subsequent analysis of environmental factor contributions. It should be noted that the machine learning results indicate statistical associations within the fitted model, and the feature importance and SHAP results should be interpreted as model-based explanations rather than direct evidence of ecological causality.
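For readers who wish to reproduce the two fit metrics discussed in this section, the following minimal sketch computes R² and RMSE from paired observed and predicted values. The function names and the four NPP values are hypothetical and serve only as an illustration; they are not taken from Table 6 or from the study's data.

```python
import math

def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root mean square error, in the same unit as the inputs."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

# Hypothetical NPP values (g C per square metre per year), illustration only.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

r2 = r2_score(observed, predicted)
err = rmse(observed, predicted)
print(round(r2, 3), round(err, 1))  # prints 0.992 10.0
```

Because RMSE keeps the unit of NPP while R² is dimensionless, the two metrics are best read together when comparing sample groups with different NPP ranges.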

4.3. Analysis of Influencing Factors of NPP Based on Machine Learning Algorithms

The three machine learning algorithms showed relatively consistent results regarding the relative importance of environmental factors (Figure 6). For the entire YRB, moisture-related variables contributed more strongly to model-predicted NPP than thermal variables, indicating that water availability played an important role in explaining the spatial and temporal variation in NPP in the basin. This result is also supported by the correlation coefficient analysis in Table 5, where E and SWVL showed positive correlations with NPP in the entire YRB, with correlation coefficients of 0.41 and 0.35, respectively. However, these results should be interpreted as model-based associations rather than direct evidence of ecological causality.
The relationships between NPP and environmental factors differed among ecosystem types. As shown in Table 5, E showed the strongest positive correlation with NPP in grassland ecosystems, with a correlation coefficient of 0.48, while SSRD showed negative correlations with NPP in cropland, forest, and grassland ecosystems. TP showed different relationships among ecosystem types, with a positive correlation in grassland ecosystems but negative correlations in cropland and forest ecosystems. These differences indicate that the response of NPP to environmental factors is ecosystem-dependent and may be related to differences in water demand, canopy structure, rooting depth, and land management conditions.
The feature importance results further showed that the dominant predictors varied among ecosystem types (Figure 6). In cropland ecosystems, SSRD contributed most strongly to the fitted model (Figure 6j–l). The SHAP results showed that higher SSRD values were associated with negative SHAP contributions to predicted cropland NPP (Figure 7b), suggesting that excessive solar radiation may be linked to lower predicted cropland NPP under some conditions. In forest ecosystems, SSRD also made a strong contribution to model-predicted NPP (Figure 6g–i), and higher SSRD values showed negative SHAP contributions (Figure 8b). This pattern may be associated with increased evapotranspiration demand, enhanced water stress, or photoinhibition under strong radiation conditions. However, these explanations should be regarded as possible ecological interpretations of model results rather than direct causal evidence.
For grassland ecosystems, E was the most important predictor in the fitted model (Figure 6d–f). The SHAP dependence plot further showed that higher E values were generally associated with positive SHAP contributions to predicted grassland NPP, especially when E exceeded approximately 477.42 mm (Figure 9b). This result is consistent with the correlation coefficient in Table 5, where E showed the strongest positive correlation with grassland NPP. Since E reflects the combined processes of soil evaporation and plant transpiration, higher E values may indicate greater water availability and more active vegetation physiological processes. Nevertheless, evapotranspiration is also influenced by vegetation growth, temperature, soil moisture, and atmospheric demand. Therefore, the relationship between E and grassland NPP should be interpreted as a coupled water–vegetation response rather than a simple one-way causal relationship.
Overall, the correlation analysis (Table 5), feature importance results (Figure 6), and SHAP dependence curves (Figure 7, Figure 8 and Figure 9) jointly indicate that the responses of NPP to environmental factors are nonlinear and ecosystem-dependent. These findings support the use of machine learning and SHAP analysis for identifying model-based response patterns, but the identified relationships and thresholds should be interpreted cautiously within the modeling framework.
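One possible way to read a response threshold from a SHAP dependence relationship is sketched below. It assumes, purely for illustration, that a threshold is taken as the feature value at which the binned mean SHAP contribution changes sign; the text does not state the exact rule used to derive values such as 477.42 mm, so this should not be read as the procedure of this study. The helper names and the synthetic data are hypothetical.

```python
def binned_means(x, shap_values, n_bins):
    """Average SHAP values within equal-width bins of the feature."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for xi, si in zip(x, shap_values):
        idx = min(int((xi - lo) / width), n_bins - 1)  # clamp the maximum value
        sums[idx] += si
        counts[idx] += 1
    centers, means = [], []
    for i in range(n_bins):
        if counts[i]:  # skip empty bins
            centers.append(lo + (i + 0.5) * width)
            means.append(sums[i] / counts[i])
    return centers, means

def zero_crossings(centers, means):
    """Feature values where the binned curve changes sign (linear interpolation)."""
    points = list(zip(centers, means))
    crossings = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 * y1 < 0:
            crossings.append(x0 - y0 * (x1 - x0) / (y1 - y0))
    return crossings

# Synthetic example: the contribution is negative below 40 and positive above it.
feature = [float(v) for v in range(100)]
contribution = [0.1 * (v - 40.0) for v in feature]

centers, means = binned_means(feature, contribution, n_bins=10)
thresholds = zero_crossings(centers, means)
print(thresholds)
```

In this synthetic case the single crossing falls close to 40 by construction. With real SHAP values, the estimate depends on the number of bins and on the scatter within each bin, which is one reason such thresholds are better treated as approximate model-based response points.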

4.4. Regional Heterogeneity, Scale Dependence, and Ecological Interpretation

The YRB is characterized by strong ecological heterogeneity, and the environmental controls on NPP should not be interpreted as uniform across the whole basin. In the upper reaches, high elevation, low temperature, and alpine grassland ecosystems may make vegetation productivity more sensitive to thermal conditions and radiation constraints. In the middle reaches, especially in the Loess Plateau, semi-arid climatic conditions and soil water limitations suggest that precipitation, evapotranspiration, and soil moisture may play more important roles in regulating vegetation growth. In the lower reaches, where cropland and impervious surfaces are more concentrated, NPP dynamics may be more strongly affected by land use change, agricultural management, and human activities. Therefore, the response thresholds identified in this study should be understood as basin-scale or ecosystem-scale modeled patterns, and their applicability may vary among subregions and spatial scales.
The effects of environmental factors on NPP may also interact with each other. For example, precipitation can alleviate water stress, but its ecological effect may depend on temperature, soil moisture, and evapotranspiration demand. Similarly, high solar radiation may enhance photosynthesis under sufficient water supply, but it may also intensify evapotranspiration and water stress under dry conditions. These interactions help explain why simple linear correlations and SHAP-based nonlinear responses may not always show identical patterns. Therefore, the correlation results, feature importance, and SHAP dependence curves should be interpreted as complementary evidence rather than identical measures of ecological influence.

4.5. Uncertainty and Limitations

Although the combination of machine learning and SHAP analysis provides useful information for understanding the nonlinear responses of NPP to environmental factors, several uncertainties and limitations should be acknowledged. First, the NPP data used in this study were derived from MODIS products rather than field observations. MODIS-derived NPP is a model-based remote sensing product, and its accuracy may be affected by uncertainties in input climate variables, land cover classification, mixed pixels, and assumptions related to vegetation-type-specific light use efficiency. In addition, field-measured NPP data in the Yellow River Basin are still limited, which makes it difficult to fully validate the possible systematic bias of the MODIS NPP product in this region. Therefore, the results of this study should be interpreted as model-based estimates derived from remote sensing data.
Second, the regression models in this study mainly included climatic and soil physical factors, including temperature, precipitation, evapotranspiration, soil temperature, surface pressure, solar radiation, and soil water content. However, other potentially important drivers, such as atmospheric CO2 concentration, nitrogen deposition, irrigation, grazing intensity, land management, ecological restoration projects, urban expansion, and disturbance events, were not quantitatively incorporated due to data limitations. The omission of human activity variables is particularly important for the Yellow River Basin, where ecological restoration and land use change have strongly affected vegetation growth during the past two decades. These missing variables may influence the apparent relationships between environmental factors and NPP and may also affect the interpretation of the identified response thresholds.
Third, spatial autocorrelation may also introduce uncertainty into the model validation process. In this study, the samples were randomly divided into training and testing datasets. However, environmental variables in large river basins usually have clear spatial structures, and neighboring samples are not completely independent. Therefore, random splitting may cause information leakage between the training and testing datasets and may lead to an overestimation of model performance. Future studies should consider spatially explicit validation strategies, such as spatial block cross-validation or independent regional validation, to provide a more conservative evaluation of model generalization ability.
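The spatial block cross-validation suggested above can be sketched as follows. The sketch assumes that each sample carries longitude and latitude, groups samples into square blocks of a chosen size, and keeps every block entirely within either the training or the testing set. The function names, the 1° block size, and the coordinates are hypothetical; an operational design would choose the block size from the estimated range of spatial autocorrelation.

```python
import math
from collections import defaultdict

def block_id(lon, lat, block_size_deg):
    """Index of the square block that contains the point."""
    return (math.floor(lon / block_size_deg), math.floor(lat / block_size_deg))

def spatial_block_folds(coords, block_size_deg, n_folds):
    """Yield (train_idx, test_idx) so that each block lies wholly in one fold."""
    blocks = defaultdict(list)
    for i, (lon, lat) in enumerate(coords):
        blocks[block_id(lon, lat, block_size_deg)].append(i)
    # Assign blocks to folds round-robin in sorted order (deterministic).
    fold_members = [[] for _ in range(n_folds)]
    for k, key in enumerate(sorted(blocks)):
        fold_members[k % n_folds].extend(blocks[key])
    for f in range(n_folds):
        test_idx = sorted(fold_members[f])
        train_idx = sorted(i for g in range(n_folds) if g != f
                           for i in fold_members[g])
        yield train_idx, test_idx

# Hypothetical sample coordinates (lon, lat), illustration only.
coords = [(100.1, 35.2), (100.4, 35.7), (104.3, 36.1), (104.8, 36.9),
          (110.2, 37.5), (110.9, 37.1), (115.5, 35.0), (115.1, 35.9)]

folds = list(spatial_block_folds(coords, block_size_deg=1.0, n_folds=2))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Compared with a random split, neighboring samples in the same block never appear on both sides of the split, so the resulting test scores are usually lower but give a more conservative view of how the model generalizes to unsampled areas.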
Finally, the thresholds identified in this study should be interpreted cautiously. These thresholds were derived from SHAP dependence plots within the machine learning framework, and they indicate changes in model contribution rather than confirmed ecological tipping points or regime shifts. Moreover, the thresholds were identified based on historical data from 2001 to 2020. Extrapolating these empirical thresholds to future climate conditions assumes that the response relationships between NPP and environmental factors remain stable, which may not always hold under novel climate conditions or changing ecosystem adaptation processes. Future research should combine field observations, higher-resolution remote sensing data, human activity indicators, spatial validation methods, and process-based ecosystem models to further verify the ecological meaning and applicability of these SHAP-derived response thresholds.

5. Conclusions

This study used MODIS NPP data and ERA5-Land monthly data to analyze the spatiotemporal pattern of NPP in the YRB from 2001 to 2020, and the influencing factors of NPP in cropland, forest, and grassland were analyzed by combining machine learning with the SHAP method. The following conclusions were drawn:
(1) The annual average NPP of the entire YRB and of its cropland, forest, and grassland ecosystems all showed fluctuating upward trends. The policy of returning cropland to grassland and the development and utilization of wasteland contributed to the increase in NPP in the YRB.
(2) NPP in the YRB was higher in the south and lower in the north, and NPP in most areas increased to varying degrees during the study period.
(3) For the entire YRB, moisture-related variables contributed more strongly to model-predicted NPP variations than thermal variables. For different ecosystem types, the main contributing factors varied. Surface solar radiation downwards (SSRD) made the largest contribution to model-predicted NPP variations in cropland and forest ecosystems and was generally negatively associated with NPP. In contrast, evapotranspiration (E) contributed most strongly to model-predicted NPP variations in grassland ecosystems and was positively associated with NPP. The SHAP dependence results further indicated that the relationships between NPP and environmental factors were nonlinear and ecosystem-dependent. The identified thresholds should be interpreted as SHAP-derived response thresholds within the machine learning framework rather than direct evidence of ecological tipping points or regime shifts.
This study still has several limitations. First, MODIS NPP products may contain uncertainties related to remote sensing retrieval algorithms, spatial resolution, and data aggregation processes. Second, the machine learning models mainly considered climatic and soil variables, while direct indicators of human activities, vegetation management, and ecological restoration intensity were not fully incorporated. Third, the thresholds identified from SHAP response curves should be interpreted as model-based nonlinear response points rather than direct evidence of ecological regime shifts. Future studies should integrate field observations, higher-resolution remote sensing products, spatially explicit validation strategies, and scenario-based simulations to further test the stability, transferability, and ecological meaning of the threshold relationships identified in this study.

Author Contributions

Conceptualization, Y.L. and Z.Z.; methodology, Y.L. and Z.Z.; software, Y.L.; validation, D.X. and X.D.; formal analysis, Y.L.; investigation, Y.L.; resources, Z.Z.; data curation, Y.L.; writing—original draft preparation, Y.L.; writing—review and editing, Z.Z.; visualization, Y.L.; supervision, Z.Z.; project administration, Z.Z.; funding acquisition, Z.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Social Science Fund of China with grant number 21BGL026.

Data Availability Statement

The data presented in this study are openly available. The MODIS NPP data can be accessed from the NASA EOSDIS Land Processes DAAC (https://lpdaac.usgs.gov (accessed on 25 February 2026)). The land use data are available from Zenodo (https://zenodo.org (accessed on 25 February 2026)). The ERA5-Land climate data can be accessed from the Copernicus Climate Change Service (C3S) Climate Data Store (https://cds.climate.copernicus.eu (accessed on 25 February 2026)). Detailed citation information for each dataset is provided in the reference list.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Field, C.B.; Behrenfeld, M.J.; Randerson, J.T.; Falkowski, P. Primary production of the biosphere: Integrating terrestrial and oceanic components. Science 1998, 281, 237–240.
2. Luo, Z.; Wu, W.; Yu, X.; Song, Q.; Yang, J.; Wu, J.; Zhang, H. Variation of Net Primary Production and Its Correlation with Climate Change and Anthropogenic Activities over the Tibetan Plateau. Remote Sens. 2018, 10, 1352.
3. Zhang, S.; Chen, Y.; Guo, H.; Lu, Y.; Guo, X.; Liu, C.; Zhou, X.; Zhang, Y. Changes in dryland areas and net primary productivity in China from 1980 to 2020. J. Earth Syst. Sci. 2023, 132, 83.
4. Sun, R.; Wang, J.; Xiao, Z.; Zhu, A.; Wang, M.; Yu, T. Estimation of global net primary productivity from 1981 to 2018 with remote sensing data. In IGARSS 2020-2020 IEEE International Geoscience and Remote Sensing Symposium; IEEE: Piscataway, NJ, USA, 2020; pp. 4331–4334.
5. Zhao, M.S.; Heinsch, F.A.; Nemani, R.R.; Running, S.W. Improvements of the MODIS terrestrial gross and net primary production global data set. Remote Sens. Environ. 2005, 95, 164–176.
6. Castaneda-Moya, E.; Twilley, R.R.; Rivera-Monroy, V.H. Allocation of biomass and net primary productivity of mangrove forests along environmental gradients in the Florida Coastal Everglades, USA. For. Ecol. Manag. 2013, 307, 226–241.
7. Wen, Y.; Liu, X.; Bai, Y.; Sun, Y.; Yang, J.; Lin, K.; Pei, F.; Yan, Y. Determining the impacts of climate change and urban expansion on terrestrial net primary production in China. J. Environ. Manag. 2019, 240, 75–83.
8. Sun, H.; Wang, W.J.; Liu, Z.; Ballantyne, A.P.; Yu, K.; Bao, S.G.; Ba, S.; Wang, L.; Cong, Y.; He, H.S. Enhanced productivity and evapotranspiration dominated by woody plant encroachment-induced vegetation greening in boreal wetland ecosystems. Gisci. Remote Sens. 2024, 61, 2391144.
9. Aubinet, M.; Vesala, T.; Papale, D. (Eds.) Eddy Covariance: A Practical Guide to Measurement and Data Analysis; Springer: Dordrecht, The Netherlands, 2012.
10. Ochoa-Sanchez, A.; Crespo, P.; Carrillo-Rojas, G.; Sucozhanay, A.; Celleri, R. Actual Evapotranspiration in the High Andean Grasslands: A Comparison of Measurement and Estimation Methods. Front. Earth Sci. 2019, 7, 55.
11. Guo, H.; Zhou, X.; Dong, Y.; Wang, Y.; Li, S. On the use of machine learning methods to improve the estimation of gross primary productivity of maize field with drip irrigation. Ecol. Model. 2023, 476, 110250.
12. Potter, C.S.; Randerson, J.T.; Field, C.B.; Matson, P.A.; Vitousek, P.M.; Mooney, H.A.; Klooster, S.A. Terrestrial ecosystem production: A process model based on global satellite and surface data. Glob. Biogeochem. Cycles 1993, 7, 811–841.
13. Zheng, Z.; Zhu, W.; Zhang, Y. Seasonally and spatially varied controls of climatic factors on net primary productivity in alpine grasslands on the Tibetan Plateau. Glob. Ecol. Conserv. 2020, 21, e00814.
14. Bai, Y.; Liang, S.; Yuan, W. Estimating Global Gross Primary Production from Sun-Induced Chlorophyll Fluorescence Data and Auxiliary Information Using Machine Learning Methods. Remote Sens. 2021, 13, 963.
15. Liang, Y.; Zhao, H.; Yuan, Z.; Wei, D.; Wang, X. Ecological Restoration Projects Adapt Response of Net Primary Productivity of Alpine Grasslands to Climate Change across the Tibetan Plateau. Remote Sens. 2024, 16, 4444.
16. Yaghmaei, L.; Koupaei, S.S.; Jafari, R. Spatiotemporal Response of Rangeland NPP to Drought in Central Iran based on SPDI Index. Contemp. Probl. Ecol. 2020, 13, 694–707.
17. Potter, C.; Pass, S. Changes in the net primary production of ecosystems across Western Europe from 2015 to 2022 in response to historic drought events. Carbon Balance Manag. 2024, 19, 32.
18. Berberoglu, S.; Donmez, C.; Cilek, A. Modelling climate change impacts on regional net primary productivity in Turkey. Environ. Monit. Assess. 2021, 193, 242.
19. Sun, J.; Yue, Y.; Niu, H. Evaluation of NPP using three models compared with MODIS-NPP data over China. PLoS ONE 2021, 16, e0252149.
20. Hicke, J.A.; Asner, G.P.; Randerson, J.T.; Tucker, C.; Los, S.; Birdsey, R.; Jenkins, J.C.; Field, C. Trends in North American net primary productivity derived from satellite observations, 1982–1998. Glob. Biogeochem. Cycles 2002, 16, 2-1–2-14.
21. Liu, J.; Shen, L.; Chen, Z.; Ni, J.; Huang, Y. Assessing the Response of the Net Primary Productivity to Snow Phenology Changes in the Tibetan Plateau: Trends and Environmental Drivers. Remote Sens. 2024, 16, 3566.
22. Duan, Y.; Pei, X.; Luo, J.; Zhang, X.; Luo, L. Disentangling the Spatiotemporal Dynamics, Drivers, and Recovery of NPP in Co-Seismic Landslides: A Case Study of the 2017 Jiuzhaigou Earthquake, China. Forests 2024, 15, 1381.
23. Liu, Y.; Huang, C.; Chen, C.; Yang, C.; Huang, W. Spatiotemporal Variation and Driving Factors Analysis of Net Primary Productivity in the Qinling Mountains. Contemp. Probl. Ecol. 2024, 17, 936–947.
24. Gong, E.; Zhang, J.; Wang, Z.; Wang, J. Estimating the dynamics and driving factors of gross primary productivity over the Chinese Loess Plateau by the modified vegetation photosynthesis model. Ecol. Inform. 2024, 83, 102838.
25. Baldocchi, D.; Chu, H.; Reichstein, M. Inter-annual variability of net and gross ecosystem carbon fluxes: A review. Agric. For. Meteorol. 2018, 249, 520–533.
26. Liu, Q.; Fu, Y.H.; Zhu, Z.; Liu, Y.; Liu, Z.; Huang, M.; Janssens, I.A.; Piao, S. Delayed autumn phenology in the Northern Hemisphere is related to change in both climate and spring phenology. Glob. Change Biol. 2016, 22, 3702–3711.
27. Zeppel, M.J.B.; Wilks, J.V.; Lewis, J.D. Impacts of extreme precipitation and seasonal changes in precipitation on plants. Biogeosciences 2014, 11, 3083–3093.
28. Carrillo-Rojas, G.; Silva, B.; Rollenbeck, R.; Celleri, R.; Bendix, J. The breathing of the Andean highlands: Net ecosystem exchange and evapotranspiration over the paramo of southern Ecuador. Agric. For. Meteorol. 2019, 265, 30–47.
29. Urgiles, C.; Orellana-Alvear, J.; Crespo, P.; Carrillo-Rojas, G. Gross primary productivity estimation through remote sensing and machine learning techniques in the high Andean Region of Ecuador. Int. J. Biometeorol. 2024, 69, 541–556.
30. Wang, M.; Sun, K.; Jia, J.; Wu, F.; Gao, Y. Climate Change Drove the Decline in Yangtze Estuary Net Primary Production over the Past Two Decades. Environ. Sci. Technol. 2024, 58, 19305–19314.
31. Yuan, D.; Zhang, S.; Li, H.; Zhang, J.; Yang, S.; Bai, Y. Improving the Gross Primary Productivity Estimate by Simulating the Maximum Carboxylation Rate of the Crop Using Machine Learning Algorithms. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4413115.
32. Zhang, S.; Hao, X.; Zhao, Z.; Zhang, J.; Fan, X.; Li, X. Natural Vegetation Succession Under Climate Change and the Combined Effects on Net Primary Productivity. Earths Future 2023, 11, e2023EF003903.
33. Li, M.; Zhu, Z.; Ren, W.; Wang, Y. Predicting Gross Primary Productivity under Future Climate Change for the Tibetan Plateau Based on Convolutional Neural Networks. Remote Sens. 2024, 16, 3723.
34. Liu, N.; Hao, Z.; Zhao, P. Explainable deep learning insights into the history and future of net primary productivity in China. Ecol. Indic. 2024, 166, 112394.
35. Lu, Q.; Liu, H.; Wei, L.; Zhong, Y.; Zhou, Z. Global prediction of gross primary productivity under future climate change. Sci. Total Environ. 2024, 912, 169239.
36. Jia, L.; Zhang, B. Simulating the Vegetation Gross Primary Productivity by the Biome-BGC Model in the Yellow River Basin of China. Water 2024, 16, 3468.
37. Lv, M.; Ma, Z.; Li, M.; Zheng, Z. Quantitative Analysis of Terrestrial Water Storage Changes Under the Grain for Green Program in the Yellow River Basin. J. Geophys. Res.-Atmos. 2019, 124, 1336–1351.
38. Jiang, C.; Guo, H.; Wei, Y.; Yang, Z.; Wang, X.; Wen, M.; Yang, L.; Zhao, L.; Zhang, H.; Zhou, P. Ecological restoration is not sufficient for reconciling the trade-off between soil retention and water yield: A contrasting study from catchment governance perspective. Sci. Total Environ. 2021, 754, 142139.
39. Wang, X.; He, W.; Huang, Y.; Wu, X.; Zhang, X.; Zhang, B. Exploring Spatial Non-Stationarity and Scale Effects of Natural and Anthropogenic Factors on Net Primary Productivity of Vegetation in the Yellow River Basin. Remote Sens. 2024, 16, 3156.
40. Li, X.; Yu, K.-X.; Xu, G.-C.; Li, P.; Li, Z.-B.; Shi, P. Spatial Characteristics and Driving Factors of Net Primary Productivity of Vegetation in the Upper and Middle Yellow River Basin. Huan Jing Ke Xue 2024, 45, 6448–6457.
41. Tian, K.; Liu, X.; Zhang, B.; Wang, Z.; Xu, G.; Chang, K.; Xu, P.; Han, B. Analysis of Spatiotemporal Evolution and Influencing Factors of Vegetation Net Primary Productivity in the Yellow River Basin from 2000 to 2022. Sustainability 2024, 16, 381.
42. Cao, Y.; Li, H.; Liu, Y.; Zhang, Y.; Jiang, Y.; Dai, W.; Shen, M.; Guo, X.; Qi, W.; Li, L.; et al. Regional Contribution and Attribution of the Interannual Variation of Net Primary Production in the Yellow River Basin, China. Remote Sens. 2023, 15, 5212.
43. Lin, Z.; Liu, Y.; Wen, Z.; Chen, X.; Han, P.; Zheng, C.; Yao, H.; Wang, Z.; Shi, H. Spatial-Temporal Variation Characteristics and Driving Factors of Net Primary Production in the Yellow River Basin over Multiple Time Scales. Remote Sens. 2023, 15, 5273.
44. Wu, Z.; Borzee, A.; Qian, T.; Dai, W.; Li, S.; Wang, J. Spatial non-stationarity effect of determinants regulates variation in amphibian species richness. Ecol. Indic. 2023, 150, 110268.
45. Das, P.; Zhang, Z.; Ghosh, S.; Lu, J.; Ayugi, B.; Ojara, M.A.; Guo, X. Historical and projected changes in Extreme High Temperature events over East Africa and associated with meteorological conditions using CMIP6 models. Glob. Planet. Change 2023, 222, 104068.
46. Theil, H. A rank-invariant method of linear and polynomial regression analysis. In Henri Theil’s Contributions to Economics and Econometrics. Advanced Studies in Theoretical and Applied Econometrics; Springer: Dordrecht, The Netherlands, 1992.
47. Sen, P.K. Estimates of the regression coefficient based on Kendall’s tau. J. Am. Stat. Assoc. 1968, 63, 1379–1389.
48. Kendall, M.G. A new measure of rank correlation. Biometrika 1938, 30, 81–93.
49. Chen, Y.; Guo, D.; Cao, W.; Li, Y. Changes in Net Primary Productivity and Factor Detection in China’s Yellow River Basin from 2000 to 2019. Remote Sens. 2023, 15, 2798.
50. Movchan, D.; Kostyuchenko, Y.V. Regional dynamics of terrestrial vegetation productivity and climate feedbacks for territory of Ukraine. Int. J. Geogr. Inf. Sci. 2015, 29, 1490–1505.
51. Nie, W.; Gu, J.; Li, B.; Wen, X.; Nie, X. Quantitative lithology prediction from seismic data using deep learning. Comput. Geosci. 2025, 196, 105821.
52. Rodriguez-Galiano, V.; Sanchez-Castillo, M.; Chica-Olmo, M.; Chica-Rivas, M. Machine learning predictive models for mineral prospectivity: An evaluation of neural networks, random forest, regression trees and support vector machines. Ore Geol. Rev. 2015, 71, 804–818.
53. Su, N.; Weng, S.; Wang, L.; Xu, T. Reflectance Spectroscopy with Multivariate Methods for Non-Destructive Discrimination of Edible Oil Adulteration. Biosensors 2021, 11, 492.
54. Qi, Y.-P.; He, P.-J.; Lan, D.-Y.; Lue, F.; Zhang, H. Novel method for predicting concentrations of incineration flue gas based on waste composition and machine learning. J. Environ. Manag. 2025, 373, 123588.
55. Yan, Y.; Yang, Y. Revealing the synergistic spatial effects in soil heavy metal pollution with explainable machine learning models. J. Hazard. Mater. 2024, 482, 136578.
56. Wang, Z.; Zhou, R.; Rui, J.; Yu, Y. Revealing the impact of urban spatial morphology on land surface temperature in plain and plateau cities using explainable machine learning. Sustain. Cities Soc. 2025, 118, 106046.
57. Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.-I. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2020, 2, 56–67.
58. Zhang, L.; Li, X.; Liu, X.; Lian, Z.; Zhang, G.; Liu, Z.; An, S.; Ren, Y.; Li, Y.; Liu, S. Dynamic monitoring and drivers of ecological environmental quality in the Three-North region, China: Insights based on remote sensing ecological index. Ecol. Inform. 2025, 85, 102936.
59. Dai, L.; Li, S.; Zhou, W.; Qi, L.; Zhou, L.; Wei, Y.; Li, J.; Shao, G.; Yu, D. Opportunities and challenges for the protection and ecological functions promotion of natural forests in China. For. Ecol. Manag. 2018, 410, 187–192.
60. Liu, Y.; Zhang, T.; Yuan, L.; Zimini, Y.; Zhou, R.; Lin, Z.; Zheng, C.; Wen, Z. Interannual asymmetric transitions of gross primary productivity in the grasslands of Northern China. Ecol. Indic. 2024, 167, 112631.
Figure 1. Overview of the study area. Note: This map is based on the standard map from the Standard Map Service System of the Ministry of Natural Resources of China (Approval No.: GS(2019)1822). The map boundaries remain unaltered.
Figure 2. (a) Temporal variations in NPP from 2001 to 2020; (b) land use in the YRB in 2001; (c) percentage of various land use types in the YRB in 2001; (d) land use in the YRB in 2020; (e) percentage of each land use type in the YRB in 2020.
Figure 3. (a) Spatial distribution of mean NPP in the YRB from 2001 to 2020; (b) NPP change trend from 2001 to 2020.
Figure 4. Spatial distribution of the environmental factors used in the NPP modeling during 2001–2020: (a) mean annual temperature (TEM); (b) total precipitation (TP); (c) soil temperature level 1 at 0–7 cm (STL); (d) evapotranspiration (E); (e) surface pressure (SP); (f) surface solar radiation downwards (SSRD); and (g) volumetric soil water layer 1 at 0–7 cm (SWVL).
Figure 5. Spatial distribution of the pixel-wise Pearson correlations between NPP and environmental factors from 2001 to 2020: (a) TEM; (b) TP; (c) STL; (d) E; (e) SP; (f) SSRD; and (g) SWVL. Warm colors indicate positive correlations, whereas cool colors indicate negative correlations.
Figure 6. Importance of each factor for the NPP regression model. (a–c) Importance of each factor for the overall sample of the YRB; (d–f) importance of each factor for the grassland sample of the YRB; (g–i) importance of each factor for the forest sample of the YRB; (j–l) importance of each factor for the cropland sample of the YRB.
Figure 7. (a) Contributions of environmental factors to the NPP of cropland ecosystems; (b–h) responses of NPP to individual environmental factors in cropland ecosystems.
Figure 8. (a) SHAP-based contributions of environmental factors to the fitted NPP model for forest ecosystems; (b–h) SHAP dependence relationships between environmental factors and predicted NPP in forest ecosystems.
Figure 9. (a) SHAP-based contributions of environmental factors to the fitted NPP model for grassland ecosystems; (b–h) SHAP dependence relationships between environmental factors and predicted NPP in grassland ecosystems.
Table 1. List of impact factors.
| Abbreviation | Full name | Unit |
| --- | --- | --- |
| TEM | Temperature | °C |
| TP | Total precipitation | mm |
| E | Evapotranspiration | mm |
| STL | Soil temperature level 1 (0–7 cm) | °C |
| SP | Surface pressure | hPa |
| SSRD | Surface solar radiation downwards | MJ·m⁻² |
| SWVL | Volumetric soil water layer 1 (0–7 cm) | m³·m⁻³ |
Table 2. Classification of change trends.
| Slope | Z_MK | Trend |
| --- | --- | --- |
| Slope > 0 | 1.96 < Z_MK | Significant increase |
| Slope > 0 | 0 < Z_MK ≤ 1.96 | Slight increase |
| Slope = 0 | −1.96 ≤ Z_MK ≤ 1.96 | Stable |
| Slope < 0 | −1.96 ≤ Z_MK < 0 | Slight decrease |
| Slope < 0 | Z_MK < −1.96 | Significant decrease |
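The classification rules in Table 2 can be illustrated with a short Python sketch. This is not the authors' code: the function names `sen_slope`, `mk_z`, and `classify_trend` are hypothetical, the slope is assumed to be a Theil–Sen median slope, and the Mann–Kendall Z statistic is computed without a tie correction to keep the example short. The input series is synthetic and serves only as a demonstration.

```python
from itertools import combinations
from math import sqrt
from statistics import median


def sen_slope(series):
    """Median of all pairwise slopes (Theil-Sen estimator); needs >= 2 values."""
    return median(
        (series[j] - series[i]) / (j - i)
        for i, j in combinations(range(len(series)), 2)
    )


def mk_z(series):
    """Mann-Kendall Z statistic. Assumption: no tie correction in the variance."""
    n = len(series)
    # S is the sum of signs of all forward differences.
    s = sum(
        (series[j] > series[i]) - (series[j] < series[i])
        for i, j in combinations(range(n), 2)
    )
    if s == 0:
        return 0.0
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity correction: shrink |S| by 1 before standardizing.
    return (s - 1) / sqrt(var_s) if s > 0 else (s + 1) / sqrt(var_s)


def classify_trend(slope, z, z_crit=1.96):
    """Map (slope, Z_MK) to the five trend classes of Table 2."""
    if slope > 0:
        return "Significant increase" if z > z_crit else "Slight increase"
    if slope < 0:
        return "Significant decrease" if z < -z_crit else "Slight decrease"
    return "Stable"


# Synthetic annual series for one pixel (illustrative values only).
npp_series = [300 + 4 * year for year in range(20)]
print(classify_trend(sen_slope(npp_series), mk_z(npp_series)))
```

The threshold 1.96 corresponds to a two-sided significance level of 0.05 for a standard normal statistic, which is why it is exposed as the `z_crit` parameter rather than hard-coded in the comparisons.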
Table 3. Hyperparameter settings of the machine learning models.
| Model | Hyperparameter | Value |
| --- | --- | --- |
| AdaBoost | n_estimators | 100 |
| AdaBoost | learning_rate | 0.1 |
| AdaBoost | loss | linear |
| Random Forest | n_estimators | 200 |
| Random Forest | max_depth | 15 |
| Random Forest | min_samples_split | 5 |
| Random Forest | min_samples_leaf | 2 |
| Random Forest | max_features | sqrt |
| XGBoost | n_estimators | 200 |
| XGBoost | learning_rate | 0.05 |
| XGBoost | max_depth | 8 |
| XGBoost | subsample | 0.8 |
| XGBoost | colsample_bytree | 0.8 |
| XGBoost | objective | reg:squarederror |
Table 4. Changes in land use area and percentage in the Yellow River Basin from 2001 to 2020.
| Land use type | Area in 2001 (km²) | Percentage in 2001 (%) | Area in 2020 (km²) | Percentage in 2020 (%) | Change (km²) |
| --- | --- | --- | --- | --- | --- |
| Grassland | 458,312.52 | 57.49 | 461,940.35 | 57.95 | 3627.82 |
| Cropland | 201,797.22 | 25.31 | 184,798.53 | 23.18 | −16,998.69 |
| Forest | 79,052.52 | 9.92 | 92,316.97 | 11.58 | 13,264.45 |
| Bare land | 34,757.23 | 4.36 | 25,141.49 | 3.15 | −9615.74 |
| Impervious surface | 12,786.21 | 1.60 | 22,224.30 | 2.79 | 9438.09 |
| Shrub | 5283.69 | 0.66 | 3772.02 | 0.47 | −1511.67 |
| Water | 4711.29 | 0.59 | 6268.32 | 0.79 | 1557.02 |
| Snow/Ice | 286.61 | 0.04 | 249.40 | 0.03 | −37.21 |
| Wetland | 185.32 | 0.02 | 461.24 | 0.06 | 275.92 |
Note: Change was calculated as the land use area in 2020 minus that in 2001. This table summarizes net area changes rather than detailed land conversion pathways.
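As a worked check of the arithmetic described in the note, the following Python sketch recomputes the net change and the area percentages from the tabulated areas. The names `AREAS_KM2`, `net_change`, and `percentage` are hypothetical, and the sketch assumes that each percentage is the class area divided by the sum of the nine listed classes. Because the published areas are rounded to two decimals, recomputed values can differ from the table in the last digit.

```python
# (area in 2001, area in 2020) in km², copied from Table 4.
AREAS_KM2 = {
    "Grassland": (458312.52, 461940.35),
    "Cropland": (201797.22, 184798.53),
    "Forest": (79052.52, 92316.97),
    "Bare land": (34757.23, 25141.49),
    "Impervious surface": (12786.21, 22224.30),
    "Shrub": (5283.69, 3772.02),
    "Water": (4711.29, 6268.32),
    "Snow/Ice": (286.61, 249.40),
    "Wetland": (185.32, 461.24),
}


def net_change(area_2001, area_2020):
    """Net change as defined in the table note: 2020 area minus 2001 area."""
    return area_2020 - area_2001


def percentage(area, total):
    """Share of the total area, in percent (assumed denominator: all classes)."""
    return 100.0 * area / total


total_2001 = sum(a01 for a01, _ in AREAS_KM2.values())
total_2020 = sum(a20 for _, a20 in AREAS_KM2.values())

for name, (a01, a20) in AREAS_KM2.items():
    print(
        f"{name}: change = {net_change(a01, a20):.2f} km², "
        f"{percentage(a01, total_2001):.2f}% -> {percentage(a20, total_2020):.2f}%"
    )
```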
Table 5. Pearson correlation coefficients between NPP and environmental factors in the Yellow River Basin and different ecosystem types.
| Factor | Entire YRB | Cropland | Forest | Grassland |
| --- | --- | --- | --- | --- |
| TEM | 0.32 | 0.28 | 0.35 | 0.26 |
| TP | 0.18 | −0.05 | −0.12 | 0.22 |
| E | 0.41 | 0.35 | 0.30 | 0.48 |
| STL | 0.29 | 0.25 | 0.33 | 0.23 |
| SP | −0.15 | −0.18 | −0.10 | −0.20 |
| SSRD | −0.38 | −0.45 | −0.42 | −0.33 |
| SWVL | 0.35 | 0.22 | 0.28 | 0.40 |
Note: TEM, temperature; TP, total precipitation; E, evapotranspiration; STL, soil temperature level 1; SP, surface pressure; SSRD, surface solar radiation downwards; SWVL, volumetric soil water layer 1.
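For readers who want to reproduce coefficients of this kind, a Pearson correlation between an NPP series and one environmental factor can be sketched as follows. The function name `pearson_r` and the input values are illustrative assumptions, not the authors' data or implementation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance numerator and the two standard-deviation numerators.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / (dev_x * dev_y)


# Synthetic example values (not taken from the study).
npp = [310.0, 325.0, 318.0, 340.0, 352.0]
evapotranspiration = [410.0, 430.0, 422.0, 445.0, 460.0]
print(round(pearson_r(npp, evapotranspiration), 2))
```

Applied per pixel to the 2001–2020 annual series, the same function would give one coefficient per pixel and factor, which is the quantity mapped in Figure 5.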
Table 6. Fitting effects of three machine learning algorithms.
| Sample type | Random Forest R² | Random Forest RMSE | XGBoost R² | XGBoost RMSE | AdaBoost R² | AdaBoost RMSE |
| --- | --- | --- | --- | --- | --- | --- |
| Total samples | 0.842 | 43.4 | 0.835 | 44.4 | 0.844 | 43.1 |
| Grass samples | 0.831 | 49.3 | 0.819 | 51.1 | 0.833 | 49.0 |
| Crop samples | 0.757 | 62.3 | 0.745 | 63.8 | 0.764 | 61.3 |
| Forest samples | 0.600 | 82.2 | 0.598 | 82.0 | 0.615 | 80.2 |
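The two fit metrics reported in Table 6 follow standard definitions, sketched below in plain Python. The function names `r2_score` and `rmse` and the sample values are assumptions made for illustration; the sketch does not reproduce the authors' evaluation pipeline or their train/test split.

```python
from math import sqrt


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target variable."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Synthetic observed and predicted NPP values (illustrative only).
observed = [250.0, 310.0, 405.0, 520.0, 480.0]
predicted = [265.0, 300.0, 390.0, 500.0, 495.0]
print(f"R2 = {r2_score(observed, predicted):.3f}, RMSE = {rmse(observed, predicted):.1f}")
```

A higher R² and a lower RMSE both indicate a closer fit, which is the sense in which the three algorithms are compared across sample types.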
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Lan, Y.; Zheng, Z.; Xie, D.; Ding, X. Exploring the Critical Thresholds of Environmental Factors on Net Primary Productivity in the Yellow River Basin. Forests 2026, 17, 674. https://doi.org/10.3390/f17060674
