1. Introduction
The United Nations Sustainable Development Goals (SDGs) highlight the importance of harmonious human–nature relationships, in which ecosystem services (ESs) play a fundamental role by linking natural systems with human well-being [
1,
2]. The sustainable supply of ES is therefore essential for achieving multiple SDGs and depends on effective ecosystem conservation and management [
3]. ES functions are shaped by both natural factors and socio-economic drivers that modify land use and ecosystem processes [
4,
5]. Rapid socio-economic development and intensified human activities have increased the risk of ES degradation, threatening their sustainability and human well-being [
6]. Over the past two decades, ES assessment models have been widely used to quantify the spatiotemporal dynamics of critical ecological functions. In arid and semi-arid regions, these assessments predominantly focus on habitat quality (HQ), which reflects the capacity of regional ecosystems to provide suitable conditions for biodiversity [
7]; soil conservation (SC), representing the ecosystem’s ability to control water erosion and retain vital nutrients [
8]; windbreak and sand fixation (WS), a quintessential service for mitigating wind erosion and preventing desertification [
9]; and water yield (WY), which quantifies the net freshwater provisioning capacity essential for maintaining both natural riparian habitats and oasis agriculture [
10]. Building on these foundational assessments, statistical and spatial models have been applied to explore the relationships between these ESs and natural and socio-economic drivers, including climate, vegetation, hydrology, and human activities [
11].
In the context of agricultural science and regional land management, particularly in arid and semi-arid environments, the state-of-the-art understanding of ESs is deeply intertwined with agricultural expansion and water resource allocation [
12]. Recent agronomic studies emphasize that while expanding oasis agriculture enhances food provision and regional socio-economic development, it simultaneously intensifies the competition for limited water resources. This dynamic frequently triggers a “crowding-out effect” on ecological water, heavily impacting regulating and supporting services such as soil conservation and habitat quality in the oasis–desert transition zones [
13]. Therefore, understanding the delicate theoretical balance between agricultural pressures and ecosystem stability has become a core focus within current agronomic and ecological research, necessitating precise analytical frameworks to guide sustainable farming and land-use policies [
14].
Previous studies have made substantial progress in identifying the drivers of ESs, and recent research has increasingly incorporated causal analysis frameworks to distinguish causal relationships from simple correlations. For example, Liang [
15] explored causal pathways between ecological environmental quality and its driving factors, while Yang [
16] analyzed causal relationships and interactions between ESs and their influencing factors in arid regions. However, many existing studies still focus on individual components of ES dynamics rather than integrating causally informed feature selection, interaction effects, and nonlinear responses into a unified analytical framework, which may limit the mechanistic understanding of ES dynamics. In addition, the frequent reliance on single-factor or additive frameworks overlooks interaction effects, even though ESs are typically shaped by coupled natural–social processes [
17]. As ES research has advanced, the limitations of traditional statistical models in capturing complex nonlinear relationships have become increasingly apparent [
18]. Consequently, machine learning approaches have been widely adopted for their ability to model nonlinear responses, and recent advances in interpretable machine learning have enhanced transparency and facilitated the identification of key drivers and mechanisms [
19]. Although nonlinear responses of ESs to their drivers are widely recognized, their quantitative characterization remains limited. Such nonlinearity implies the existence of critical thresholds, beyond which small changes in driving factors can trigger abrupt shifts in ES functions [
20]. Various methods have been used to identify ES thresholds, including piecewise linear regression, elasticity-based approaches, and constraint line methods [
21,
22].
In arid regions, ESs are particularly prone to nonlinear and threshold-type responses due to limited water availability, fragile ecological structures, and intensified human–environment interactions. For instance, vegetation productivity often shows rapid initial growth under increasing precipitation but gradually approaches saturation, while excessive population pressure may trigger abrupt declines in water-related ESs. Explicitly detecting threshold behavior is therefore important in arid landscapes, and piecewise linear regression provides an effective analytical tool for locating threshold points and characterizing nonlinear transitions between ESs and their drivers.
Arid region ecosystems are among the most fragile globally, as their structure and functions are strongly constrained by water scarcity and human disturbance. Under climate change, ESs in arid and semi-arid regions exhibit heightened sensitivity and increasingly complex responses, while existing studies have mainly focused on land-use effects and service trade-offs [
23,
24]. Owing to strong natural–socioeconomic coupling, ESs in arid regions are particularly prone to nonlinear and threshold responses. Identifying such thresholds is therefore critical for understanding human–environment interactions and supporting sustainable management in arid regions. The Tarim River Basin, a typical “mountain–oasis–desert” system in arid inland China, relies heavily on snow- and glacier-melt water and faces intense competition between agricultural and ecological water demands. These characteristics create strong human–environment coupling and make the basin an ideal case for investigating nonlinear and threshold responses of ESs in arid regions. In summary, three key limitations remain in existing studies: (1) insufficient causal foundations in driver selection, (2) inadequate representation of natural–social interactions, and (3) limited management relevance of threshold identification results.
To address these gaps, this study uses the Tarim River Basin as a case and develops an integrated framework combining ES assessment, causal screening, interaction modeling, interpretable machine learning, and segmented regression. This framework aims to identify key interaction effects and quantify ES thresholds with causally informed screening, thereby supporting differentiated ecosystem management in arid regions.
Based on the above research gaps, this study aims to address the following research questions: (1) Which natural and socioeconomic factors exert causally robust influences on ES functions in arid regions? (2) How do natural–socioeconomic interaction terms shape the nonlinear responses of ESs? (3) What are the critical threshold intervals at which ESs shift under coupled natural–social drivers? (4) How can the identified thresholds support differentiated ecosystem management in arid regions? By addressing these questions, this study develops an integrated analytical framework that combines causal inference, interpretable machine learning, and piecewise regression to reveal the nonlinear threshold behavior of ESs.
2. Materials and Methods
2.1. Study Area
The Tarim River Basin, located in southern Xinjiang, is the largest endorheic river basin in China, covering approximately 1.02 × 10⁶ km². Because of its vast territorial extent, the geomorphological and geohydrological conditions within the basin are extremely non-uniform, exhibiting profound spatial heterogeneity. It is bounded by the Tianshan, Kunlun, and Altun Mountains, with the Taklimakan Desert occupying its central area, forming a typical and highly contrasting “mountain–oasis–desert” composite ecological pattern. The basin experiences an extremely arid continental climate, with annual precipitation generally below 100 mm and potential evapotranspiration ranging from 800 to 2000 mm. River runoff is mainly sustained by glacier and snowmelt from surrounding mountains. Hydrologically, this water flows downwards to sustain the geohydrologically active mid-stream piedmont oases before ultimately dissipating into the hyper-arid central desert, where groundwater becomes exceptionally deep. This steep spatial gradient results in a pronounced imbalance between water availability and heat conditions across the basin.
Under these constraints, oases are highly fragmented and fragile, and the ecosystem is particularly sensitive to both human activities and climate variability. In recent years, intensified land use, expanding irrigation demand, and ongoing climate change have driven significant spatiotemporal variations and nonlinear responses of ES functions within the basin. For example, as agricultural water extraction gradually increases, natural vegetation at the oasis margins may initially sustain itself using residual groundwater, resulting in only minor declines in habitat quality. However, once irrigation demand exceeds a critical threshold and depletes these limited reserves, the ecosystem loses its buffering capacity. This triggers an abrupt and steep collapse in natural plant communities and soil conservation functions. These responses are fundamentally non-linear because they are governed by finite physical limits and resource tipping points rather than proportional, steady changes. Owing to its distinctive natural setting and strong human–environment coupling, the Tarim River Basin serves as a representative case for investigating ES driving mechanisms and threshold identification in arid regions.
To better position the Tarim River Basin within the broader context of global arid environments, it is necessary to clarify its generic and idiosyncratic characteristics. Similar to other major arid agricultural zones like the Aral Sea Basin, this region experiences extreme water scarcity and depends heavily on fragmented oasis agriculture. Furthermore, it exhibits profound ecological vulnerability to human activities, particularly population growth and irrigation expansion. These shared traits ensure that the integrated methodological framework developed in this study is highly transferable. However, the Tarim River Basin possesses distinct idiosyncratic features. It is a massive endorheic basin with a hydrological system almost entirely driven by high-altitude glacier and snowmelt, which differs significantly from regions relying on monsoonal precipitation or extensive fossil groundwater extraction. Consequently, while the analytical approach and the general observation of human water demand displacing ecological needs are globally relevant, the specific numerical thresholds derived in this study remain contextual and are primarily applicable to meltwater-driven oasis–desert systems (
Figure 1).
2.2. Data Source
This study utilized multi-source datasets (
Table 1). To ensure spatial consistency, data from different sources were resampled to a uniform spatial resolution of 1 km using the nearest neighbor assignment method in ArcGIS 10.8, and subsequently projected to the WGS_1984_Albers coordinate system. This method preserves the original pixel values without introducing interpolated artifacts, making it particularly suitable for maintaining the accuracy of discrete categorical data during the upscaling process.
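The property that nearest-neighbor assignment keeps original class codes intact can be illustrated with a minimal NumPy sketch. This is not the ArcGIS routine used in the study; the function name `resample_nearest` and the toy land-use grid are hypothetical.

```python
import numpy as np

def resample_nearest(grid, out_shape):
    """Resample a 2-D array to out_shape by nearest-neighbour assignment."""
    rows, cols = grid.shape
    out_rows, out_cols = out_shape
    # Map the centre of every output cell to the input cell that contains it.
    r_idx = np.floor((np.arange(out_rows) + 0.5) * rows / out_rows).astype(int)
    c_idx = np.floor((np.arange(out_cols) + 0.5) * cols / out_cols).astype(int)
    return grid[np.ix_(r_idx, c_idx)]

# Hypothetical categorical land-use codes on a fine grid.
land_use = np.array([[1, 1, 2, 2],
                     [1, 1, 2, 2],
                     [3, 3, 4, 4],
                     [3, 3, 4, 4]])
coarse = resample_nearest(land_use, (2, 2))
```

Every value in `coarse` is copied from an existing cell, so no new (interpolated) class code can appear, which is why this method suits categorical layers.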
2.3. Research Framework
ES responses to environmental change and human activities are often complex and nonlinear, particularly under the coupling of multiple driving forces, where threshold effects cannot be adequately captured by correlation-based analyses alone. To address these challenges, this study develops an integrated analytical framework consisting of five sequential and interlinked steps (
Figure 2): ES assessment, causal screening, interaction term construction, interpretable machine learning, and segmented regression.
First, four ESs (HQ, SC, WS, and WY) were assessed for 2000, 2010, and 2023, providing the basis for analyzing spatiotemporal changes and identifying services sensitive to environmental and human drivers. Second, causal inference methods were applied to screen key driving factors with directional and robust impacts on ESs. In this step, the PC algorithm was used for causal structure learning, while the DoWhy potential-outcome framework was employed for effect estimation and refutation. This procedure ensures that subsequent analyses focus on drivers with plausible causal relevance. Third, interaction terms between natural and socio-economic factors were constructed based on the screened drivers to capture compound and non-additive effects that cannot be revealed by single-factor models. Fourth, interpretable machine learning models were employed to characterize the nonlinear responses of ESs to these interaction terms. Specifically, the eXtreme Gradient Boosting (XGBoost) algorithm coupled with SHapley Additive exPlanations (SHAP) was used. This combination was selected because it captures complex, high-dimensional nonlinear relationships without requiring predefined functional forms, and because it provides transparent and quantitative interpretations of feature importance and functional response patterns. Finally, a sequential combination of machine learning and statistical modeling was used to quantify threshold intervals. The XGBoost and SHAP analysis first identifies the most dominant interaction terms and characterizes the shape of their nonlinear response curves. Subsequently, a three-segment piecewise linear regression is fitted to the relationship between each ES and its dominant interaction term, with the SHAP dependence patterns used to confirm the presence and general shape of the phase shifts (Section 2.4.4). This procedure estimates the breakpoint values at which the slope of the ecosystem response changes markedly in direction or magnitude.
Through this structured sequential process, the complex nonlinear patterns visually revealed by the machine learning models are systematically translated into explicit and quantifiable threshold intervals. Therefore, the conceptual thresholds illustrated in
Figure 2 are not qualitative estimates; rather, they concretely represent these statistically optimized breakpoints calculated via the least-squares method.
Together, these steps form a coherent and sequential framework in which each component builds on the outputs of the previous one, linking causal driver identification, interaction effects, nonlinear response characterization, and threshold extraction to support ecological risk warning and ecosystem management in arid regions.
2.4. Research Methods
2.4.1. Assessment Methods of ES Functions
Considering the ecological fragility of the Tarim River Basin and the intensity of human disturbances, four ESs closely related to regional ecological security were selected: HQ, SC, WS, and WY. These services collectively represent ecosystem structural integrity, soil and water regulation, wind erosion control, and water supply capacity, and are widely used as core indicators of ecosystem stability in arid regions.
ESs were assessed using established spatial modeling frameworks selected specifically for their suitability in arid environments. The WY, SC, and HQ services were quantified utilizing the InVEST model [
25,
26]. This framework was chosen because it effectively captures spatial heterogeneities in ecosystem functions with relatively low empirical data requirements, making it highly applicable to the vast and data-scarce Tarim River Basin. Meanwhile, WS was estimated using the Revised Wind Erosion Equation model [
27]. This specific equation system is widely recognized for its robust performance in arid and semi-arid regions where wind erosion constitutes a dominant ecological threat. All underlying formulas were constructed relying on localized regional climate, topography, and land-use parameters to ensure accurate representation [
28,
29]. (It should be noted that while both SC and WS contribute to regional soil stability, they represent distinct physical mitigation processes. SC, quantified via the InVEST Sediment Delivery Ratio module, specifically estimates the ecosystem’s capacity to reduce hydric erosion by intercepting raindrops and slowing surface runoff. In contrast, WS, estimated using the RWEQ model, characterizes the capacity to mitigate aeolian erosion by increasing surface roughness and reducing near-surface wind velocity. Distinguishing these two services is essential in the Tarim River Basin, where water and wind act as independent yet complementary drivers of land degradation.) To ensure full methodological reproducibility, all specific localized parameter values are systematically detailed in
Appendix A (
Table A5,
Table A6 and
Table A7).
The WY module calculates the relative water contribution of each land parcel based on the Budyko curve and annual average precipitation. The specific equations are constructed as follows:
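For reference, one commonly cited form of the InVEST annual water yield model that is consistent with the symbols used in this subsection is sketched here. It follows the general model documentation rather than this study's configuration, so the exact variant is an assumption. The Budyko dryness index $R_{x,j}$, the vegetation evapotranspiration coefficient $k_{x,j}$, and the reference evapotranspiration $ET_{0,x}$ are additional symbols introduced for the sketch, and the pedotransfer function for PAWC is not reproduced.

```latex
\begin{aligned}
Y_{x,j} &= \left(1 - \frac{AET_{x,j}}{P_x}\right) P_x \\
\frac{AET_{x,j}}{P_x} &= \frac{1 + w_x R_{x,j}}{1 + w_x R_{x,j} + 1/R_{x,j}} \\
w_x &= Z \, \frac{AWC_x}{P_x} \\
R_{x,j} &= \frac{k_{x,j}\, ET_{0,x}}{P_x} \\
AWC_x &= \min(MSD_x,\, RD_x) \times PAWC_x
\end{aligned}
```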
where $Y_{x,j}$ is the annual WY of grid $x$ on land use type $j$ (mm/yr); $AET_{x,j}$ is the annual actual evapotranspiration of grid $x$ on land use type $j$ (mm/yr); $P_x$ is the annual precipitation of grid $x$ (mm/yr); $AWC_x$ is the available water content of vegetation; $Z$ is the Zhang coefficient; $w_x$ is an empirical parameter; $MSD$ is the maximum soil depth; $RD_x$ is the root depth; $PAWC$ is the plant available water content; and $SAN$, $SIL$, $CLA$, and $OM$ are the sand, silt, clay, and organic matter contents of the soil, respectively. The Zhang coefficient ($Z$), an empirical parameter characterizing local hydroclimatic conditions in the InVEST WY module, was set to 1.25 based on previous hydrological calibrations for the Tarim River Basin, reflecting its strongly water-limited conditions [30]. Sensitivity tests indicated that varying $Z$ within ±20% did not affect the spatial patterns or response directions of ESs, confirming model robustness.
SC is evaluated by estimating the difference between potential soil erosion and actual soil erosion using the universal soil loss equation framework. The structural equation is constructed as:
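Under the USLE framework described here, the difference between potential and actual erosion is usually written as follows. This is the standard form implied by the factor definitions and is offered as a sketch; $RKLS$ and $USLE$ are labels introduced here for potential and actual soil loss, respectively.

```latex
\begin{aligned}
RKLS &= R \cdot K \cdot LS \\
USLE &= R \cdot K \cdot LS \cdot C \cdot P \\
SC &= RKLS - USLE = R \cdot K \cdot LS \cdot (1 - C \cdot P)
\end{aligned}
```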
where $SC$ is the soil conservation amount (t·hm⁻²·yr⁻¹); $R$ is the erosivity of precipitation (MJ·mm/(hm²·h·a)); $K$ is the erodibility of soil (t·hm²·h/(hm²·MJ·mm)); $LS$ is the terrain factor; $C$ is the vegetation management factor; and $P$ is the soil and water conservation measure factor.
The HQ module assesses the status of regional biodiversity by mathematically combining landscape sensitivity with the intensity of external anthropogenic threats. The formula is expressed as:
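The habitat quality score in the InVEST model is generally expressed in the half-saturation form sketched here. It is taken from the general model documentation and is assumed, not confirmed, to be the variant applied in this study; $D_{xj}$, the total threat level of grid $x$ in land use type $j$, is an additional symbol introduced for the sketch.

```latex
Q_{xj} = H_j \left[ 1 - \frac{D_{xj}^{z}}{D_{xj}^{z} + k^{z}} \right]
```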
where $Q_{xj}$ is the HQ level of grid $x$ in land use type $j$; $H_j$ is the habitat suitability of habitat type $j$; and $z$ and $k$ take the default parameter values of the model.
The WS service is quantified by calculating the difference between potential wind erosion under bare soil conditions and actual wind erosion under current vegetation cover. The modeling sequence is constructed as follows:
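The RWEQ-based sand fixation calculation is commonly written in the form sketched here, which matches the symbols used in this subsection. The coefficients are those of the general RWEQ literature and are assumed, not confirmed, to be the ones used in this study; the potential rate $SL_q$ is obtained from the same expressions with the vegetation factor $COG$ omitted.

```latex
\begin{aligned}
SR &= SL_q - SL \\
SL &= \frac{2x}{s^{2}}\, Q_{max}\, e^{-(x/s)^{2}} \\
Q_x &= Q_{max}\left[1 - e^{-(x/s)^{2}}\right] \\
Q_{max} &= 109.8\,(WF \times EF \times SCF \times K' \times COG) \\
s &= 150.71\,(WF \times EF \times SCF \times K' \times COG)^{-0.3711}
\end{aligned}
```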
where $SR$ is the sand fixation rate (t/hm²); $SL_q$ is the potential soil wind erosion rate under bare soil conditions (t/hm²); $SL$ is the actual soil wind erosion rate under vegetation cover conditions (t/hm²); $Q_x$ is the sand flux at $x$ (kg/m); $x$ is the plot length; $Q_{max}$ is the maximum transfer amount (kg/m); $s$ is the critical plot length (m); $WF$ is the climate factor, which reflects the surface’s resistance to wind erosion; $EF$ is the factor that represents the erodibility of the soil; $SCF$ is the soil crust factor; $K'$ is the surface roughness factor; and $COG$ is the vegetation coverage factor. The $COG$ was calculated as an exponential decay function of the fractional vegetation cover, $COG = e^{-0.0438c}$, where the vegetation coverage ($c$) was derived from NDVI using the standard pixel dichotomy model.
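The climate, soil erodibility, and soil crust factors are typically computed with the RWEQ expressions sketched here. The coefficients come from the general RWEQ literature and are assumed, not confirmed, to match this study's parameterization; $\rho$ denotes air density and the summation index $i$ runs over the $N$ wind speed observations.

```latex
\begin{aligned}
WF &= \frac{\sum_{i=1}^{N} WS_2\,(WS_2 - WS_t)^{2} \times N_d\, \rho}{N \times g} \times SW \times SD \\
EF &= \frac{29.09 + 0.31\,S_a + 0.17\,S_i + 0.33\,(S_a / C_i) - 2.59\,OM - 0.95\,CaCO_3}{100} \\
SCF &= \frac{1}{1 + 0.0066\,(C_i)^{2} + 0.021\,(OM)^{2}}
\end{aligned}
```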
where $WS_2$ is the wind speed at 2 m (m/s); $WS_t$ is the critical wind speed at 2 m (m/s); $N$ is the number of observations; $N_d$ is the number of test days; $\rho$ is the air density (kg/m³); $g$ is the acceleration due to gravity (m/s²); $SW$ is the soil moisture factor; $SD$ is the snow cover factor; $S_a$, $S_i$, and $C_i$ are the sand, silt, and clay contents of the soil, respectively; $OM$ is the organic matter content; and $CaCO_3$ is the calcium carbonate content.
Fractional vegetation cover (Fc) was calculated using the pixel dichotomy model, Fc = (NDVI − NDVIsoil)/(NDVIveg − NDVIsoil), where Fc is the vegetation coverage, NDVIveg is the NDVI value at the 95th percentile, and NDVIsoil is the NDVI value at the 5th percentile.
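The pixel dichotomy calculation can be sketched in a few lines of NumPy. The function name `fractional_cover` is hypothetical, and clipping the result to the 0–1 range is a common convention assumed here rather than a step stated in the text.

```python
import numpy as np

def fractional_cover(ndvi, soil_pct=5, veg_pct=95):
    """Pixel dichotomy model: scale NDVI between bare-soil and full-cover endmembers."""
    ndvi = np.asarray(ndvi, dtype=float)
    ndvi_soil = np.nanpercentile(ndvi, soil_pct)  # bare-soil endmember (5th percentile)
    ndvi_veg = np.nanpercentile(ndvi, veg_pct)    # full-vegetation endmember (95th percentile)
    fc = (ndvi - ndvi_soil) / (ndvi_veg - ndvi_soil)
    # Assumption: values outside the endmember range are clipped to [0, 1].
    return np.clip(fc, 0.0, 1.0)

# Hypothetical NDVI sample spanning 0 to 1.
ndvi_sample = np.linspace(0.0, 1.0, 101)
fc = fractional_cover(ndvi_sample)
```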
2.4.2. Identification and Estimation of Causal Effects
To ensure the directional validity of driver selection and avoid the bias of interpreting correlation as causation, this study implemented an explicit causal inference workflow based on the potential-outcome framework and the DoWhy library [
31]. DoWhy is an open-source Python framework (Python 3.9.10 was used here) that combines causal graphical models with potential outcomes to provide an end-to-end pipeline for causal effect modeling, identification, estimation, and robustness refutation. For each ES $Y \in \{HQ, SC, WS, WY\}$ and each driving factor $T$ [annual precipitation (Pre), normalized difference vegetation index (NDVI), population density (Pop), nighttime light index (NTL), built-up land ratio (BuiltRatio), gross domestic product (GDP), and evapotranspiration (Evap)], the causal estimand was defined as the average treatment effect (ATE), expressed as $ATE = E[Y \mid do(T = t_1)] - E[Y \mid do(T = t_0)]$, where $t_1$ and $t_0$ represent two contrast levels of the standardized continuous driver. In this framework, the fundamental analytical unit is the spatial grid cell. Because the exposures are continuous and standardized, no fixed binary contrast levels (e.g., rigid $t_0$ and $t_1$ thresholds) were imposed for ATE estimation. Instead, the DoWhy framework was used to estimate linear average effects, primarily as a directional screening tool. This step therefore functions as a causally informed feature selection process that filters out highly confounded variables before the nonlinear ML modeling, rather than as a strict econometric causal identification.
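To make the idea of a linear average effect with covariate adjustment concrete, the following minimal sketch compares a naive and an adjusted regression slope on simulated data. It uses plain NumPy rather than the DoWhy API, and the data-generating process, the variable names, and the helper `linear_effect` are all hypothetical.

```python
import numpy as np

def linear_effect(y, t, covariates=None):
    """OLS slope of y on t, optionally adjusting for covariates of shape (n,) or (k, n)."""
    columns = [np.ones_like(t), t]
    if covariates is not None:
        columns.extend(np.atleast_2d(covariates))
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                       # confounder (e.g., a climate variable)
t = 0.6 * z + 0.8 * rng.normal(size=n)       # driver with roughly unit variance
y = 0.5 * t + 1.0 * z + rng.normal(size=n)   # simulated ES; true effect of t is 0.5

naive = linear_effect(y, t)         # biased upward by the open backdoor path through z
adjusted = linear_effect(y, t, z)   # adjusting for z recovers a value close to 0.5
```

The gap between `naive` and `adjusted` is the confounding bias that the adjustment step is meant to remove before drivers are passed to the nonlinear models.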
A Directed Acyclic Graph (DAG) was initially constructed using a hybrid approach that combines prior ecological knowledge and data-driven learning [
2,
32]. Preliminary causal assumptions were based on well-established ecological processes documented in previous research, such as climate affecting vegetation and vegetation influencing ESs, or socioeconomic pathways such as GDP influencing population, NTL and built-up ratio, which then affect ESs. The Peter-Clark (PC) algorithm was used to learn conditional independencies among variables, and edges inconsistent with ecological theory, such as vegetation causing precipitation, were removed or redirected. To ensure structural stability, the PC procedure was repeated across 200 bootstrap resamples, and only consistently appearing edges were retained. The resulting DAG served as the structural foundation for identifying confounding paths.
For each causal relationship between a driving factor (T) and an ecosystem service (Y), a valid adjustment set (Z) was identified using DoWhy’s backdoor criterion. This adjustment set includes confounding variables that block all non-causal paths from the driver to the service. We also excluded intermediate variables (mediators) on the causal pathway and avoided conditioning on colliders to prevent spurious associations. The final adjustment sets for each driver–service pair are reported in Appendix A to ensure transparency of the identification strategy [
33].
The complete four-step causal inference workflow is summarized in
Table 2. Following the initial causal modeling and DAG construction, the effect identification stage relied on fulfilling core assumptions such as consistency, conditional ignorability, positivity, and acyclicity. Once valid adjustment sets were identified using the backdoor criterion, three complementary estimators were applied to minimize model dependence during the effect estimation phase. Specifically, linear regression with covariate adjustment, Double Machine Learning [
34], and a Random Forest T-learner were utilized to calculate the average treatment effects for standardized variables. Finally, to confirm the reliability of these estimations, a rigorous robustness refutation phase was conducted [
31]. This included placebo treatment refutation, random common cause refutation, and subsampling-based refutation. Only those driving factors demonstrating consistent effect signs, stable magnitudes, and successful passage of all refutation tests were retained as robust, causally informed inputs for the subsequent interaction modeling.
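The logic of two of these refutation checks can be illustrated with a small simulation. This is a sketch in plain NumPy, not the DoWhy refuters themselves; the simulated data and the helper `adjusted_effect` are hypothetical.

```python
import numpy as np

def adjusted_effect(y, t, *covariates):
    """OLS coefficient of t in a regression of y on t and the given covariates."""
    design = np.column_stack([np.ones_like(t), t, *covariates])
    return np.linalg.lstsq(design, y, rcond=None)[0][1]

rng = np.random.default_rng(1)
n = 3000
z = rng.normal(size=n)                      # observed confounder
t = 0.6 * z + 0.8 * rng.normal(size=n)      # driver
y = 0.5 * t + z + rng.normal(size=n)        # simulated ES; true effect is 0.5

estimate = adjusted_effect(y, t, z)

# Placebo treatment: shuffling the driver breaks any real link, so effects should be near zero.
placebo_effects = np.array(
    [adjusted_effect(y, rng.permutation(t), z) for _ in range(200)]
)

# Random common cause: adding an unrelated covariate should barely change the estimate.
with_random_cause = adjusted_effect(y, t, z, rng.normal(size=n))
```

A driver would be treated as robust in this sketch when the placebo effects cluster around zero and the estimate is nearly unchanged by the random covariate.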
Importantly, these causally informed feature selection results are not ‘black-box’ algorithmic outputs. The structural validity of the selected drivers is theoretically supported by the transparent DAG grounded in prior ecological knowledge, and empirically validated through the multiple falsification tests mentioned above. By relying on this dual validation structure, the driver selection moves beyond mere statistical correlation, providing a robust and traceable foundation for constructing interaction terms and identifying nonlinear threshold responses.
2.4.3. Construction and Identification of Natural-Socioeconomic Interaction Terms
The spatiotemporal dynamics of ESs are driven by the coupled effects of natural processes and human activities, particularly in arid regions where nonlinear and non-additive responses are common [
35]. To capture these coupled effects beyond single-factor influences, this study constructed natural–socioeconomic interaction terms based on the results of causal screening, providing a basis for analyzing nonlinear responses and threshold behaviors.
Interaction terms were constructed following two principles: (1) representing key couplings between natural and socioeconomic drivers, and (2) capturing potential amplification or suppression effects in ES responses. Based on causally informed feature selection results, precipitation (Pre) and vegetation index (NDVI) were selected as natural variables, while population density (Pop) and nighttime lights (NTL) were selected as proxies for human activity. Accordingly, interaction terms such as Pre × Pop and NDVI × Pop were constructed to explicitly represent the joint effects of environmental conditions and human pressure on ESs.
To explicitly test whether the effects of environmental drivers on ESs depend on the intensity of human activities, multiplicative interaction terms were introduced. In contrast to additive models, these terms allow the marginal effect of precipitation or vegetation to vary with population density, thereby capturing synergistic or amplifying effects that arise from coupled natural and socioeconomic processes. Ecologically, such interactions reflect situations in which human activities modify the sensitivity of ESs to climatic or vegetation conditions. From a social–environmental perspective, they represent nonlinear feedbacks between environmental resources and human pressures that cannot be adequately described by independent additive effects alone.
All variables were standardized prior to interaction construction, and variance inflation factor (VIF) tests were applied to ensure acceptable multicollinearity levels. These interaction terms were subsequently used as inputs for machine learning models to identify nonlinear response patterns and segmented threshold effects. Compared with single-factor models, this interaction-based approach provides a more realistic representation of coupled human–environment processes and enhances the explanatory power for ES response mechanisms under compound driving conditions. It is important to note that the constituent main effects (Pre, Pop, NDVI, NTL) were retained in the models alongside these explicit interaction terms. Because tree-based ensembles like XGBoost inherently approximate variable interactions through recursive splitting, the addition of explicit multiplicative terms typically yields modest incremental gains in overall predictive accuracy. Therefore, our primary objective in constructing these explicit interaction terms was not to maximize predictive performance, but to enable the SHAP explainer to calculate unified, trackable marginal contributions for the coupled socio-ecological pressure, thereby allowing for explicit threshold quantification.
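The construction step can be sketched as follows: standardize the drivers, form the multiplicative term, and check variance inflation factors while keeping the main effects. The simulated inputs and the cut-off of 5 mentioned in the comments are hypothetical; the acceptable VIF level used in the study is not stated here.

```python
import numpy as np

def standardize(x):
    """Centre to zero mean and scale to unit standard deviation."""
    return (x - x.mean()) / x.std()

def vif(features):
    """Variance inflation factor of each column in an (n_samples, n_features) array."""
    n, k = features.shape
    out = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(features, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, features[:, j], rcond=None)
        resid = features[:, j] - others @ coef
        r2 = 1.0 - resid.var() / features[:, j].var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(0)
pre = standardize(rng.gamma(2.0, 30.0, size=2000))     # hypothetical precipitation
pop = standardize(rng.lognormal(3.0, 0.5, size=2000))  # hypothetical population density
pre_x_pop = pre * pop                                  # interaction built after standardizing

features = np.column_stack([pre, pop, pre_x_pop])      # main effects retained with the product
vifs = vif(features)                                   # e.g., compare against a cut-off such as 5
```

Standardizing before multiplying centres both drivers, which tends to keep the product term from being strongly collinear with its constituent main effects.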
After constructing natural–socioeconomic interaction terms, an explainable machine learning framework was established by integrating a gradient boosting decision tree model (XGBoost) with SHapley Additive exPlanations (SHAP). XGBoost was employed to capture the complex, high-dimensional, and nonlinear responses of ESs to interacting drivers without requiring predefined functional forms [
36].
The four ES indicators (SC, HQ, WS, and WY) were used as response variables, and model performance was evaluated using five-fold cross-validation to ensure predictive robustness and reduce overfitting. SHAP was applied to decompose model predictions and quantify the marginal contributions of individual drivers and interaction terms across samples [
37]. This approach enables transparent interpretation of nonlinear response patterns, including the direction, strength, and variability of effects under different factor couplings.
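What "marginal contribution" means in this decomposition can be shown with an exact, brute-force Shapley computation on a toy model. This sketch is not the `shap` library or its tree explainer; the toy model, the single reference point used as baseline, and the function names are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, replacing absent features by baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi

def toy_model(v):
    """Toy response with main effects and a multiplicative coupling (hypothetical)."""
    pre, pop = v
    return 2.0 * pre - 1.0 * pop + 0.5 * pre * pop

x = [1.0, 2.0]
baseline = [0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The contributions sum to the difference between the prediction at `x` and at the baseline, and the effect of the product term is split between the two features, which is one reason an explicit interaction column can make the coupled contribution easier to track.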
To reduce model uncertainty and enhance the robustness of the XGBoost-SHAP analysis, several complementary procedures were implemented. To account for the inherent spatial structure of ecological data and ensure robust model evaluation, model stability was assessed using five-fold stratified cross-validation, informed by best practices in spatial prediction and applicability [
38,
39,
40]. To avoid overfitting and artificial performance inflation, no individual underperforming folds were discarded. Instead, fold-to-fold variance was used as a penalty metric during model evaluation. Hyperparameter configurations or random seed iterations that exhibited high fold-to-fold variability (indicating instability) were discarded entirely. Only model configurations that demonstrated consistent and robust predictive performance across all five folds were selected for the final analysis. To account for potential randomness in model training and ensure the reproducibility of the results, each XGBoost-SHAP model was repeated 20 times using different random seeds. The stability of the nonlinear patterns was verified by evaluating the consistency of the SHAP dependence plots across these iterations, ensuring that the functional trajectories and the locations of inflection points remained stable. Only interaction terms that maintained consistent SHAP importance rankings and response patterns across all repetitions were retained for threshold identification. Consequently, piecewise regression was performed only when these nonlinear patterns were shown to be stable across the ensemble of model runs. This verification procedure ensures that the identified thresholds represent robust ecological signals rather than artifacts of a single model execution.
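The configuration-level stability filter can be sketched with the standard library alone. The fold scores, the configuration names, and the 0.05 tolerance on the fold-to-fold standard deviation are hypothetical; the actual penalty used in the study is not specified here.

```python
from statistics import mean, pstdev

def select_stable_config(cv_scores, max_std=0.05):
    """Keep configurations whose fold scores vary little, then pick the best mean score."""
    stable = {
        name: scores
        for name, scores in cv_scores.items()
        if pstdev(scores) <= max_std  # whole configuration is dropped, never single folds
    }
    if not stable:
        return None
    return max(stable, key=lambda name: mean(stable[name]))

# Hypothetical five-fold R^2 scores for three hyperparameter configurations.
cv_scores = {
    "shallow": [0.71, 0.70, 0.72, 0.69, 0.71],
    "deep": [0.95, 0.60, 0.93, 0.65, 0.96],
    "medium": [0.78, 0.77, 0.79, 0.76, 0.78],
}
best = select_stable_config(cv_scores)
```

Here the "deep" configuration has the highest mean score but is rejected because its folds disagree strongly, which mirrors the rule that unstable configurations are removed as a whole.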
2.4.4. Piecewise Linear Regression Model
After characterizing nonlinear ES responses using machine learning, a three-segment piecewise linear regression model was applied to translate complex response curves into explicit and interpretable threshold information. Specifically, while the SHAP dependence plots were utilized as a diagnostic tool to verify the presence and general shape of nonlinear phase shifts, the piecewise regression models were fitted directly using the actual ES values as the response variable and the natural–socioeconomic interaction terms as the explanatory variable. Ecosystem responses to coupled drivers often exhibit phase-like behavior, with accelerated or directional changes occurring once critical ranges are exceeded [41]. Identifying such thresholds provides a basis for understanding ES sensitivity and for supporting differentiated management strategies.
The three-segment piecewise regression model divides nonlinear responses into three linear phases separated by two breakpoints, enabling the identification of inflection points, response direction changes, and stage-specific sensitivities. Breakpoint locations and regression parameters were estimated using least squares, thereby quantifying threshold positions and corresponding response patterns of ESs under varying intensities of interaction terms [42]. The mathematical form can be expressed as follows:
where Y represents the value of the ES, X denotes the key interaction term, τ1 and τ2 are the breakpoint locations, β0 is the intercept, and β1, β2, and β3 are the regression slopes for each segment.
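One form consistent with these definitions, assuming that the three segments join continuously at the breakpoints (continuity is an assumption here and is not stated in the text) and that an additive error term is implied, can be written as:

```latex
Y =
\begin{cases}
\beta_0 + \beta_1 X, & X \le \tau_1,\\[2pt]
\beta_0 + \beta_1 \tau_1 + \beta_2 (X - \tau_1), & \tau_1 < X \le \tau_2,\\[2pt]
\beta_0 + \beta_1 \tau_1 + \beta_2 (\tau_2 - \tau_1) + \beta_3 (X - \tau_2), & X > \tau_2.
\end{cases}
```

Under this form, β1, β2, and β3 are the slopes of the first, second, and third segments, and a change in sign between adjacent slopes marks a reversal in response direction at the corresponding breakpoint.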
This approach enables the transformation of nonlinear response patterns into explicit and quantifiable thresholds, guided by robust interval estimation principles for segmented ecological regression [43,44]. Compared with curve-fitting-based analyses, piecewise regression not only characterizes stage-specific response behaviors but also identifies threshold conditions and sensitive intervals where ESs shift. Moreover, comparing thresholds across different natural–socioeconomic interaction terms provides quantitative evidence for distinguishing dominant mechanisms under multiple pressures, thereby supporting risk warning, ecosystem regulation, and land management decisions.
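A minimal sketch of a three-segment least-squares fit is given below. It assumes continuity at the breakpoints (via hinge terms) and searches breakpoints on a grid of quantiles. The grid search, the function name `fit_three_segment`, and the synthetic data are illustrative assumptions, because the text specifies only that breakpoints and parameters were estimated by least squares.

```python
import itertools
import numpy as np

def fit_three_segment(x, y, n_grid=41, min_gap=2):
    """Grid-search two breakpoints; solve the linear part by least squares.

    Continuity at the breakpoints is assumed through hinge terms.
    Returns (tau1, tau2, slopes of the three segments, residual sum of squares).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Candidate breakpoints: interior quantiles of x (hypothetical grid choice).
    grid = np.quantile(x, np.linspace(0.1, 0.9, n_grid))
    best = None
    for i, j in itertools.combinations(range(n_grid), 2):
        if j - i < min_gap:
            continue
        t1, t2 = grid[i], grid[j]
        design = np.column_stack([
            np.ones_like(x),
            x,
            np.maximum(x - t1, 0.0),   # slope change after the first breakpoint
            np.maximum(x - t2, 0.0),   # slope change after the second breakpoint
        ])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(np.sum((y - design @ coef) ** 2))
        if best is None or rss < best[3]:
            # Hinge coefficients are slope changes; cumulative sums give slopes.
            best = (float(t1), float(t2), np.cumsum(coef[1:]), rss)
    return best

# Synthetic "increase-decrease-recovery" response with breakpoints at 0.3 and 0.7.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 201)
y = 2.0 * x - 3.0 * np.maximum(x - 0.3, 0.0) + 2.5 * np.maximum(x - 0.7, 0.0)
y = y + rng.normal(0.0, 0.01, size=x.size)
tau1, tau2, slopes, rss = fit_three_segment(x, y)
print(tau1, tau2, slopes)
```

In practice, a dedicated segmented-regression routine with iterative breakpoint refinement may be preferable to a coarse grid; the sketch is meant only to make the least-squares logic explicit.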
3. Results
3.1. Spatio-Temporal Variations in ES Functions
Figure 3 presents the spatial distribution of HQ, SC, WS, and WY in the Tarim River Basin for the year 2023. While the overall spatial patterns of these ESs exhibit significant heterogeneity, their temporal variations between 2000 and 2023 are relatively limited, as summarized in Table 3.
The spatial pattern of HQ (Figure 3a) consistently follows a distribution of being high in mountains and low in deserts. High-value areas are mainly concentrated in mountainous regions and piedmont zones due to the influence of vegetation cover and topographic conditions. Conversely, low-value areas are stably located in the interior of the Taklamakan Desert and within oasis built-up zones. Although the overall pattern showed little change over the study period, Table 3 indicates a slight declining trend in mean HQ values. This is particularly evident in some marginal areas of the oases where cropland expansion and intensified human disturbances have occurred.
The overall level of SC (Figure 3b) is relatively low with minimal spatial differentiation across the basin. Large low-value areas cover the majority of the Tarim Basin, while slightly higher values are restricted to local mountainous areas and oasis edges. Numerical results in Table 3 show that this service has remained remarkably stable over time with only minor fluctuations in specific localities.
WS (Figure 3c) displays a spatial distribution where elevated values occur along desert margins, whereas reduced values are found in oasis and mountainous regions. High-value regions are concentrated in the desert–oasis transition zones along the northern and southern margins of the Taklamakan Desert. Between 2000 and 2023, the overall pattern remained stable, yet Table 3 reveals a local increase in WS function. This positive trend is likely related to the implementation of shelterbelt plantations and targeted ecological restoration projects along certain desert edges.
WY (Figure 3d) consistently exhibits a spatial pattern of high values in mountainous areas and low values in desert regions. High-value regions are concentrated in the Tianshan and Kunlun Mountains, where WY is primarily controlled by precipitation and snowmelt. The desert areas and oasis interiors remain largely low-value zones. While the overall spatial structure remains stable, Table 3 shows that the extent of high-value areas in some mountainous regions fluctuates slightly, which is likely driven by climate-induced changes in precipitation and glacier melt processes.
In summary, the four types of ESs exhibit high spatial heterogeneity while their temporal changes are generally limited. HQ and WY show pronounced contrasts between mountainous and desert areas, WS is most prominent along the desert margins, and SC remains generally low across the basin. The edges of oases and desert transition zones represent the key areas where these services are most sensitive to environmental and anthropogenic changes.
3.2. Results of Causal Effect Identification and Estimation
The causal relationships derived through the PC algorithm combined with the DoWhy framework (Figure 4) elucidate how natural and socioeconomic variables exert multi-pathway influences on ESs. Across the four types of ESs, two dominant pathways can be identified. The first is the climate–vegetation pathway, which originates from Pre and Evap and indirectly influences ESs through NDVI; the second is the population–land use pathway, which starts from GDP, reflects the intensity of social activities through Pop and NTL, and further influences ecosystem processes via BuiltRatio. Specifically, for HQ, natural processes primarily operate through the indirect chain “Pre/Evap → NDVI → HQ,” whereas socioeconomic factors exert their influence through the sequential pathway “GDP → Pop/NTL → BuiltRatio → HQ,” reflecting the cumulative effects of urbanization and land development. For SC, in addition to the indirect effects of climate and vegetation, there is also a direct pathway “Pre/Evap → SC,” indicating that hydrothermal conditions exert a direct regulatory effect on erosion processes. Similarly, WS and WY exhibit a comparable dual-pathway structure driven by climate. Climate variables influence ecological processes both indirectly through NDVI and directly by regulating hydrological supply, while socioeconomic effects are mainly transmitted along the pathway “GDP → Pop → BuiltRatio → WS/WY,” reflecting the strong shaping effect of land use structure on hydrological functions.
3.3. Construction of Socio–Ecological Interaction Terms for ES Drivers
After completing the identification and estimation of causal effects, this study performed significance testing and robustness evaluation for the relationships between the four ESs (HQ, SC, WS, WY) and their driving factors. Using a dual standard of p-values (p < 0.05) and placebo tests for random confounders, factors that were statistically significant and directionally stable across scenarios were selected, and a comprehensive analysis was conducted by considering effect sizes alongside the causal directed acyclic graph.
For the driving factors of HQ (Figure 5a), NTL exhibits the most pronounced negative effect (−0.659), making it the largest-magnitude factor across all scenarios. In contrast, NDVI (+0.102) and Evap (+0.029) contribute positively to HQ, while GDP (−0.043) shows a negative effect. For the driving factors of SC (Figure 5b), Pop (+0.139) and Pre (+0.029) are the most important positive contributors. NDVI (+0.006) and BuiltRatio (+0.010) also exhibit positive effects, whereas NTL (−0.054) and GDP (−0.012) show negative influences. For the driving factors of WS (Figure 5c), Pre (+0.137), NDVI (+0.049), NTL (+0.147), and BuiltRatio (+0.039) are all significant positive contributors, whereas Pop (−0.097) and Evap (−0.044) exert notable negative effects on WS. For the driving factors of WY (Figure 5d), Pre (+0.058), NDVI (+0.013), Pop (+0.110), and BuiltRatio (+0.020) all exhibit positive effects, whereas NTL (−0.072), GDP (−0.017), and Evap (−0.012) show negative effects. Overall, among the natural factors, Pre and NDVI exhibit consistently positive effects across most ESs, serving as the fundamental environmental drivers that sustain ecological functions. Regarding socioeconomic factors, Pop and NTL are significant across multiple scenarios but with varying directions: Pop exerts positive effects on SC and WY but negative effects on WS, whereas NTL shows strong negative effects on HQ and WY but positive effects on WS, highlighting the differentiated impacts of human activity intensity on ESs. GDP exhibits generally weak and predominantly negative effects, while BuiltRatio shows minor positive effects in certain scenarios, potentially reflecting localized compensatory effects of engineered land use on ecological functions.
Based on the above causally informed feature selection results, the explanatory strength of each candidate driver was evaluated using the mean absolute values of the estimated ATEs across different ecosystem services, which provided a consistent measure of their cumulative explanatory contributions. Drivers showing consistently stronger cumulative effects were considered as primary candidates. Among these variables, Pre, NDVI, Pop, and NTL exhibited relatively higher explanatory contributions compared with other factors, and were therefore selected as the key drivers. In addition to their explanatory strength, the selection of four variables was also guided by the requirements of subsequent interaction modeling. Since this study aimed to construct pairwise interaction terms while maintaining model interpretability and reducing severe multicollinearity, selecting four dominant variables enabled the formation of balanced interaction structures between natural and socioeconomic drivers. Specifically, Pre × Pop and Pre × NTL were used to characterize the coupling effects between hydrological conditions and human activities, while NDVI × Pop and NDVI × NTL were employed to reveal the marginal effects of interactions between vegetation conditions and development intensity. All variables were standardized before proceeding to machine learning modeling and threshold analysis. Because Pop and NTL are highly collinear, two separate interaction models were then constructed: Group A (with Pop as the core variable) and Group B (with NTL as the core variable). This split avoids severe multicollinearity and enhances the comparability between the two types of variables.
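The ranking step can be illustrated with the ATE values quoted above. The sketch assumes that the mean absolute ATE of a driver is averaged only over the services for which a value is reported; the exact averaging rule is not given in the text, and the function and variable names are hypothetical.

```python
# ATEs per ecosystem service, as quoted in the text for Figure 5.
ate = {
    "HQ": {"NTL": -0.659, "NDVI": 0.102, "Evap": 0.029, "GDP": -0.043},
    "SC": {"Pop": 0.139, "Pre": 0.029, "NDVI": 0.006, "BuiltRatio": 0.010,
           "NTL": -0.054, "GDP": -0.012},
    "WS": {"Pre": 0.137, "NDVI": 0.049, "NTL": 0.147, "BuiltRatio": 0.039,
           "Pop": -0.097, "Evap": -0.044},
    "WY": {"Pre": 0.058, "NDVI": 0.013, "Pop": 0.110, "BuiltRatio": 0.020,
           "NTL": -0.072, "GDP": -0.017, "Evap": -0.012},
}

def mean_abs_ate(ate_by_service):
    """Average |ATE| per driver over the services where a value is reported."""
    totals, counts = {}, {}
    for effects in ate_by_service.values():
        for driver, value in effects.items():
            totals[driver] = totals.get(driver, 0.0) + abs(value)
            counts[driver] = counts.get(driver, 0) + 1
    return {d: totals[d] / counts[d] for d in totals}

ranking = sorted(mean_abs_ate(ate).items(), key=lambda kv: kv[1], reverse=True)
key_drivers = [driver for driver, _ in ranking[:4]]
print(key_drivers)  # -> ['NTL', 'Pop', 'Pre', 'NDVI']

# Pairwise natural x socioeconomic terms, split so Pop and NTL never co-occur.
natural = [d for d in key_drivers if d in ("Pre", "NDVI")]
groups = {human: [f"{nat} x {human}" for nat in natural] for human in ("Pop", "NTL")}
print(groups)
```

With these reported values, the four highest-ranked drivers coincide with the set retained in the study (Pre, NDVI, Pop, and NTL), and the two groups correspond to the Pop-centered and NTL-centered interaction models.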
3.4. Driver Importance Analysis Based on XGBoost and SHAP
In this study, XGBoost-SHAP models were constructed separately for Group A and Group B, and their performance was compared using five-fold cross-validation. The results showed that Group A achieved lower RMSE and higher R² values for WS, WY, and SC, whereas Group B exhibited a slight advantage only for HQ (Table 4). This suggests that the nighttime light variable is more sensitive in capturing the relationship between human disturbance and HQ. Given the superior overall performance of Group A, the subsequent XGBoost-SHAP importance interpretation and piecewise regression threshold identification were carried out based on the Group A model, ensuring that the results are more robust and representative. Furthermore, across the 20 repeated model runs with different random seeds, the variability in model performance was minimal (standard deviations of R² were consistently below 0.05), confirming the stability of the model evaluations and the extracted SHAP dependence patterns. In addition, the comparative results of Group B are provided in Appendix A, where the overall trends of the two models remain consistent, further confirming the robustness of the conclusions in this study.
In this study, the terms “marginal contribution” and “marginal effect” refer to the way in which a change in a driving factor alters the predicted value of an ES within the XGBoost-SHAP framework. Specifically, the marginal contribution of a variable represents the extent to which its SHAP value contributes positively or negatively to the model output for a given grid cell, compared with the average prediction across all samples. The marginal effect describes how this contribution changes as the value of the variable varies along its gradient. In the context of ESs, these concepts therefore capture how sensitive an ES is to incremental changes in a natural or socioeconomic driver, and whether such changes amplify, suppress, or reverse the service level at different ranges of the driver. This interpretation allows SHAP-based nonlinear patterns to be directly linked to ecological response mechanisms.
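The notion of a marginal contribution relative to the average prediction can be made concrete with exact Shapley values on a toy model. The sketch below is an assumption-laden illustration: `toy_model`, the feature names, and the background sample are hypothetical, and the brute-force computation stands in for the tree-based SHAP algorithm applied to XGBoost in practice.

```python
from itertools import permutations
from statistics import mean

def toy_model(row):
    """Hypothetical stand-in for a fitted ES model (not the study's XGBoost)."""
    return 0.5 + 2.0 * row["pre_pop"] - 1.0 * row["pre_pop"] * row["ndvi_pop"]

def shapley_values(model, x, background):
    """Exact Shapley values of `model` at `x` against a background sample."""
    features = list(x)

    def value(known):
        # Expected output when only the `known` features are fixed to x's values.
        return mean([model({f: (x[f] if f in known else b[f]) for f in features})
                     for b in background])

    phi = {f: 0.0 for f in features}
    orders = list(permutations(features))
    for order in orders:
        known = set()
        for f in order:
            before = value(known)
            known.add(f)
            phi[f] += (value(known) - before) / len(orders)
    return phi, value(set())

background = [
    {"pre_pop": 0.1, "ndvi_pop": 0.2},
    {"pre_pop": 0.3, "ndvi_pop": 0.5},
    {"pre_pop": 0.6, "ndvi_pop": 0.4},
]
cell = {"pre_pop": 0.5, "ndvi_pop": 0.3}   # one hypothetical grid cell
phi, base = shapley_values(toy_model, cell, background)
print(phi, base)
```

The contributions sum to the difference between the prediction for the grid cell and the average prediction (`base`), which is the property that allows each SHAP value to be read as a signed marginal contribution.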
In the SHAP analysis of Group A (Figure 6), Pre × Pop emerged as the most important interaction feature across all ES models, with the highest mean absolute SHAP value observed for HQ, followed by SC, WS, and WY. In contrast, NDVI × Pop exhibited mean absolute SHAP values close to zero in all four ESs, indicating a weak overall marginal contribution and only limited positive effects in a few samples. In HQ and SC, the SHAP distribution of Pre × Pop is wider, with lower interaction values mostly corresponding to slight negative effects and higher values mostly corresponding to positive contributions, indicating that under population pressure, precipitation conditions have marginal effects on HQ and SC that can be both positive and negative. The distribution of WS exhibits a typical “directional reversal”: low and high interaction values mostly contribute negatively, whereas mid-range values contribute positively, indicating a highly nonlinear response of WS to the interaction between hydrology and population. In WY, the distribution of Pre × Pop is concentrated and overall positive: the direction of its marginal contribution is stable, but its absolute magnitude is smaller than in HQ. Overall, the wider distribution and higher mean absolute SHAP values of Pre × Pop demonstrate a stronger driving effect than NDVI × Pop in explaining variations in ES functions.
Through SHAP dependence plots (Figure 7), this study further explored the nonlinear relationships between Pre × Pop and NDVI × Pop and the target variables. Overall, the responses of different ESs to the natural–socioeconomic interaction terms exhibited pronounced nonlinear characteristics, with trend shifts occurring within specific threshold ranges, indicating that the synergistic effects of natural and socioeconomic factors exert threshold-regulated influences on ESs. The impact of Pre × Pop on ESs shows clear stage-specific turning points (Figure 7a,c,e,g). When the Pre × Pop values are relatively low (approximately <0.1), the SHAP values of ESs are all negative, indicating that under conditions of low precipitation and sparse population, the interaction exerts a suppressive effect on ESs. As the interaction intensity increases, ES functions rise significantly, with SHAP values rapidly shifting from negative to positive, indicating that the combined effect of improved precipitation and population aggregation favors the optimization of ecosystem structure and the enhancement of service supply capacity. When Pre × Pop exceeds approximately 0.4, the responses of WS and WY tend to stabilize or even slightly decline, suggesting that the combined effects of increased precipitation and population growth may approach saturation, with marginal responses of ESs diminishing. It is worth noting that distinct inflection points can be observed within certain ranges of the curves, which is consistent with the theoretical assumption of threshold effects, indicating that the sensitivity of ESs to interactive drivers increases significantly near these thresholds. In contrast, the response curves of NDVI × Pop exhibit smaller fluctuations (Figure 7b,d,f,h) and show an overall smoother trend. When NDVI × Pop is low, the SHAP values of ESs are close to zero or slightly negative, indicating that in areas with strong human disturbances and low vegetation recovery, the regulatory effect of the NDVI × Pop interaction on ESs is limited. As NDVI × Pop increases to approximately 0.2–0.5, the SHAP values of all services gradually rise and maintain a positive effect, indicating that the synergistic interaction between improved vegetation cover and moderate human activities can sustainably enhance ecosystem functioning. When NDVI × Pop exceeds approximately 0.5, both HQ and WY tend to level off or show a slight decline, suggesting a weakened marginal response of ESs. However, the sensitivity varies among different services: HQ and WS exhibit greater response amplitudes, indicating a stronger dependence of these ESs on the “NDVI–Pop” synergy, whereas WY and SC show relatively moderate responses, suggesting lower sensitivity to this interaction term.
Overall, the SHAP results verified the nonlinear responses of ESs to the coupled natural–social interactions and identified key threshold ranges where ecosystem functions shift from inhibition to enhancement, offering a solid basis for the precise detection of threshold effects.
3.5. Threshold Effects of Interaction Terms on ES Functions
Threshold analysis focused on the interaction terms Pre × Pop and NDVI × Pop, which were consistently identified as the most influential coupled drivers through causal screening and XGBoost-SHAP interpretation. Pre × Pop exhibited the highest SHAP contributions across all ESs, indicating a substantial nonlinear influence, while NDVI × Pop showed weaker but non-negligible effects. Other interaction terms displayed negligible contributions or lacked stable nonlinear patterns and were therefore excluded from threshold identification.
Piecewise linear regression was applied to quantify threshold positions and segmented response patterns of ESs to these key interaction terms (Figure 8). Across all services, pronounced nonlinear responses were observed, with clear stage-wise shifts along the interaction gradients. It should be noted that the back-transformed thresholds of the interaction terms (e.g., Pre × Pop) are expressed in composite units (e.g., mm·persons/km²). Because these composite units lack a direct, single physical counterpart, they should be interpreted ecologically as a relative “coupled socio-ecological pressure index.” A threshold value on this composite axis indicates a critical boundary where the synergistic intensity of natural and human activities reaches a tipping point and fundamentally alters the trajectory of the ecosystem service; it does not represent a literal ecological limit. In this study, the thresholds indicate points along the Pre × Pop or NDVI × Pop gradients where ES responses change more abruptly, allowing the identification of zones with heightened ecological sensitivity.
For Pre × Pop, HQ, SC, and WY generally exhibit a three-stage “increase–decrease–recovery” pattern, whereas WS follows a contrasting “decrease–increase–decrease” pattern. These results indicate that moderate population–hydrology coupling can enhance certain services, while intensified coupling may lead to degradation unless offset by strong human regulation or management interventions.
For NDVI × Pop, overall response magnitudes are lower than those of Pre × Pop, but distinct threshold behaviors remain evident. HQ and WY show moderate recovery trends at higher coupling levels, suggesting that increased vegetation cover can partially compensate for population pressure, whereas SC and WS display more constrained or transitional responses, reflecting service-specific sensitivities to vegetation–human interactions.
The two breakpoints identified by piecewise regression (τ1 and τ2) represent statistically optimized transition points where ES response slopes change significantly. Across services, τ1 generally marks the onset of nonlinear sensitivity under coupled pressures, while τ2 indicates a secondary transition associated with intensified human intervention or regulation. Notably, these breakpoints closely align with inflection points observed in SHAP dependence plots, demonstrating strong consistency between machine learning–derived nonlinear patterns and regression-based threshold identification.
Overall, the results reveal clear heterogeneity in threshold positions and response directions among ESs, highlighting that threshold effects are jointly shaped by natural and socioeconomic interactions rather than by single drivers. Detailed numerical threshold values, along with their 95% confidence intervals derived from bootstrap resampling to ensure statistical robustness, are provided in Appendix A.
4. Discussion
4.1. Ecological Interpretation of Driving Mechanisms and Threshold Effects
The interaction between hydrological conditions and population (Pre × Pop) emerges as the primary driver of nonlinear ES responses in the Tarim River Basin, reflecting the combined effects of resource supply, utilization pressure, and human regulation. Hydrological conditions generally promote HQ and WY by enhancing vegetation productivity and soil moisture, but these effects are strongly mediated by population pressure. As population density increases, water resources are progressively reallocated to irrigation, engineering diversion, and urban expansion, reducing ecological water availability and leading to declines in HQ and SC. We hypothesize that this process reflects a potential “crowding-out effect,” in which human water demand displaces water required for ecosystem maintenance. At higher population–hydrology coupling levels, some ESs exhibit partial recovery. Although not explicitly tested as an isolated variable in our model, this recovery aligns with the widespread implementation of intensified management and engineering interventions in the basin (e.g., irrigation optimization and shelterbelt construction). This potential “engineering substitution” relies on sustained water inputs and long-term maintenance, implying underlying sustainability risks.
Overall, ES threshold responses in the Tarim River Basin follow a sequence of resource supply, crowding-out, and engineering substitution, reflecting stage-dependent dominance of natural supply, utilization pressure, and artificial regulation.
These identified nonlinear responses represent fundamental shifts in ES provisioning and can be mathematically described by the three-segment piecewise linear model used in this study. This mathematical structure suggests that the crowding-out effect is not a constant process but a threshold-driven mechanism. When anthropogenic water demand remains below the first tipping point, the natural ecosystem maintains functional stability through its inherent buffering capacity. However, once this critical limit is exceeded, the competitive advantage of agricultural expansion leads to a sudden and disproportionate displacement of ecological water, triggering a rapid decline in services such as habitat quality.
By quantifying these specific inflection points, this study moves beyond qualitative descriptions of human-environment interactions. These empirical thresholds provide a rigorous mathematical basis for defining ecological safety boundaries in regional planning. For instance, the transition zones identified between stable and declining phases can serve as early-warning indicators for sustainable oasis management. Future studies can utilize these specific numerical intervals to parameterize predictive models or to establish spatial redlines for land development, ensuring that regional water-allocation policies are grounded in precise, evidence-based thresholds rather than general conservation goals.
4.2. Comparison and Complement to Existing Studies
Previous studies have recognized the sensitivity of arid-region ESs to hydrological conditions and reported nonlinear responses [45,46,47]. However, most analyses rely on correlation-based approaches, providing limited causal interpretation and service-specific threshold identification.
This study advances existing research methodologically by introducing causal inference to screen drivers and integrating XGBoost-SHAP with piecewise regression to interpret nonlinear responses and quantify thresholds. Empirically, the results confirm the importance of hydrology–population interactions while revealing distinct, service-specific threshold patterns, with HQ, SC, and WY showing “positive–negative–positive” responses and WS exhibiting a contrasting pattern. Recent studies have similarly moved beyond single-factor thresholds toward integrated nonlinear frameworks [48], supporting the relevance of threshold-based ES analysis. While consistent with emerging approaches, this study explicitly incorporates socioeconomic pressures and links causal screening with interpretable machine learning to derive management-relevant thresholds in water-limited arid basins.
4.3. Management and Policy Implications
The identified threshold effects provide direct guidance for zoned ecosystem management in arid regions. For instance, thresholds of HQ, SC, and WY occur mainly under low to moderate Pre × Pop levels, indicating critical tipping points where population pressure can rapidly offset hydrological improvements. When these thresholds are approached, interventions such as irrigation optimization, crop adjustment, and ecological compensation can mitigate crowding-out effects. Furthermore, the “negative–positive–negative” pattern of WS underscores the importance of the medium-coupling stage, where moderate hydrology and limited human management maximize sand-fixation capacity, suggesting a focus on large-scale shelterbelt construction. Conversely, in high-coupling regions where overexploitation diminishes the effectiveness of engineering solutions, strategies should shift toward ecological restoration to reduce human pressure.
However, while these threshold intervals provide a quantitative scientific baseline for regional planning, these empirical targets alone are insufficient for dictating comprehensive policy. Practical management in the Tarim River Basin must integrate a broader spectrum of socioeconomic realities. The transition from water-intensive crops to sustainable agricultural practices requires careful consideration of local farming traditions, economic preferences, and community customs. Furthermore, the actual effectiveness of ecological redlines depends heavily on institutional compliance, stakeholder engagement, and the adaptability of local governance structures. Therefore, the empirical thresholds derived here should not be viewed as rigid rules but rather as scientific reference frameworks. Achieving genuine long-term sustainability requires coupling these evidence-based ecological boundaries with culturally sensitive, economically viable, and practically enforceable management strategies.
Ultimately, although the specific numerical thresholds are context-dependent, the analytical framework developed here is applicable beyond the Tarim River Basin. It provides valuable insights for managing other ecologically fragile, water-limited regions globally under increasing climate and human pressures.
4.4. Methodological Contributions and Applicability
This study contributes methodologically by integrating causal inference into driver selection and linking interpretable machine learning (XGBoost-SHAP) with piecewise regression for explicit threshold quantification. This integrated workflow ensures directional validity, enhances interpretability, and provides management-relevant threshold information under complex natural–socioeconomic coupling.
In terms of applicability, the analytical framework developed in this study has the potential to be adapted to other arid or ecologically fragile regions. While further empirical testing is needed to fully assess its generalizability, the structure of the approach allows it to be applied in settings where ESs are shaped by multiple interacting pressures. In coupled systems of Central Asian arid regions or African drylands, for example, the framework may help reveal threshold patterns of ES responses under varying socio-economic conditions. On the other hand, the framework can be extended to studies of different types of ESs. Beyond the HQ, SC, WS, and WY considered here, it can be applied to services such as carbon storage, water purification, and food provision, making it particularly suitable for exploring the dynamics of synergy and trade-offs among multiple services near their thresholds.
In addition, this study compared the interactive modeling results of Group A (population density–centered) and Group B (nighttime light–centered). While both groups exhibited consistent overall trends, Group A demonstrated higher explanatory power and better captured the direct pressures of human activities on ecosystems. This finding indicates that different proxies of human activity may vary in their ability to explain ESs, highlighting the importance of selecting appropriate indicators based on regional characteristics. It also demonstrates that the methodological framework employed here not only identifies key interactions and threshold effects but also allows for robustness comparisons across different proxies, thereby enhancing the reliability and interpretive depth of the results.
Overall, the methodological contribution of this study lies in establishing an integrated framework that combines causally informed feature selection, model interpretability, and threshold quantification. This framework not only reveals the complex driving mechanisms of ESs but also provides precise, quantitative triggers for policy formulation, thereby offering substantial potential for both academic dissemination and practical application.
4.5. Limitations and Prospective Research Pathways
Although multiple complementary methods and robustness checks were applied to strengthen causal inference, some uncertainty remains. Causal structure learning depends on the completeness and accuracy of observed variables, and omitted factors or measurement errors may bias the inferred causal graph. Because some of our adjustment sets were partly determined by data-driven variable correlations rather than strictly relying on predefined DAG backdoor paths, we explicitly define our approach as ‘causally informed screening’ rather than claiming absolute causal validity. While this represents a limitation from a pure causal inference perspective, potentially leaving residual confounding in some cases, it is a pragmatic and robust strategy for identifying meaningful ecological drivers within complex, high-dimensional socio-ecological systems.
A further limitation arises from the predictive modeling approach. The XGBoost-SHAP framework provided reasonable predictive accuracy for most ESs, but a few services exhibited relatively modest R² values. This pattern does not reflect deficiencies in the services themselves, but rather the complexity of the ecological processes that govern them. Many of these processes operate at spatial and temporal scales that exceed the resolution of the available environmental predictors. As a result, key interactions, nonlinear responses, and ecological thresholds may not be fully represented in the model, leading to reduced explanatory power. Furthermore, this predictive limitation is partially an intentional trade-off inherent to our causally informed framework. By limiting the explanatory variables to those retained by the causal screening, the maximum achievable R² is naturally constrained. While testing the model with all original indicators would likely inflate predictive performance metrics through spurious correlations, doing so would violate the causal assumptions necessary for identifying reliable, management-relevant ecological thresholds. Additionally, uncertainties related to remotely sensed variables and gridded hydrological datasets may affect model performance. These issues underscore the need for higher-resolution environmental data and modeling approaches capable of capturing more complex ecological interactions in future research.
A notable methodological limitation relates to the handling of spatial autocorrelation in our raster-derived data. The current framework utilized standard five-fold stratified cross-validation to evaluate the XGBoost models. As highlighted by recent critical advancements in geospatial modeling [39,40], standard random cross-validation can lead to optimistic performance estimates (inflated R² values) when applied to strongly spatially structured data, as the training and testing sets may lack spatial independence. In this study, our primary objective was explanatory (utilizing SHAP to interpret the overarching socio-ecological relationships and thresholds within the observed spatial boundaries of the Tarim River Basin) rather than predictive extrapolation to unobserved regions. Consequently, spatial block cross-validation and Area of Applicability assessments were not implemented. Nevertheless, we fully acknowledge that future studies aiming to generalize these ecological thresholds or map predictions to out-of-region landscapes must adopt spatial or buffered cross-validation strategies to strictly account for spatial dependence and ensure predictive transferability.
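For readers unfamiliar with the alternative mentioned above, the following is a minimal sketch of how spatial block cross-validation differs from random splitting. It was not part of this study's workflow. The function names (`assign_blocks`, `spatial_block_folds`), the square-block scheme, and the block size are illustrative assumptions; in practice the block size would need to be chosen relative to the range of spatial autocorrelation in the data.

```python
import numpy as np

def assign_blocks(x, y, block_size):
    """Map each sample to the ID of the square spatial block containing it.

    x, y are sample coordinates (e.g., raster cell centres); block_size is the
    block edge length in the same units. Hypothetical helper for illustration.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    col = np.floor((x - x.min()) / block_size).astype(int)
    row = np.floor((y - y.min()) / block_size).astype(int)
    return row * (col.max() + 1) + col

def spatial_block_folds(block_id, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which whole blocks are held out."""
    block_id = np.asarray(block_id)
    blocks = np.unique(block_id)
    if len(blocks) < n_folds:
        raise ValueError("need at least as many blocks as folds")
    # Shuffle the blocks, then deal them out to folds in turn.
    rng = np.random.default_rng(seed)
    rng.shuffle(blocks)
    fold_of_block = {b: i % n_folds for i, b in enumerate(blocks)}
    fold = np.array([fold_of_block[b] for b in block_id])
    for k in range(n_folds):
        yield np.flatnonzero(fold != k), np.flatnonzero(fold == k)

# Example on a synthetic 20 x 20 grid (assumed coordinates, not study data).
xs, ys = np.meshgrid(np.arange(20.0), np.arange(20.0))
block_id = assign_blocks(xs.ravel(), ys.ravel(), block_size=5.0)
for train_idx, test_idx in spatial_block_folds(block_id, n_folds=5):
    # A model would be fitted on train_idx and scored on test_idx here.
    shared = np.intersect1d(block_id[train_idx], block_id[test_idx])
    print(len(train_idx), len(test_idx), len(shared))
```

Because every cell in a block falls in the same fold, neighbouring cells are less likely to be split between training and testing sets than under random assignment. Cells on either side of a block boundary can still be close together, which is the gap that buffered strategies are intended to address.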
Finally, the testing of causal effects was based primarily on conventional statistical procedures. In borderline cases, conservative decisions based on preset significance thresholds were necessary, which may affect the robustness of some conclusions. Despite these limitations, the integrated framework developed in this study provides a coherent and transparent basis for understanding the relationships between drivers and multiple ESs. Future research could extend this work in three directions. First, integrating a more comprehensive set of variables and higher-quality observational data could reduce the risk of omissions and biases in causal graph construction. Second, adopting rigorously designed causal forests or Bayesian causal estimation methods would allow better characterization of effect heterogeneity and uncertainty. Third, combining multiple significance assessment approaches with robustness checks could provide a firmer statistical foundation for the estimated causal effects. These improvements are expected not only to enhance the reliability of causal inference but also to provide a stronger methodological basis for studying threshold effects in ESs.