1. Introduction
Water Resource Carrying Capacity (WRCC) denotes the maximum support capacity and safety threshold of a regional water resources system for the coordinated development of population, economic activities, and the ecological environment under specified conditions of water resource endowment, development and utilization intensity, and ecological constraints [1]. WRCC is not solely a water quantity constraint; rather, it represents a coupled equilibrium between water supply processes, the growth of socioeconomic water demand, and ecosystem water requirements. In this sense, WRCC reflects the dynamic coordination of three interacting subsystems, namely water resources, socioeconomy, and the ecological environment [2,3,4,5]. Under global climate change, together with strong disturbances induced by water withdrawals, regulation, and land use transitions, water scarcity and systemic vulnerability in arid and semi-arid regions have intensified, making water constraints a central bottleneck for sustainable development [6,7]. Accordingly, quantifying the spatiotemporal evolution of WRCC and developing predictions that can be verified and directly serve management needs are essential for improving allocation efficiency, safeguarding ecological security, and supporting adaptive regulation.
WRCC assessment commonly adopts an integrated framework that includes water resources, society, economy, and ecology as core subsystems [1,8,9]. Conventional approaches include comprehensive evaluation models, system dynamics, ecological footprint methods, and statistical techniques [9,10,11,12,13,14,15]. These approaches offer clear strengths in indicator system design, weight determination, and cross-sectional comparison [16]. However, they often have difficulty achieving both interpretability and predictive capability when WRCC exhibits non-linear responses driven by multi-source information, cross-scale coupling, and strong spatial heterogeneity. In particular, persistent limitations remain in representing interactions among drivers, identifying critical thresholds, and capturing high-dimensional spatiotemporal dynamics [10,17]. Given the growing combined influence of climate change and human activities, more effective technical approaches are needed to enhance the resolution and forecasting skill of WRCC assessments.
In recent years, artificial intelligence methods, especially machine learning, have shown strong performance in high-dimensional modeling, non-linear relationship learning, and spatiotemporal prediction. Because these methods can accommodate complex hydroclimatic and water use data, improve predictive accuracy, and reduce uncertainty [18,19,20,21], they provide a promising methodological toolkit for WRCC research [22,23,24]. Previous studies demonstrate that machine learning can improve data quality control, support feature selection, and quantify variable contributions, thereby facilitating the refinement of indicator systems and strengthening the representation of coupled human–water environment systems [21,25,26,27,28,29]. In addition, algorithms such as Random Forest, Gradient Boosting, and Neural Networks can extract stable statistical signals from spatiotemporal variability, which improves the dynamic simulation and predictive performance of WRCC assessment frameworks [16,24,30]. Deep learning is particularly effective in learning temporal dependencies, whereas ensemble learning can better accommodate spatial heterogeneity, together offering a feasible pathway for refined spatiotemporal WRCC characterization [31,32]. Nevertheless, applications for resource and environmental management require more than predictive skill alone. Model outputs must also be interpretable, reproducible, and actionable to support decision-making. Meeting these requirements remains a key challenge for future machine learning-based WRCC studies.
Gansu Province is located in the arid inland region of northwest China and is characterized by limited natural water availability, pronounced spatiotemporal variability in water resources, and per capita water availability far below the national average. As a result, the province has long faced a severe supply–demand imbalance, and its WRCC often remains near critical conditions or within overloaded states [6,7,11,33]. Previous studies have assessed WRCC in Gansu using approaches such as Principal Component Analysis, GIS-based spatial analysis, and ecological footprint methods, revealing persistent spatial mismatches between water availability and water use and highlighting potential overloading risks in water-stressed areas [12,34,35].
However, two critical research gaps remain unaddressed in these existing regional evaluations. First, traditional assessment frameworks treat WRCC driving factors as independent or purely linear combinations, thereby failing to capture the complex, non-linear interactions and potential regulatory thresholds driven by coupled human–water environments under accelerating climate change. Second, most existing works rely on static snapshots or historical phase-based evaluations, lacking an integrated near-term forecasting mechanism that provides independently verifiable trend signals to support adaptive water management.
To bridge these gaps, this study couples three methods: the Entropy Weight Method (EWM) for objective indicator weighting, Random Forest (RF) for ranking drivers in a high-dimensional indicator set, and the ARIMA model for extrapolating the historical dependence of the resulting index series. Together they form a closed analytical loop: the EWM addresses informational redundancy in multi-source data, RF relaxes linear assumptions to quantify dominant non-linear drivers without subjective weighting, and ARIMA uses the resulting historical series to project near-term dynamics. This combination is intended to offer generalization performance and process-based interpretability that standalone conventional models have difficulty achieving.
Against this backdrop, this study uses Gansu Province as a representative arid and water-limited region to integrate the classical WRCC evaluation framework with data-driven machine learning. To guide our diagnostic framework, this manuscript aims to directly answer three core research questions: (1) How has the continuous temporal carrying pressure evolved across the 14 prefecture-level units over the past two decades? (2) What are the dominant natural and anthropogenic drivers governing these variations, and how do they rank in non-linear contribution? (3) What are the verifiable near-term trajectories and potential overloading risk zones across the province over the next five years?
To address these questions, the study pursues three explicit objectives. First, it constructs an indicator system and quantifies a comprehensive carrying capacity index using multi-source data from 2000 to 2023 to describe the continuous spatiotemporal patterns and evolution of WRCC in Gansu. Second, it applies Random Forest to identify dominant drivers of WRCC change and to quantify their relative contributions. Third, it develops an ARIMA predictive framework to forecast WRCC trends from 2024 to 2028 and to screen areas with elevated risk of overloading. The results are expected to provide a practical quantitative basis for refined water resources regulation, demand management, and risk mitigation in Gansu, and to offer a reproducible empirical reference for applying machine learning to the assessment of coupled resource environment systems. The technical workflow is shown in Figure 1.
2. Study Area Overview and Data Sources
2.1. Study Area Overview
Gansu Province (32°31′ to 42°57′ N, 92°13′ to 108°46′ E) is located in inland northwest China and functions as an important water conservation region and an ecological security barrier for the Yellow River and Yangtze River basins. The province covers 425,800 square kilometers and exhibits complex geomorphic conditions with a clear southeast-to-northwest zonality. Major physiographic units include the Longnan mountainous region, the Longzhong Loess Plateau, the Gannan Plateau, the Qilian mountainous region, the Hexi Corridor, and the area north of the Hexi Corridor. The climate is dominated by a temperate continental regime with alpine and plateau influences. Mean annual precipitation ranges from 36 to 600 mm and shows strong spatial heterogeneity, generally decreasing from the southeast to the northwest. Precipitation is also characterized by pronounced interannual variability, while atmospheric evaporative demand is high, together contributing to strong hydroclimatic aridity and water stress [1].
Gansu is marked by very limited water resource endowment and a pronounced spatial mismatch between water availability and the distribution of socioeconomic activities. The province has total annual water resources of about 289.4 × 10⁸ cubic meters, and per capita water availability remains well below the national average, making Gansu a representative water-limited region. Water resources are unevenly distributed and are mainly concentrated in the Yellow River system, including tributary basins such as the Taohe and Weihe Rivers, as well as in the Bailongjiang River basin within the Yangtze River catchment. In contrast, the extensive Hexi inland river basins rely heavily on meltwater and precipitation originating from the Qilian Mountains, which amplifies the imbalance between water supply and water demand. Under the combined pressures of climate change and intensive human activities, the vulnerability of the regional water system has increased, and water-related risks have become more prominent, including reduced water availability, stronger temporal variability, intensified competition among sectors, and associated eco-environmental degradation. Therefore, rigorous assessment and management-oriented prediction of WRCC dynamics in Gansu are essential for strengthening water security, maintaining ecological integrity, and supporting sustainable socioeconomic development. The study area is shown in Figure 2.
2.2. Data Sources
The datasets used in this study were obtained from three sources. First, hydrological and water use data, including surface water resources, precipitation, and sectoral water consumption, were collected from the Gansu Provincial Water Resources Bulletin for 2000 to 2023, issued by the Gansu Provincial Department of Water Resources. Second, socioeconomic data, including GDP, population, and industrial structure, were compiled from the Gansu Development Yearbook for the same period. The yearbooks are prepared by the Gansu Provincial Bureau of Statistics and can be accessed through the China National Knowledge Infrastructure database. Third, derived indicators, including the water supply modulus and water consumption per 10,000 yuan of GDP, were calculated by the authors using the above primary datasets.
3. Research Methods
3.1. Entropy Weight Method (EWM)
The determination of index weights is directly related to the accuracy and objectivity of the evaluation results. While subjective methods, such as the expert investigation method [35], the Analytic Hierarchy Process (AHP) [36], and other subjective weighting approaches [37], are simple to apply and easy to interpret, they tend to ignore the inherent characteristics of the data itself and lack objectivity [38]. In contrast, the Entropy Weight Method (EWM) is a multi-attribute decision-making analysis method based on information entropy theory. This method effectively addresses the problem of information redundancy among indicators and significantly reduces subjective bias in the weight determination process through an objective weighting mechanism [39]. Consequently, this study adopts the EWM to calculate the weights of the evaluation indicators. The calculation formulae are as follows:
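(1) Standardization of indicator data. Positive indicators (higher values are favorable) and negative indicators (lower values are favorable) are standardized by min-max scaling:

$$x'_{ij} = \frac{x_{ij} - \min_j x_{ij}}{\max_j x_{ij} - \min_j x_{ij}} \ \text{(positive)}, \qquad x'_{ij} = \frac{\max_j x_{ij} - x_{ij}}{\max_j x_{ij} - \min_j x_{ij}} \ \text{(negative)}$$

where i is the evaluation indicator index, i = 1, 2, …, n; j is the evaluation year, j = 1, 2, …, m; $x_{ij}$ is the original indicator value; $x'_{ij}$ is the standardized value; and $\max_j x_{ij}$ and $\min_j x_{ij}$ are the maximum and minimum values of the corresponding original indicator, respectively.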
(2) Calculation of indicator entropy:
To avoid zero or negative values in the logarithmic calculation of information entropy, a coordinate translation is applied by adding 1 to the standardized value $x'_{ij}$. The feature probability $p_{ij}$ is calculated as

$$p_{ij} = \frac{x'_{ij} + 1}{\sum_{j=1}^{m} \left( x'_{ij} + 1 \right)}$$

where $p_{ij}$ represents the proportion of the i-th evaluation indicator in the j-th evaluation year. Then, the entropy value $H_i$ of the i-th indicator is expressed as

$$H_i = -\frac{1}{\ln m} \sum_{j=1}^{m} p_{ij} \ln p_{ij}$$

where 0 ≤ $H_i$ ≤ 1.
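(3) Calculation of the evaluation indicator weight $w_i$:

$$w_i = \frac{1 - H_i}{\sum_{i=1}^{n} \left( 1 - H_i \right)}$$

where $H_i$ is the entropy value of the i-th indicator. The weight satisfies $\sum_{i=1}^{n} w_i = 1$, with 0 < $w_i$ < 1.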
Assume that the Entropy Weight Method yields a weight for each subsystem and a weight for the i-th indicator within its subsystem. The comprehensive weight of the i-th indicator is then calculated as the product of the subsystem weight and the within-subsystem indicator weight.
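The three EWM steps can be illustrated with a minimal sketch. It assumes min-max standardization with separate forms for positive and negative indicators, the +1 translation described above, and entropy normalized by ln m; the function name and the toy values are hypothetical and are not taken from the Gansu dataset.

```python
import math

def entropy_weights(data, positive):
    """Entropy weights for indicators.

    data[i][j] is the value of indicator i in year j;
    positive[i] is True when higher values are favorable.
    """
    m = len(data[0])  # number of years
    divergences = []
    for row, is_positive in zip(data, positive):
        lo, hi = min(row), max(row)
        span = hi - lo
        # Step 1: min-max standardization (a constant indicator maps to 0).
        if span == 0:
            std = [0.0] * m
        elif is_positive:
            std = [(x - lo) / span for x in row]
        else:
            std = [(hi - x) / span for x in row]
        # Step 2: shift by 1 so every value is positive, then normalize.
        shifted = [s + 1.0 for s in std]
        total = sum(shifted)
        p = [s / total for s in shifted]
        entropy = -sum(pi * math.log(pi) for pi in p) / math.log(m)
        divergences.append(max(0.0, 1.0 - entropy))
    # Step 3: weights proportional to 1 - entropy.
    total_div = sum(divergences)
    if total_div == 0:
        return [1.0 / len(data)] * len(data)  # all indicators constant
    return [d / total_div for d in divergences]

# Hypothetical toy data: three indicators observed over four years.
toy = [[1, 2, 3, 4], [10, 10, 10, 11], [5, 4, 3, 2]]
weights = entropy_weights(toy, positive=[True, True, False])
```

In this toy case the second indicator, whose values are concentrated in a single jump, receives the largest weight, while the first and third indicators (mirror images after standardization) receive equal weights.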
3.2. Machine Learning Models
Random Forest (RF) is a flexible and powerful ensemble learning method based on decision trees, originally developed by Breiman. The algorithm constructs multiple decision trees using random bootstrapped samples from the training dataset. Specifically, a random subset of the initial data is selected to train each binary tree, while the excluded data points are referred to as “Out-of-Bag” (OOB) samples. RF assesses the importance of variables by calculating the increase in prediction error when OOB data for a specific variable is permuted while all other variables remain fixed. Generally, the implementation of Random Forest requires the tuning of two primary parameters: the number of trees (ntree) and the number of variables sampled at each node (mtry) [21].
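The permutation idea behind this importance measure can be sketched independently of any particular Random Forest library. In the sketch below, `predict` is a hypothetical stand-in for a fitted model evaluated on held-out (for example, OOB) rows, and the toy model and data are illustrative assumptions only.

```python
import random

def mse(predict, X, y):
    """Mean squared error of predict over rows X with targets y."""
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, col, n_repeats=20, seed=0):
    """Average increase in MSE when column `col` is shuffled and all
    other columns are left unchanged."""
    rng = random.Random(seed)
    baseline = mse(predict, X, y)
    increases = []
    for _ in range(n_repeats):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
        increases.append(mse(predict, X_perm, y) - baseline)
    return sum(increases) / n_repeats

# Hypothetical toy model that depends only on the first variable.
def toy_model(row):
    return 2.0 * row[0]

X = [[float(i), float((7 * i) % 5)] for i in range(12)]
y = [2.0 * row[0] for row in X]
imp_first = permutation_importance(toy_model, X, y, col=0)
imp_second = permutation_importance(toy_model, X, y, col=1)
```

Permuting the variable the model relies on raises the error, while permuting the ignored variable leaves it unchanged. With a noisy model and few samples, the score of an uninformative variable can also come out slightly negative.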
3.3. Comprehensive Evaluation Model for Water Resource Carrying Capacity
The comprehensive evaluation model of Water Resource Carrying Capacity constructed in this study for Gansu Province (Equation (6)) defines CI, the comprehensive evaluation index of regional Water Resource Carrying Capacity, in terms of four components: the economic pressure index and the population pressure index, whose weighting coefficients are set to 0.4 and 0.6, respectively, in this study, together with the Carrying Pressure Index and the coordination index.
The specific meanings of each indicator in Equation (6) are as follows:
(1) Economic Pressure Index
The economic pressure index supported by regional water resources is calculated from GDP, the economic output per unit of water resources, and Em, the maximum economic scale that regional water resources can sustain. Em is in turn derived from the gross domestic product at water consumption level W, which is determined by analyzing regional GDP data across different total water consumption levels and identifying the GDP value corresponding to the minimum total water consumption. Here, W is the total water consumption of the socioeconomic system and Wh is the available water resources volume.
(2) Population Pressure Index
The population pressure index supported by regional water resources is calculated from P, the current population size of the region, and Pm, the maximum population size supportable by water resources. Pm is in turn calculated using the lower-limit indicator of per capita GDP for a given stage of social development. To avoid arbitrary parameterization, this lower threshold is anchored at 3700 yuan/person, which references the historical baseline of the per capita low-income subsistence line established in the Gansu Development Yearbook and national anti-poverty monitoring frameworks at the beginning of the study period (circa 2000, calculated at 2000 constant prices). This baseline represents the minimum socioeconomic output required to sustain fundamental living standards and basic water resource demands within the region during the early developmental stage, and it therefore serves as a zero-pressure benchmark for demographic capacity modeling [34].
(3) Carrying Pressure Index and Harmony Index
The Carrying Pressure Index is a quantitative measure of the pressure borne by a water resource system within a certain region, under the premise of meeting the demands of socioeconomic development and the requirements of ecological environmental protection. It is calculated from PI and SI, the pressure index and the Support Capacity Index of the water resource composite system, respectively. The Support Capacity Index is calculated from Si and Yi, the weight and standardized value of the i-th indicator in the water resources system, respectively, where n is the number of indicators. The pressure index is calculated from Pi and Yi, the weight and standardized value of the i-th indicator across the social, economic, and ecological environment systems, respectively, where n is the total number of indicators.
The Harmony Index (HI) is a critical metric that characterizes the level of coordinated development within the complex system integrating water resources, the socioeconomic sector, and the ecological environment. It quantifies the coupling and coordination among these interconnected systems: an increase in the HI value reflects improved systemic integration and enhanced efficiency in water resource utilization, whereas a decrease indicates deteriorating coordination. As a key component of Water Resource Carrying Capacity assessment, the Harmony Index, when used in conjunction with complementary indicators such as the pressure index, enables a more comprehensive evaluation of regional water resource sustainability. HI is calculated from Wi and Yi, the weight and standardized value of the i-th indicator in the coordination system, respectively, where n is the number of indicators.
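The definitions in this subsection suggest that the Support Capacity Index, the pressure index, and the Harmony Index are each a weighted sum of standardized indicator values. Under that assumption (the summation form is inferred from the wording of the definitions, not confirmed by them), the three indices can be written as:

```latex
SI = \sum_{i=1}^{n} S_i Y_i, \qquad
PI = \sum_{i=1}^{n} P_i Y_i, \qquad
HI = \sum_{i=1}^{n} W_i Y_i
```

In each sum, n and Y_i refer to the indicators of the subsystem in question, so the three sums run over different indicator sets.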
The grading of Water Resource Carrying Capacity (WRCC) serves to characterize the support capability of a regional water resources system. Establishing a robust grading framework is critical, as it directly influences the accuracy of the WRCC assessment. Furthermore, differentiated management strategies based on these grading results can provide precise guidance for the sustainable utilization of regional water resources. Drawing upon relevant studies [40,41], this study categorizes WRCC evaluation results into five distinct grades. It is important to note that an inverse relationship exists between the comprehensive evaluation index and the regional carrying status: specifically, a lower index value indicates a stronger sustainable supporting capacity. The specific classification criteria are detailed in Table 1.
3.4. ARIMA Model
The Autoregressive Integrated Moving Average (ARIMA) model consists of three core components, namely autoregression, integration via differencing, and a moving average term. It is a classical statistical framework for time series analysis and forecasting and is particularly suited to univariate sequences with temporal autocorrelation. The model is denoted as ARIMA (p, d, q), where p is the autoregressive order that represents the dependence on prior observations, d is the differencing order applied to achieve stationarity, and q is the moving average order that captures the influence of past forecast errors. In the ARIMA framework, the observed series is treated as a stochastic process governed primarily by its own internal dynamics, with random disturbances represented by an error term. In a more general formulation, the target variable may also be affected by external explanatory factors (denoted as X1, X2, X3, …, Xk) in addition to its intrinsic temporal structure. Let Y denote the observed values of the variable of interest and let μ denote the random error term. The dependence of the current value on historical information is expressed in Equation (15), in which β0, β1, β2, …, βk represent the coefficients of influence of the variable’s own past changes. The error term μ exerts varying impacts across different time periods and is expressed in Equation (16), in which α0, α1, α2, …, αk represent the error coefficients for different time periods. Based on this model, existing data can be used to forecast future values and provide a scientific foundation for formulating targeted countermeasures and recommendations.
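To make the roles of differencing and autoregression concrete, the following minimal sketch fits the simplest non-trivial case, ARIMA(1,1,0) with drift, by ordinary least squares on first differences. The orders, the least-squares fitting (rather than maximum likelihood), the function names, and the toy series are all illustrative assumptions and do not reproduce the configuration used in this study.

```python
def fit_arima_110(series):
    """Fit d_t = c + phi * d_(t-1), where d_t is the first difference
    of the series, by ordinary least squares. Returns (c, phi)."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    x, y = diffs[:-1], diffs[1:]
    n = len(x)
    if n < 2:
        raise ValueError("need at least four observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        phi = 0.0  # constant differences: pure drift
    else:
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi

def forecast_arima_110(series, c, phi, steps):
    """Iterate the fitted recursion and integrate the differences back."""
    level = series[-1]
    last_diff = series[-1] - series[-2]
    forecasts = []
    for _ in range(steps):
        last_diff = c + phi * last_diff
        level = level + last_diff
        forecasts.append(level)
    return forecasts

# Hypothetical toy series with a steady upward trend.
trend = [1, 2, 3, 4, 5, 6]
c_trend, phi_trend = fit_arima_110(trend)
trend_forecast = forecast_arima_110(trend, c_trend, phi_trend, steps=3)

# Hypothetical toy series whose increments halve each step.
decay = [0, 8, 12, 14, 15]
c_decay, phi_decay = fit_arima_110(decay)
```

For the linear toy series the fit reduces to a drift of one unit per step, so the forecast simply continues the trend; for the second series the estimated autoregressive coefficient recovers the halving of successive increments.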
4. Construction of the Water Resource Carrying Capacity Evaluation Index System
The Water Resource Carrying Capacity (WRCC) system is a complex, multi-level hierarchy coupled with five subsystems: water resources, society, economy, ecological environment, and the coordination system [42]. In alignment with the current development status of Gansu Province, the selection of indicators adheres to the principles of reliability, directionality, hierarchy, and timeliness. Drawing upon relevant research, the theoretical framework of “quantity, quality, region, and flow” of water resources [43] was fully integrated into the selection of specific indicators for each dimension. Consequently, a comprehensive WRCC evaluation index system for Gansu was established. This system covers the dimensions of water resources, ecological environment, and socioeconomy, comprising 29 indicators spanning the period from 2000 to 2023 (see Table 2). Note that “+” denotes positive indicators (higher values are favorable), while “−” denotes negative indicators (lower values are favorable).
Before weight calculation and subsequent modeling, a comprehensive multicollinearity assessment among the 29 selected indicators was conducted using Pearson correlation analysis and Variance Inflation Factor (VIF) diagnostics. The results revealed varying degrees of multicollinearity, particularly within the Water Resources Subsystem (e.g., between total water resources, surface water resources, and precipitation) and the Economic Subsystem. Such correlation is largely unavoidable in highly coupled water–socioeconomic–ecology systems, where individual metrics represent different physical or institutional manifestations of the same hydrological cycle or regional development scale.
To prevent this information redundancy from undermining the objectivity of the evaluation, two explicit methodological safeguards were implemented. First, in the weight determination phase, the Entropy Weight Method (EWM) was used to avoid subjective bias and to balance informational overlaps by weighting each indicator according to its own entropy value. Second, during the driver identification phase, Random Forest (RF) was selected as the core machine learning model. Owing to its ensemble structure, which combines bootstrap aggregation with random feature sub-selection at each split node, Random Forest is comparatively robust to multicollinearity in high-dimensional inputs. It is far less affected by the variance inflation and instability that collinear indicators cause in linear regression models, which supports the reliability of the derived comprehensive index (CI) and of the feature importance rankings used for regional water resource management.
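The two diagnostics mentioned above can be illustrated for a single pair of indicators. The sketch below computes the Pearson correlation and the corresponding pairwise VIF, 1 / (1 − r²); the full VIF of an indicator would instead use the R² from regressing it on all other indicators, so this is a simplified two-variable case with hypothetical values.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pairwise_vif(x, y):
    """VIF when one indicator is regressed on a single other indicator."""
    r_squared = min(pearson(x, y) ** 2, 1.0)
    if r_squared == 1.0:
        return math.inf  # perfectly collinear pair
    return 1.0 / (1.0 - r_squared)

# Hypothetical values for two strongly related indicators.
indicator_a = [1.0, 2.0, 3.0, 4.0]
indicator_b = [1.0, 2.0, 3.0, 5.0]
vif_related = pairwise_vif(indicator_a, indicator_b)

# Hypothetical values for an uncorrelated pair.
indicator_c = [1.0, -1.0, -1.0, 1.0]
vif_unrelated = pairwise_vif(indicator_a, indicator_c)
```

A VIF near 1 indicates little shared information, whereas the strongly related pair yields a VIF of roughly 29, well above the thresholds of 5 or 10 that are commonly used as rules of thumb.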
Based on the dataset for Gansu Province from 2000 to 2023, the Entropy Weight Method (EWM) was employed to calculate the weights of each indicator. The specific results are presented in Table 3.
As shown in Table 3, a prominent disparity exists among the subsystem weights: the Water Resources Subsystem has the highest system weight (0.342), while the Ecological Environment Subsystem has the lowest (0.119). This distribution follows from the mathematical properties of the Entropy Weight Method (EWM) combined with the regional geographical realities of Gansu Province. EWM determines weights from information entropy, which decreases as data dispersion and temporal variability over the 2000–2023 period increase, so more dispersed indicators receive higher weights.
The dominance of the Water Resources Subsystem stems from the strong natural hydroclimatic volatility and acute supply–demand variations characteristic of this arid and semi-arid region. Indicators such as precipitation, surface water resources, and total water supply undergo large interannual fluctuations, creating highly dispersed data series that yield lower entropy and higher weights. Conversely, the low weight assigned to the Ecological Environment Subsystem does not imply low significance for regional sustainability; rather, it reflects the high temporal stability and steady, near-linear progression of its underlying indicators during the study period. For instance, driven by sustained regional environmental preservation efforts, metrics like water use per 10,000 yuan GDP and soil erosion control area exhibit smooth, continuous trajectories with little stochastic variance. Such ordered sequences produce a larger entropy value, which in turn compresses their objective weight. The weight distribution is therefore consistent with both the mathematical properties of the method and the observed eco-hydrological evolution of Gansu Province.
5. Assessment and Prediction of Water Resource Carrying Capacity in Gansu Province Based on Random Forest
5.1. Model Performance Evaluation and Driving Factor Analysis
5.1.1. Model Goodness of Fit and Validation Results
To systematically identify the spatiotemporal variation and driving patterns of Water Resource Carrying Capacity (WRCC) across Gansu Province, a rigorous hyperparameter optimization pipeline was executed for the Random Forest model. Specifically, a 5-fold cross-validation grid search strategy was applied independently to each of the 14 prefecture-level units to balance model flexibility and generalization. The tuned configurations for the ensemble size (trees), the minimum node size (leaf), and the number of features sampled at each split node (mtry) are summarized in Table 4.
The minimum node size (leaf) was uniformly optimized at 1 across all prefectures, maximizing the regression trees’ sensitivity to fine-scale structural variability and allowing them to capture complex, non-linear human–water interactions. Meanwhile, the number of random splitting features (mtry) was consistently set to 10 for all municipal models. This choice follows the common convention for Random Forest regression that mtry is approximately m/3, where m = 29 is the total number of input indicators in the evaluation system (29/3 ≈ 9.7, rounded to 10). This setting suppresses feature redundancy while retaining sufficient diversity among candidate drivers. As documented in Table 4, the optimized ensemble sizes (trees) varied between 100 and 400 across prefectures; ensembles of this size reduce prediction variance and dampen the instability of individual trees, providing a robust basis for cross-regional WRCC forecasting.
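The arithmetic behind the mtry setting and the structure of a 5-fold grid search can be sketched as follows. The sketch assumes 24 annual samples per prefecture (2000 to 2023) and contiguous folds; the candidate values for trees and leaf are hypothetical, since only the tuned results are reported here.

```python
from itertools import product

def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds whose sizes
    differ by at most one."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def train_validation_splits(n_samples, k):
    """Yield (train_indices, validation_indices), one pair per fold."""
    for held_out in k_fold_indices(n_samples, k):
        held = set(held_out)
        train = [i for i in range(n_samples) if i not in held]
        yield train, held_out

n_indicators = 29
mtry = round(n_indicators / 3)        # 9.67 rounds to 10

tree_options = (100, 200, 300, 400)   # hypothetical candidate values
leaf_options = (1, 3, 5)              # hypothetical candidate values
grid = list(product(tree_options, leaf_options, (mtry,)))

# Assumed sample count: one record per year from 2000 to 2023.
splits = list(train_validation_splits(24, 5))
```

Each of the 12 candidate configurations would be fitted on the training indices of every fold and scored on the corresponding validation indices, and the configuration with the best average validation score would be retained.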
As shown in Figure 3, eight prefecture-level units achieve mean R² values above 0.65 for both the training and test sets, with Mean Absolute Error remaining below 10 percent and Mean Bias Error close to zero. These results indicate that, despite the limited sample size, the Random Forest model maintains strong generalization after feature screening and can provide reliable WRCC predictions for a substantial portion of the study area. Nevertheless, predictive skill is notably weaker in the remaining six prefecture-level units, highlighting the pronounced heterogeneity of regional water resource systems and their coupled human–water interactions.
Three factors likely contribute to this performance gap. First, WRCC in these prefectures may be controlled by additional non-linear drivers that are not represented in the current predictor set, such as unobserved management interventions, infrastructure regulation, or localized hydrogeological constraints, which prevent the model from fully capturing the dominant dynamics. Second, the corresponding time series may exhibit stronger non-stationarity, meaning that the governing relationships between predictors and WRCC may shift between the training and testing periods due to structural changes in climate forcing, water allocation rules, or development trajectories, thereby reducing the transferability of patterns learned during training. Third, the available records may contain higher observational noise and shorter effective sequences, limiting the signal-to-noise ratio and constraining the model’s ability to extract stable and generalizable relationships from small samples. Collectively, these findings suggest that a single uniform model structure and input strategy may not yield consistently optimal performance across all sub-regions, and that region-specific predictors, non-stationarity-aware validation, or localized model adaptation may be required for robust regional-scale WRCC prediction.
The distinct spatial variations in the optimized ensemble sizes (trees) among different prefectures, as detailed in Table 4, reflect the differing levels of system complexity and non-linear noise across Gansu Province. Specifically, Qingyang City (QY) has the largest optimized ensemble, at 400 trees, whereas Linxia Prefecture (LX) needs only 100 trees.
From an ensemble learning perspective, the number of regression trees governs how effectively the model stabilizes predictions and suppresses variance driven by high-dimensional stochastic noise. Qingyang City is located in the Loess Plateau region, which is characterized by fractured terrain, severe soil erosion, and intensive energy-industrial extraction activities. Over the 2000–2023 period, Qingyang experienced compounded disturbances from both dramatic hydroclimatic fluctuations and strong anthropogenic interventions, such as large-scale ecological restoration projects and expanding mining operations. This dynamic complexity introduces high-dimensional non-linearity and strong variance into its Water Resource Carrying Capacity series, so a larger ensemble of 400 trees is needed to average out noise and stabilize predictions.
Conversely, Linxia Prefecture has a relatively small territorial scale and functions primarily as an ecological buffer zone adjacent to major water conservation areas. Its socioeconomic growth and sectoral water consumption profiles have followed a stable, near-linear trajectory over the past two decades, with significantly less industrial disturbance or environmental noise. Consequently, the statistical dependencies within Linxia’s carrying capacity system are simpler, and a more compact configuration of 100 trees captures the underlying patterns without unnecessary computational cost.
Table 5 summarizes the Random Forest evaluation results for the 14 prefecture-level cities in Gansu Province. On the training set, the model exhibits consistently strong fitting performance across all regions. Training R² exceeds 0.88 in every prefecture, with the highest values in Qingyang (0.9878), Dingxi (0.9868), and Tianshui (0.9814). Training MAE is generally below 0.04, and the absolute MBE is consistently smaller than 0.005, indicating high fitting accuracy with negligible systematic bias and stable model behavior.
In contrast, testing performance shows pronounced spatial variability. Tianshui achieves the best generalization, with a test R² of 0.9051 and an MAE of 0.0462, reflecting strong predictive stability. Wuwei also performs well, with a test R² of 0.8190. In most other prefectures, test R² falls between 0.20 and 0.63. Jiuquan (0.5033), Pingliang (0.6340), and Baiyin (0.6313) show moderate predictive skill, whereas Jinchang (0.2410) and Dingxi (0.2028) exhibit weak generalization relative to their training performance. Two prefectures require particular attention. Jiayuguan and Lanzhou yield negative test R² values, which implies that the model performs worse than a mean-based baseline over the testing period and points to substantial mismatch between learned relationships and out-of-sample behavior. Consistent with this, their testing MAE increases sharply, reaching 0.2672 in Jiayuguan and 0.1709 in Lanzhou. Although MBE remains within an acceptable range for most prefectures, Jiayuguan shows a comparatively large deviation with an MBE of −0.2672, suggesting systematic underprediction over the testing interval. Overall, these results indicate that the current predictor set, data quality, and validation settings support reliable prediction for several prefectures but are insufficient for others. For the low-skill cases, targeted improvements are needed, including strengthened data screening, reassessment of key explanatory variables, and prefecture-specific model adjustment to better accommodate non-stationarity and localized driving mechanisms.
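The three metrics can be computed as in the following sketch, which assumes that errors are defined as predicted minus observed, so that a negative MBE corresponds to underprediction (consistent with the reading above). The toy values are hypothetical.

```python
def regression_metrics(observed, predicted):
    """Return (R squared, MAE, MBE) with errors taken as predicted - observed."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mbe = sum(errors) / n
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r_squared = 1.0 - ss_res / ss_tot
    return r_squared, mae, mbe

# Hypothetical test period in which every prediction is 0.3 too low.
observed = [0.5, 0.6, 0.7, 0.8]
predicted = [0.2, 0.3, 0.4, 0.5]
r2, mae, mbe = regression_metrics(observed, predicted)
```

In this toy case the residual sum of squares exceeds the total sum of squares, so R² is negative even though the predictions track the shape of the observations. Because every error has the same sign, MAE equals the absolute value of MBE; the matching magnitudes reported for Jiayuguan (0.2672 and −0.2672) correspond to the same situation, that is, underprediction at essentially every test point to the reported precision.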
5.1.2. Identification of Key Driving Factors for Water Resource Carrying Capacity and Their Contribution
To identify dominant drivers of Water Resource Carrying Capacity (WRCC) across prefecture-level cities in Gansu Province, this study evaluates predictor contributions using Random Forest feature importance, quantified by the Mean Decrease Impurity. As shown in
Figure 4, the importance ranking indicates that WRCC patterns are primarily controlled by natural water resources endowment and hydrological supply conditions, while human water use management remains influential but generally secondary at the provincial scale. Among the top indicators, total water resources (0.4632) and per capita water resources (0.4764) contribute most to WRCC prediction, which explains why water-abundant areas such as Gannan Prefecture and Longnan City tend to exhibit higher carrying capacity, whereas water-limited cities such as Jiayuguan and Jinchang are more prone to overloading. The high importance of the water production modulus (0.4056) and runoff (0.3964) further supports the dominant role of precipitation-controlled water yield and basin-scale water generation processes.
For anthropogenic indicators, total water supply shows a negative importance score (−0.0468), and the water supply modulus has low importance (0.0213). This suggests that expanding supply through groundwater abstraction or water transfer does not necessarily improve WRCC and may increase carrying pressure if it relaxes demand constraints and reduces water use efficiency. Groundwater resources rank at a moderate level (0.2034), implying that in parts of Gansu, especially the Hexi Corridor, groundwater development may be close to sustainable limits, so groundwater shifts from a buffering resource toward a constraint that elevates system vulnerability. Groundwater use in the Hexi Corridor is currently under heavy stress from agricultural irrigation, which accounts for the vast majority of total abstractions and has led to locally expanding cones of depression and eco-environmental degradation (e.g., Minqin Oasis). To mitigate these chronic risks, a stringent regulatory framework has been enforced under China’s ‘Three Red Lines’ water resource policy and the revised Underground Water Management Regulations. In this water-limited corridor, rigid control mechanisms, including total extraction quotas, groundwater well-closure programs, water rights trading markets, and a mandatory transition from water-intensive crops to high-efficiency water-saving agriculture, have transformed groundwater from an expansive supply-side buffer into a binding institutional constraint, which is consistent with the importance assigned to groundwater by the Random Forest model.
5.2. Spatiotemporal Evolution Characteristics of Water Resource Carrying Capacity
5.2.1. Temporal Series Variation Trends
Table 6 and
Table 7 summarize changes in four core indicators and the Water Resource Carrying Capacity (WRCC) index for the 14 prefecture-level cities in Gansu Province by comparing 2000 with 2023. As shown in
Table 6, the Economic Pressure Index (EpI) increases markedly in nearly all prefectures, except Gannan Prefecture and Longnan City. The most pronounced rises occur in Jiayuguan, Jinchang, Baiyin, and Lanzhou, where EpI grows by multiple orders of magnitude, indicating that industrialization and urbanization have substantially intensified the economic load on regional water resources. In contrast, the Population Pressure Index (PpI) decreases across all prefectures in 2023, suggesting a reduction in direct demographic pressure, which may be associated with population outflow, higher urbanization, and improved water use efficiency. For the Water Consumption Pressure Index (CPI), values generally declined in 2023, except in Jiuquan, Zhangye, and Wuwei, implying that water pricing reforms and water use structure adjustment have partially constrained water-intensive demand. The Harmony Index (HI) increases in most prefectures, reflecting an overall improvement in WRCC conditions. However, Lanzhou, Linxia Prefecture, Pingliang, and Longnan remain below 0.5, corresponding to a critical to mildly overloaded status that requires continued attention. To further illustrate temporal dynamics,
Figure 5 presents annual trajectories of the individual indicators together with the comprehensive index, where the vertical axis labeled CAI-WRCC denotes the Comprehensive Assessment Index for WRCC.
Based on the WRCC assessment for the 14 prefecture-level cities in Gansu Province from 2000 to 2023, WRCC exhibits clear spatiotemporal variability and an increasingly differentiated regional pattern.
Table 7 summarizes the annual evolution of the WRCC index over the study period. Overall, the provincial WRCC index shows a fluctuating upward tendency, indicating that carrying pressure has generally intensified, while spatial heterogeneity has remained strong and has become more pronounced over time. Most prefectures experience varying degrees of increase across the 23 years, consistent with an accumulating load on the regional water resources system. Spatially, the contrast between western prefectures and the southeastern part of the province strengthens, forming a pattern of higher pressure in the west and relatively favorable carrying conditions in the southeast.
Within the Hexi Corridor, industrial cities such as Jiayuguan, Jinchang, and Baiyin maintain persistently high WRCC indices and show clear upward trends [
1]. Jiayuguan records the most prominent increase, rising from 0.36 to 1.87, which indicates rapidly intensifying pressure. This sharp escalation is driven by Jiayuguan’s heavy-industrial path dependency combined with acute natural hydroclimatic limitations. Jiayuguan has a small administrative territory and lacks large surface river systems of its own, making its water supply chronically vulnerable and heavily reliant on trans-boundary water allocations from the Taolai River (a major tributary of the Heihe River Basin). From 2000 to 2023, the city experienced a boom in industrial production and urban concentration, anchored by large metallurgical and steel manufacturing clusters (e.g., the Jiuquan Iron and Steel Group). This intensive industrialization path created sustained demand for process water, cooling water, and domestic supply for an expanding urban population, resulting in a severe, localized mismatch between natural water endowment and socioeconomic expansion. Despite a recent shift toward high-efficiency industrial water recycling, water demand from heavy manufacturing, set against a chronic physical water deficit, has outpaced the system’s supply and self-purification capacities, driving the CI far into the Grade V severe overloading zone.
In contrast, Lanzhou exhibits a recent decline after an earlier high level, suggesting that structural adjustment and management interventions may have begun to ease carrying pressure. Southern ecological functional areas, including Gannan and Longnan, remain at comparatively low WRCC levels throughout the period, reflecting more favorable natural endowment. Nevertheless, intermittent fluctuations in some years indicate that their carrying status may still be sensitive to external hydroclimatic variability and changing development pressures. Table 7 thus provides the empirical basis for diagnosing the spatiotemporal evolution of WRCC in Gansu Province. It highlights contrasting regional trajectories under different development pathways and water resources endowments, thereby establishing a quantitative basis for subsequent driver identification and model-based prediction. Overall, WRCC in Gansu is in a critical stage in which rising carrying pressure and ecological restoration efforts coexist. The provincial trajectory does not follow a single direction of continuous improvement or deterioration, but instead shows sustained pressure with marked fluctuations and localized signs of improvement.
5.2.2. Analysis of Spatial Differentiation Pattern
To make the above patterns more intuitive, ArcGIS-based maps (version 10.7) were generated from the indicator datasets (
Figure 6). The maps compare the spatial distribution of the comprehensive WRCC index and four core indicators across the 14 prefecture-level cities in 2000 and 2023, highlighting the evolution of regional differentiation from 2000 to 2023.
Economic Pressure Index (EpI) shows the strongest polarization. Jiayuguan rises sharply and becomes the dominant high-value area, with notable increases also in Jinchang, Qingyang, and Tianshui. In contrast, Gannan and Longnan remain at consistently low levels, strengthening the contrast between high-pressure industrial areas and low-pressure ecological functional zones. Population Pressure Index (PpI) decreases across all prefectures. High values initially concentrate in Linxia and Dingxi but decline substantially by 2023. Jiayuguan and Jinchang drop to the lowest levels, and spatial differences narrow toward a generally low and more uniform pattern. Water Consumption Pressure Index (CPI) reorganizes spatially with clear hotspots. Jiuquan and Zhangye increase from medium to high levels and become major high-value areas, while Pingliang declines from the highest level to a medium level. Tianshui remains stable at a low level, and CPI high-value areas shift from dispersed to more concentrated. Harmony Index (HI) increases in most prefectures, with Lanzhou as the main exception. Lanzhou declines from a medium to a low level, whereas Jinchang, Baiyin, and Qingyang rise markedly, and Gannan, Longnan, and Linxia remain relatively stable at medium levels. Overall, the spatial diagnostics indicate intensified economic pressure in the west, widespread easing of population pressure, shifting and strengthening water use pressure, and generally improved coordination with localized stress. These results support differentiated water resource management tailored to regional endowment and pressure profiles.
5.3. Prediction of Water Resource Carrying Capacity in Gansu Province
5.3.1. Temporal Variation Characteristics Based on ARIMA Model Prediction
ARIMA is a classical time series model that is well suited to short-term forecasting when a univariate indicator shows clear temporal dependence. Building on the identified WRCC dynamics in Gansu, this study applies ARIMA to project prefecture-level WRCC trends for 2024 to 2028, as shown in
Figure 7.
Before establishing the ARIMA (p, d, q) models for forecasting the comprehensive index (CI), the stationarity of each prefecture-level time series from 2000 to 2023 was checked using the Augmented Dickey–Fuller (ADF) unit root test. To remove trends and obtain stationary series, first-order differencing (d = 1) was applied to the series of every prefecture-level unit.
Subsequently, the optimal combinations of the autoregressive order p and moving average order q were determined automatically by minimizing the Akaike Information Criterion (AIC) over a parameter grid. This selection procedure balances goodness of fit against parameter parsimony. The final configurations, together with the post-differencing ADF statistics and AIC values for all 14 units, are summarized in
Table 8 to support transparency and replication.
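For reference, the model class and selection criterion described above can be written in standard textbook notation. This is a generic formulation rather than an equation taken from the study: B denotes the backshift operator, c an optional drift term, k the number of estimated parameters, and \hat{L} the maximized likelihood, all of which are symbols introduced here for illustration.

```latex
% ARIMA(p, 1, q) for the comprehensive index CI_t of one prefecture
\Bigl(1 - \sum_{i=1}^{p} \phi_i B^{i}\Bigr)(1 - B)\,\mathrm{CI}_t
  = c + \Bigl(1 + \sum_{j=1}^{q} \theta_j B^{j}\Bigr)\varepsilon_t,
\qquad \varepsilon_t \sim \text{white noise}

% Akaike Information Criterion minimized over the (p, q) grid
\mathrm{AIC} = 2k - 2\ln\hat{L}
```

The factor (1 − B) is the first-order differencing step (d = 1), and the 2k term is what penalizes larger p and q, so a higher-order model is selected only when its likelihood gain outweighs the added parameters.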
The projections indicate increasingly divergent trajectories across Gansu. Industry-oriented cities, especially Jiayuguan and Baiyin, are expected to continue deteriorating, with Jiayuguan showing the strongest worsening. Jinchang exhibits an unstable pattern with a rise, a brief easing, and a renewed increase toward the end of the period. Lanzhou is the only prefecture with a persistent improvement trend, suggesting reduced carrying pressure under ongoing restructuring. In the Hexi Corridor, Jiuquan and Zhangye show a slow improvement, whereas Wuwei is projected to fluctuate with early deterioration followed by gradual recovery. Tianshui shows sustained deterioration, Linxia approaches the critical threshold, and Dingxi and Pingliang remain relatively stable. Gannan and Longnan also show a consistent downward tendency, implying rising pressure even in ecological functional zones. Overall, regional disparities are projected to intensify from 2024 to 2028.
To evaluate the predictive reliability of the ARIMA model for Water Resource Carrying Capacity (WRCC) across the 14 prefecture-level cities in Gansu Province during 2024 to 2028, prediction residuals were assessed using four error metrics, including Root Mean Square Error, Mean Squared Error, Mean Absolute Error, and Mean Absolute Percentage Error. Smaller metric values indicate higher predictive accuracy. Following common practice, MAPE below 15% is considered high accuracy, 15% to 20% indicates acceptable performance, and values above 20% suggest that further improvement is needed.
Table 9 reports these statistics for each prefecture.
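The four error metrics and the MAPE benchmark described above can be sketched as follows. This is a minimal illustration under the standard definitions; the function names, the handling of the 15% and 20% boundaries, and the sample values are assumptions made for the example and do not reproduce the study's data.

```python
import math


def forecast_errors(observed, predicted):
    """Return MSE, RMSE, MAE, and MAPE (in percent) for paired series."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined when an observed value is zero")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mse = sum(e * e for e in errors) / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n,
    }


def mape_grade(mape):
    """Map a MAPE value (percent) to the three accuracy bands in the text."""
    if mape < 15.0:
        return "high accuracy"
    if mape <= 20.0:
        return "acceptable"
    return "needs improvement"


# Made-up index values for one prefecture.
observed = [0.80, 1.00, 1.25]
predicted = [0.88, 0.90, 1.30]
metrics = forecast_errors(observed, predicted)
grade = mape_grade(metrics["MAPE"])
print(metrics, grade)
```

MAPE is scale-free, which makes it convenient for comparing prefectures whose index levels differ widely, but it inflates when observed values are close to zero, so it is best read alongside RMSE and MAE.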
Overall,
Table 9 shows that the ARIMA model provides acceptable forecasting accuracy at the provincial scale, but predictive skill differs substantially among prefectures. Higher reliability is obtained for Jiuquan, Jiayuguan, Zhangye, Baiyin, Tianshui, Qingyang, and Longnan, where MAPE is below, or close to, the 15% benchmark. Zhangye achieves the lowest errors across all metrics, indicating the best predictive performance. In contrast, Dingxi, Gannan, Wuwei, and Pingliang show weaker performance, with MAPE generally exceeding 20%. Dingxi in particular exhibits markedly larger errors, implying limited robustness and making it a priority for targeted optimization. The consistent spatial patterns of RMSE, MAE, and MAPE further suggest that high error cases are likely influenced by systematic mismatch rather than random noise alone, highlighting the need to re-examine prefecture-specific dynamics and refine the model structure accordingly.
Returning to the Random Forest evaluation, two prefectures require particular attention. Jiayuguan and Lanzhou yield negative test R2 values (−1.2050 and −0.3991, respectively), which implies that the trained Random Forest models perform worse than a simple mean-based baseline over the testing interval. These localized failures indicate that the generalized predictor set established at the provincial scale is insufficient for these specific regions, or that structural breaks have occurred within their water–socioeconomic–ecology nexuses.
For Jiayuguan, a typical path-dependent heavy industrial city, and Lanzhou, the provincial capital, WRCC dynamics have decoupled from natural variations (such as precipitation and local runoff). Instead, WRCC in these cities is strongly governed by administrative interventions not captured by the predictors, successive water-saving technology upgrades, and significant industrial restructuring. Furthermore, the commissioning of major cross-basin engineering works, such as the Taohe River Water Diversion Project, has induced notable non-stationarity and regime shifts. Consequently, the stable statistical relationships learned by the Random Forest model during the historical training period failed to generalize to the non-stationary out-of-sample testing window, highlighting the limitations of applying a uniform model structure across sub-regions with acute spatial and institutional heterogeneity.
5.3.2. Spatial Variation Characteristics Based on ARIMA Model Prediction
Using the ARIMA models from Section 5.3.1, WRCC is projected for 2024 to 2028, and corresponding spatial maps are produced (
Figure 8). The forecasts reveal clear regional differentiation and a five-grade spatial hierarchy of overloading. Jiayuguan remains the most severely overloaded area, with the index increasing to 2.23 by 2028, consistent with strong industrial water demand. Baiyin and Jinchang also show an overall upward tendency, and the projected rise in Jinchang by 2028 highlights the need to improve industrial water use efficiency. In the lightly overloaded group, the Hexi Corridor agricultural prefectures, including Jiuquan, Zhangye, and Wuwei, show gradual pressure accumulation, while Wuwei is projected to peak in 2025 and then decline, suggesting potential effects of water-saving interventions. In contrast, Lanzhou shows a sustained improvement, with the index decreasing toward 0.78 by 2028, indicating reduced carrying pressure under structural adjustment. Near-overloaded areas are concentrated in southeastern Gansu, where Linxia, Dingxi, Tianshui, and Pingliang remain close to critical conditions, and Tianshui continues to worsen and approaches the lightly overloaded threshold by 2028. The well-carrying group includes Gannan and Longnan, which retain the lowest indices, although Gannan shows a slight downward tendency in capacity, implying growing sensitivity under changing hydroclimatic conditions.
6. Discussion
This study clarifies the long-term evolution, spatial differentiation, and recent trend signals of Water Resource Carrying Capacity (WRCC) in Gansu Province. The sustained increase in the comprehensive index indicates intensifying carrying pressure, which is jointly shaped by the concurrent tightening of supply-side constraints and the continuous expansion of demand-side requirements. In arid and semi-arid regions, the pronounced interannual variability in water availability exacerbates systemic uncertainty and can heavily amplify aggregate risks. Concurrently, industrial transformation and accelerating urbanization exert persistent stress through elevated withdrawal intensities, unbalanced water use efficiencies, and strengthened ecological redlines, thereby reinforcing intra-provincial spatial disparities. The western industrial corridor remains under chronic stress, reflecting a path-dependent coupling between macroeconomic development trajectories and physical water constraints, characterized by both temporal accumulation and a tight spatial concentration of pressure. Although southern ecological functional zones generally maintain a low-pressure baseline, their marked interannual fluctuations underscore a high sensitivity to external hydroclimatic disturbances, highlighting the critical need for stable, long-term ecological protection and governance.
The Random Forest model fits the training data consistently well across prefectures and generalizes well in several of them, implying that the constructed multi-dimensional indicator system captures statistical relationships governing WRCC dynamics. The localized weak performance observed in several prefectures likely reflects strong data non-stationarity and structural regime shifts, such as modifications in regional supply configurations, the commissioning of major cross-basin infrastructure, abrupt institutional changes in sectoral water use intensity, or marginal inconsistencies in statistical reporting. Rather than indicating model failure, these deviations are diagnostically valuable because they signal critical system transitions that require targeted geographic interpretation. From a water governance perspective, model credibility depends not only on raw predictive skill but also on interpretability and reproducibility, which are essential prerequisites for policy uptake [
44]. Therefore, driving factor analysis must prioritize hydrological mechanisms and management relevance, and feature importance rankings should be interpreted as diagnostic associations rather than direct causality.
ARIMA provides useful short-term trend projections, but its validity depends on stability in the underlying time series structure. Under major policy interventions or extreme events, standalone time series models may underestimate abrupt risk. A hybrid framework that combines machine learning with time series modeling can improve robustness and generalization by incorporating multi-dimensional drivers [
31,
45]. Data availability remains a key constraint in arid regions, and differences in statistical standards across jurisdictions reduce indicator comparability and model transferability [
31]. Future improvements should strengthen spatiotemporal continuity and resolution through remote sensing inversion, Internet of Things monitoring, and multi-source data fusion [
46]. In addition, the indicator system should more clearly distinguish supply components, including surface water, groundwater, and unconventional water, and disaggregate demand into domestic, industrial, agricultural, and ecological uses to better align with hydrological processes and management objectives.
Management implications support a spatially differentiated and hierarchical strategy. For high-pressure industrial cities such as Jiayuguan, Jinchang, and Baiyin, coordinated control of total water use and water use efficiency is needed, together with water-saving retrofits, industrial recycling, reclaimed water substitution, and stricter constraints on additional water-intensive projects. For Lanzhou, continued improvement should be consolidated by enhancing supply resilience and risk mitigation, including upgrades to supporting facilities and regulation storage capacity associated with inter-basin water transfer, while sustaining the benefits of industrial restructuring. For ecological functional zones such as Gannan and Longnan, ecological priority constraints should be enforced through strict limits on high-consumption and high-disturbance industries, together with ecological water allocation, non-point-source pollution control, and ecological compensation mechanisms. At the provincial scale, market-based instruments, including water rights trading and horizontal ecological compensation, should be improved to guide allocation decisions consistent with WRCC outcomes and to form a durable incentive constraint mechanism. Looking ahead, advancing WRCC assessment requires simultaneous progress in data integration and process-oriented indicators. Multi-source fusion of remote sensing, Internet of Things monitoring, and statistical records can reduce missing data and scale mismatch. In parallel, indicator design should better represent coupled water quantity and water quality constraints and embed hydrological mechanisms, geospatial processes, economic structure, and governance context into feature construction and interpretation. Such interdisciplinary integration is essential to develop a more verifiable, transferable, and decision-oriented WRCC assessment framework.
7. Conclusions
This study examines the 14 prefecture-level administrative units, encompassing both municipal cities and autonomous prefectures, across Gansu Province. Using empirical water resources and socioeconomic datasets spanning 2000 to 2023, we constructed a multi-dimensional evaluation indicator system comprising 29 metrics. Objective indicator weights were derived through the Entropy Weight Method. Subsequently, a Random Forest regression model was used to screen dominant driving factors based on feature importance rankings. The spatiotemporal evolution of Water Resource Carrying Capacity (WRCC) was quantified using a Comprehensive Evaluation Index (CI), on which an ARIMA predictive framework was built to forecast WRCC trajectories for the subsequent five years. This study aims to provide a practical, management-oriented assessment framework to optimize water allocation, strengthen regional ecological protection, and advance sustainable development transitions. The primary conclusions are summarized as follows:
(1) From 2000 to 2023, the historical WRCC dynamics across Gansu Province exhibit localized improvements and progressive water-use efficiency gains; nevertheless, the aggregate carrying pressure has not been fundamentally alleviated. Spatially, Gannan and Longnan maintain the most favorable status at Grade II, characterized by minor carrying pressure and substantial room for sustainable development. Zhangye, Linxia, Tianshui, and Pingliang display moderate carrying conditions. Conversely, the carrying capacities of Jiuquan, Wuwei, Lanzhou, and Dingxi have approached saturation, signifying strictly limited margins for further socioeconomic expansion under current resource constraints, while Jiayuguan, Jinchang, and Baiyin remain under chronic severe overloading stress. Over the study period, Longnan emerged as the most rapidly improving unit, shifting from Grade III to Grade II.
(2) Pronounced, polarized regional disparities persist across the province. Within the western Hexi Corridor and eastern Longdong regions, expanding volumetric water demands have driven several areas toward or fully into severely overloaded states. In sharp contrast, Gannan and Longnan in southern Gansu function as relative carrying surplus zones, benefiting from superior natural water endowments combined with lower population densities and less intensive economic pressures. Meanwhile, the central Longzhong region remains under intermediate carrying stress. Over time, the systemic coordination among water resources, ecological environments, and socioeconomic subsystems has steadily improved, with the long-term CI trajectories revealing a clear pattern in which one prefecture remains stable, one declines, and twelve experience varying degrees of pressure intensification.
(3) The optimized Random Forest model demonstrates highly differentiated generalization performance across sub-regions. It achieves robust predictive skill and high reliability in Tianshui, Zhangye, and Wuwei, yielding test-set R2 values above 0.60 and mean absolute errors (MAE) below 10%, which indicates that the ensemble algorithm effectively captures the core non-linear mechanisms governing WRCC dynamics in these units. However, substantial prediction deviations were observed in Jiayuguan and Lanzhou, which yield negative test R2 values, and in Dingxi, where the test R2 is only 0.2028. These localized limitations point to strong data non-stationarity, forceful administrative interventions, or exceptionally complex hydroclimatic variations, highlighting the limits of applying a single uniform model across territories with profound spatial heterogeneity.
(4) The short-term predictive trajectories indicate a stepwise “western–high, eastern–low” carrying pressure gradient that corresponds closely to the province’s physical geographic setting and macroeconomic structure. Grade V severely overloaded zones are projected to concentrate within the industrial clusters of the western Hexi Corridor, represented by Jiayuguan, Jinchang, and Baiyin. Grades IV to III form a broad, continuous transitional belt across central and southeastern Gansu, including Jiuquan, Zhangye, Wuwei, Lanzhou, Tianshui, and Pingliang, which functions as a critical early-warning zone for regional water security. Grade II stable areas remain confined to southern Gansu, where Gannan and Longnan continue to serve as ecological security barriers with relatively favorable resource support capacities.
(5) Based on the spatiotemporal diagnosis and predictive trajectories, differentiated water resource regulation strategies must be tailored to the different carrying capacity zones across Gansu Province. For the severely overloaded heavy-industrial hubs in the western Hexi Corridor (e.g., Jiayuguan, Jinchang, and Baiyin), management must shift from expansive supply-side provision toward rigid administrative demand-side control. This includes strictly enforcing the closure of illegal groundwater wells, mandating industrial water recycling, and accelerating the transition to a low-water-intensity economic structure. For the transitional warning belts across central and southeastern Gansu (e.g., Wuwei, Lanzhou, and Tianshui), emphasis should be placed on optimizing agricultural crop layouts by replacing water-intensive crops with high-efficiency drought-resistant varieties and standardizing water-saving drip-irrigation networks, combined with the adaptive scheduling of major inter-basin diversion projects. Finally, for the southern ecological functional barriers (Gannan and Longnan), long-term ecological compensation mechanisms should be established to preserve upstream water-yielding endowments, and high-pollution socioeconomic expansion should be prohibited to safeguard the ecological safety margins of the entire province.