The case study area is increasingly exposed to drought hazards driven by the convergence of anthropogenic pressures and climate change. This research contributes a multi-faceted framework integrating historical climate observations (1985–2014), machine learning-based feature selection, deep learning for predictive modeling, and future scenario analysis (SSP2-4.5 and SSP5-8.5 for 2030–2050). The findings offer insights into the spatial and seasonal characteristics of drought evolution, the relative importance of climatic variables, and the operational potential of advanced modeling techniques. In what follows, key outcomes are synthesized with recent literature and discussed in light of their implications for resilience-building and policy development.
4.1. Spatiotemporal Evolution of Droughts: Patterns and Drivers
The historical PDSI record indicates persistent drought stress, with the early 2000s emerging as a period of intensified hydroclimatic imbalance. These patterns are consistent with regional climate diagnostics across northern China, highlighting increased precipitation deficits and warming trends [54]. The 5-year rolling average of PDSI reveals prolonged negative phases, reinforcing evidence of decadal-scale drought intensification [55]. This trajectory aligns with broader observations in semi-arid systems globally, suggesting structural shifts in water balance dynamics.
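The smoothing step mentioned above can be illustrated with a minimal sketch. It assumes a trailing (not centered) window applied to annual-mean PDSI values; the study does not state which variant was used, and the numbers below are hypothetical values chosen only to show the mechanics.

```python
from statistics import fmean


def rolling_mean(values, window):
    """Trailing moving average; positions without a full window are None."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            out.append(fmean(values[i + 1 - window : i + 1]))
    return out


# Hypothetical annual-mean PDSI values, for illustration only.
annual_pdsi = [0.5, -0.2, -1.1, -1.8, -2.4, -2.0, -1.5, -0.9]
smoothed = rolling_mean(annual_pdsi, window=5)
```

A run of consecutive negative values in `smoothed` corresponds to what the text calls a prolonged negative phase. With monthly data the same function would be called with `window=60`.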
The variable importance analysis confirmed precipitation as the most influential driver, with solar radiation and maximum temperature acting as compounding stressors by intensifying evapotranspiration losses. Similar findings have been reported in the Colorado River Basin and South Asia, where compound drought–heat events were driven by moisture deficits interacting with elevated atmospheric demand [56]. These findings highlight the need for drought assessments that account for both supply-side (precipitation) and demand-side (temperature, radiation) pressures.
In this context, the choice of historical climate inputs from the TerraClimate dataset significantly enhanced the reliability of the forecasting framework. Including precipitation, minimum, maximum, and mean temperatures, together with solar radiation, allowed the models to capture key drought-related dynamics. These included soil moisture deficits, evapotranspiration anomalies, and radiative forcing. These variables are not only physically consistent with the processes underlying the Palmer Drought Severity Index (PDSI) but also play distinct roles in shaping the water and energy balance that drives drought evolution. For instance, precipitation directly governs moisture availability, while temperature and radiation modulate atmospheric demand and surface energy fluxes. The high spatial resolution (~4 km) of TerraClimate provided a detailed view of spatial heterogeneity in the Inner Mongolia watershed. This resolution was critical for bias-correcting coarse GCM outputs and aligning modeled climatology with observed drought patterns. Therefore, the relevance of each variable extended beyond input diversity, offering a mechanistic linkage between climatological drivers and drought predictability in arid and semi-arid environments.
4.2. Modeling Drought: Superiority of Deep Learning and Uncertainty Awareness
The benchmarking of ML models demonstrated that Random Forest (RF) performed more consistently than gradient boosting and SVR. However, the LSTM model achieved the highest accuracy overall, with R² = 0.766 and CC = 0.880, underscoring its strength in capturing non-linear and temporally embedded dependencies. These results echo recent findings [57], which reported similar advantages in complex hydrological systems.
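For readers comparing the two scores, the sketch below shows how they can be computed. It assumes that R² denotes the coefficient of determination (one minus the ratio of residual to total sum of squares) and that CC denotes the Pearson correlation coefficient; the text does not define either metric explicitly, and the observed and predicted values are hypothetical.

```python
from math import sqrt


def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def pearson_cc(obs, pred):
    """Pearson correlation coefficient between observations and predictions."""
    mean_o = sum(obs) / len(obs)
    mean_p = sum(pred) / len(pred)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov / sqrt(var_o * var_p)


# Hypothetical PDSI values: predictions track the observations perfectly
# in phase but with half the amplitude.
observed = [-2.0, -1.0, 0.0, 1.0, 2.0]
predicted = [-1.0, -0.5, 0.0, 0.5, 1.0]

r2 = r_squared(observed, predicted)
cc = pearson_cc(observed, predicted)
```

Under these definitions CC is insensitive to amplitude or offset errors while R² penalizes them, so the two scores need not satisfy R² = CC². In this example the damped predictions are perfectly correlated with the observations yet leave a quarter of the variance unexplained.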
Further, uncertainty visualizations and residual diagnostics revealed that LSTM maintained predictive stability even during compound or multi-season drought periods—an aspect where other models struggled. While RF provided interpretability through feature rankings, it lacked sequential learning capabilities. Integrating downscaled CMIP6 climate scenarios into the LSTM framework enabled robust stress testing under plausible future conditions, reinforcing its suitability for early warning applications. Overall, the LSTM model emerged as a resilient and generalizable forecasting tool.
While the LSTM architecture demonstrated superior performance, its results are subject to structural and parametric uncertainty. Sensitivity analysis showed that changing the number of hidden units or input lag windows (3–9 months) altered R² by less than ±0.05, confirming model robustness. Furthermore, 5-fold cross-validation was used to mitigate overfitting, and all reported evaluation scores are mean values across folds. Although hyperparameters (e.g., dropout rate, learning rate) were tuned using RandomizedSearchCV, some residual uncertainty remains due to stochastic training processes. Future studies could explore Bayesian neural networks or ensemble deep learning to quantify these uncertainties further.
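The input lag window varied in the sensitivity analysis can be pictured with a small sketch. It assumes a univariate monthly series and a one-step-ahead target for brevity, whereas the study uses several climate predictors; the function name `make_windows` and the toy series are hypothetical.

```python
def make_windows(series, lag):
    """Split a series into (inputs, targets) pairs.

    Each input holds `lag` consecutive values and the target is the value
    that immediately follows them.
    """
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    inputs, targets = [], []
    for end in range(lag, len(series)):
        inputs.append(series[end - lag : end])
        targets.append(series[end])
    return inputs, targets


# Toy monthly series (values stand in for a drought index).
monthly = list(range(12))

# Number of training samples available for each lag in the tested range.
samples_per_lag = {lag: len(make_windows(monthly, lag)[0]) for lag in (3, 6, 9)}
```

Longer windows give the model more antecedent context per sample but leave fewer samples for a series of fixed length, which is one reason a lag sensitivity check of this kind is informative.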
Additional spatially stratified tests, in which specific sub-regions of the Yellow River Basin were excluded during training, showed that LSTM performance remained stable across zones (R² between 0.72 and 0.78), supporting spatial generalization. Seasonal disaggregation showed higher predictability during summer and autumn, when climatic signals such as evapotranspiration and precipitation variability are stronger. In contrast, traditional ML models like SVR, XGBoost, and GBR underperformed due to their limited capacity to capture long-term temporal dependencies. SVR was particularly sensitive to input non-stationarity, while GBR tended to overfit to short-term fluctuations. These findings highlight the comparative advantage of deep learning architectures for modeling drought phenomena characterized by multi-seasonal memory and complex temporal structure.
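The spatially stratified tests can be sketched as a leave-one-region-out split. This is an illustration under assumptions, not the study's actual partitioning: the zone labels and the helper `leave_one_region_out` are hypothetical.

```python
def leave_one_region_out(region_labels):
    """Yield (held_out_region, train_idx, test_idx) for each distinct region.

    Samples from the held-out region are used only for testing, so the
    score reflects how well the model transfers to an unseen zone.
    """
    for region in sorted(set(region_labels)):
        train_idx = [i for i, r in enumerate(region_labels) if r != region]
        test_idx = [i for i, r in enumerate(region_labels) if r == region]
        yield region, train_idx, test_idx


# Hypothetical zone label for each sample (e.g., each grid cell).
labels = ["west", "west", "central", "central", "east", "east", "east"]
splits = list(leave_one_region_out(labels))
```

Training once per split and comparing the held-out scores gives a per-zone range of the kind reported above.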
4.3. Future Droughts Under SSP Scenarios: Severity and Seasonal Sensitivity
Scenario-based projections unveiled a marked divergence between pathways. SSP5-8.5 consistently produced more frequent and severe droughts beyond 2040, with extreme negative PDSI values and broader uncertainty bands, signaling the possibility of megadrought regimes similar to those identified in the western United States [19]. While SSP2-4.5 projected comparatively moderate outcomes, it still revealed a non-trivial risk of spring and summer drought events.
The seasonal disaggregation of projected drought risks underscored summer and autumn as the most vulnerable periods, driven by reduced monsoon effectiveness and peak evapotranspiration. This aligns with trends documented by Yu and Zhai [58], who noted increasing occurrences of compound drought–heat extremes across northern China. In addition, our results identified a growing propensity for spring droughts following dry winters, indicating cascading seasonal drought dynamics. These outcomes emphasize the need for seasonal-scale planning and anticipatory action.
While the scenario-based projections provide valuable insights into potential drought evolution, they are inherently subject to multiple layers of uncertainty. These include structural differences among the GCMs used, internal climate variability not captured by deterministic models, and the assumptions embedded within the SSP narratives themselves (e.g., socio-economic development, emission trajectories). Although bias correction aligns modeled and observed distributions, it cannot fully remove uncertainty in extreme events. Such uncertainties should be carefully considered in decision-making processes, especially when designing long-term drought preparedness strategies. Probabilistic scenario planning, ensemble modeling, and stakeholder-informed thresholds can help mitigate the risks of overconfidence or maladaptation arising from deterministic interpretations of future projections [59,60].
4.5. Toward Resilience: Integrated Strategies for the Case Study
Systemic resilience in the YRB requires a tiered strategy that bridges anticipatory analytics and adaptive governance. LSTM-based early warning systems, informed by SSP-specific projections, offer a foundation for proactive drought response. Adaptive measures—such as drought-tolerant cropping, micro-irrigation, and conjunctive water use—can be deployed locally, while long-term resilience demands basin-wide reallocation frameworks and cross-sector coordination.
A Climate Resilience Action Matrix (Table 10) has been developed to operationalize these insights. This framework maps strategic actions across multiple domains, from data infrastructure to institutional integration, guiding prioritization at regional and national levels. Each intervention is aligned with projected drought intensities, implementation scales, and key stakeholders.
While the modeling framework presented here is robust, some limitations remain. Socio-economic dynamics, groundwater interactions, and multi-hazard compounding effects were beyond the current scope but merit inclusion in future work. Expanding toward coupled human-climate systems and evaluating cascading impacts on food, energy, and ecosystems will further enrich drought resilience science in the basin.
Although socio-economic variables and groundwater storage significantly shape drought vulnerability and adaptive capacity, this study intentionally focused on biophysical and climatic drivers derived from the NASA NEX-GDDP dataset. To ensure temporal consistency and spatial compatibility with our downscaled climate projections, we limited model inputs to variables consistently available across historical and future periods. Consequently, key determinants such as land tenure, livelihood sensitivity, or aquifer conditions could not be explicitly incorporated.
In particular, groundwater dynamics—which are essential in semi-arid basins like the Inner Mongolia section of the Yellow River Watershed—were excluded due to the unavailability of high-resolution, spatially continuous well-level or recharge datasets. While satellite-derived GRACE products offer valuable insights into terrestrial water storage anomalies, their coarse resolution and integrated nature (encompassing soil moisture, surface water, and groundwater) limit their use for localized drought prediction. Nonetheless, our correlation analysis with GRACE_CSR shows a moderate but statistically significant relationship with LSTM-modeled PDSI, suggesting some capacity to reflect subsurface variability. Future efforts should incorporate socio-hydrological and institutional dimensions, as well as validation against independent datasets such as soil moisture, streamflow, and vegetation indices (e.g., NDVI, VCI), to enhance the interpretability and robustness of deep learning–based drought projections.
In addition to the baseline hybridization strategy, this study introduces several region-specific advancements that enhance its practical relevance and scientific contribution. First, feature selection via Random Forest importance scores was used to isolate predictors that are particularly influential in the Inner Mongolia segment of the Yellow River Basin—precipitation, maximum temperature, and solar radiation—which capture the dominant hydroclimatic drivers in this semi-arid region. Second, the six-month input window architecture of the bidirectional LSTM explicitly accounts for seasonal lags and drought memory in temperate-continental climates. Third, by applying quantile mapping bias correction to the NEX-GDDP projections under SSP2-4.5 and SSP5-8.5, the framework improves generalization under non-stationary climate change. Fourth, validation against independent indicators, including SPEI, SPI, and GRACE-based groundwater anomaly (GRACE_CSR), demonstrates that the proposed framework captures both surface and subsurface hydrological variability, enhancing robustness and interpretability.
Comparable hybrid models reported in the recent literature reinforce the value of our architecture in reducing uncertainty and enhancing forecast fidelity in similar dryland contexts. Examples include LSTM-CM [13], which combines LSTM and climate model outputs to reduce bias and improve drought detection accuracy at multiple lead times, and CEEMD-LSTM [63], which decomposes signal variance to boost SPI forecasting skill.
The quantile mapping (QM) technique corrected biases across variables and seasons effectively, but applying it under non-stationary climate conditions requires caution. QM assumes that the relationship between modeled and observed distributions remains stable over time; however, under strong radiative forcing scenarios like SSP5-8.5, this assumption may not fully hold, especially for extreme events. Although recent studies have shown that QM, when applied to statistically downscaled and high-resolution datasets such as NEX-GDDP, remains robust even under evolving conditions [45,47], the method may still underestimate shifts in climate variability. Future work could explore adaptive bias correction techniques or trend-preserving methods to address this limitation more rigorously. Nonetheless, our evaluation shows that the adopted QM approach provides sufficient fidelity for near-future drought scenario modeling in the Inner Mongolian context.
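The stationarity caveat is easier to see in a minimal sketch of empirical quantile mapping. The text does not specify which QM variant was applied, so this version (linear interpolation between sorted historical samples, clamped at the ends) is an assumption made for illustration, and all sample values are hypothetical.

```python
from bisect import bisect_right


def fractional_rank(sorted_sample, x):
    """Position of x on a 0..n-1 scale, linearly interpolated, clamped at the ends."""
    n = len(sorted_sample)
    if x <= sorted_sample[0]:
        return 0.0
    if x >= sorted_sample[-1]:
        return float(n - 1)
    hi = bisect_right(sorted_sample, x)  # first index whose value exceeds x
    lo = hi - 1
    span = sorted_sample[hi] - sorted_sample[lo]
    return lo + (x - sorted_sample[lo]) / span


def quantile(sorted_sample, p):
    """Linearly interpolated quantile for p in [0, 1]."""
    pos = p * (len(sorted_sample) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_sample) - 1)
    frac = pos - lo
    return sorted_sample[lo] * (1 - frac) + sorted_sample[hi] * frac


def quantile_map(value, model_hist, obs_hist):
    """Map a model value to the observed value at the same historical quantile."""
    model_sorted = sorted(model_hist)
    obs_sorted = sorted(obs_hist)
    p = fractional_rank(model_sorted, value) / (len(model_sorted) - 1)
    return quantile(obs_sorted, p)


# Hypothetical historical samples (e.g., monthly precipitation in mm).
model_hist = [10.0, 20.0, 30.0, 40.0, 50.0]
obs_hist = [12.0, 24.0, 36.0, 48.0, 60.0]

corrected_mid = quantile_map(25.0, model_hist, obs_hist)
corrected_extreme = quantile_map(100.0, model_hist, obs_hist)
```

In this sketch a future value outside the historical model range (`100.0`) is mapped to the largest observed value, because the transfer function is learned entirely from the historical period. This is one concrete way a fixed mapping can understate changes in extremes, and it is the behavior that trend-preserving or adaptive variants aim to avoid.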
The Yellow River Basin faces heightened drought exposure in all future scenarios. However, a scientifically informed, scenario-aware, and globally contextualized approach to adaptation presents a viable path forward. Anticipatory planning, supported by AI-based modeling and policy innovation, can turn risk into resilience.