1. Introduction
Hydrological droughts, characterized by reduced river discharge and streamflow deficits, represent a distinct hazard from meteorological and agricultural droughts because they reflect catchment storage, routing, and human withdrawals. As a result, they determine impacts on water supply, ecosystems, navigation, and hydropower at various scales, including transboundary basins [
1,
2,
3]. Recent global assessments indicate that extreme hydrological droughts are projected to become more frequent and severe with warming, with dominant drivers shifting from precipitation to temperature in some regions [
4]. This change increases risks to water-dependent sectors and raises concerns about intergenerational exposure to water stress [
5,
6,
7]. Comparisons of multiple drought types explicitly contrast meteorological, agricultural (soil moisture), and hydrological (runoff/discharge) droughts, revealing differing trends and underscoring the need to select indices appropriate for each drought type [
8]. Because impacts such as reservoir inflows, navigation depth, and turbine heads respond to streamflow rather than precipitation deficits alone, streamflow-based indices are essential for impact-oriented drought assessment and management in international river basins [
2,
9].
Standardized indices remain the operational backbone of drought monitoring and climate change assessments because they enable comparability across regions and timescales. Canonical examples include the Standardized Precipitation Index (SPI), the Standardized Precipitation–Evapotranspiration Index (SPEI), and streamflow- or runoff-focused indices such as the Standardized Runoff/Streamflow Drought Index (SRI/SDI) [
10,
11,
12]. The SDI captures cumulative streamflow deficits and hydrological persistence directly relevant to water resource impacts, making it particularly suitable for hydrological drought diagnostics and risk quantification [
13,
14,
15]. Comparative assessments show that runoff-based indices better represent hydrological drought propagation and characteristics across scenarios than precipitation-based indices alone, especially when agricultural water withdrawals are considered [
16,
17]. Statistical standardized indices are easy to compute from modelled or observed series but can misrepresent hydrological impacts if evapotranspiration, land-surface processes, or human abstractions change. Physically based hydrological models can represent these processes and water use but introduce structural and parameter uncertainty and are computationally intensive in large multi-model ensembles [
10,
18]. Combining standardized indices with hydrological or hybrid approaches helps balance interpretability and process realism while revealing the dominant sources of projection uncertainty [
19].
Hydrological drought projection studies commonly use multi-model climate forcing frameworks (GCM–RCM ensembles and EURO-CORDEX) and Representative Concentration Pathways (RCP2.6, RCP4.5, RCP8.5, or similar SSP scenarios) to examine emission-dependent changes in drought metrics, with ensemble approaches quantifying climate forcing uncertainty [
20,
21,
22]. Regional analyses across Europe show spatially divergent low-flow trends and strong sensitivity of drought characteristics (frequency, severity, and duration) to scenario selection and climate model spread, especially for low-flow extremes, underscoring the need to propagate uncertainty from climate forcing through hydrological response [
23,
24,
25]. Studies using large ensembles show that climate model and scenario uncertainty often dominate hydrological drought uncertainty at continental scales, while hydrological model and parameter uncertainty can be critical in certain catchments and regimes, particularly where local processes and human water use are significant [
5,
26,
27]. Europe-scale hydrological simulation ensembles, driven by multiple EURO-CORDEX RCM combinations, demonstrate contrasting responses across subregions: Mediterranean and some continental zones face stronger low-flow intensification, while northern basins may show different responses. These results highlight the sensitivity of drought trends to model selection and internal variability [
25,
28,
29].
Machine learning (ML) methods have recently been applied to streamflow simulation and drought projection because they can flexibly model nonlinear relationships, memory effects, and compound drivers without explicit process parameterization [
30,
31,
32]. Random Forest, gradient boosting, and deep neural networks have been used to attribute meteorological drivers, emulate hydrological models, and project runoff or drought indices with competitive skill, especially in data-rich settings or where rapid multi-ensemble emulation is needed [
5,
33,
34,
35]. Hybrid frameworks that constrain ML projections with hydrological understanding or multi-model ensembles have proven effective at reproducing low-flow behaviour and quantifying bivariate drought characteristics (severity and duration) while reducing computational cost compared to full hydrological multi-model chains [
30,
32,
36]. The Random Forest–Copula–Factorial Analysis (RFCFA) method, for example, integrates Random Forest with copulas to predict meteorological-to-hydrological drought propagation and reveal major drivers and uncertainties [
37]. Support Vector Machine (SVM) variants optimized with metaheuristic algorithms have been used for drought-related discharge prediction and groundwater storage simulation under future RCP scenarios, demonstrating the breadth of machine learning applications in hydrological climate change studies [
38,
39]. However, machine learning approaches inherit uncertainties from climate forcing and can be sensitive to regime shifts in training data, so careful bias correction, cross-validation, and physical interpretability checks are required [
2,
40].
Projected changes in hydrological drought across Europe, including Central Europe, are spatially heterogeneous: continental assessments of low-flow indices and multi-model ensembles indicate stronger low-flow intensification in Mediterranean and some continental zones, whereas northern basins may respond differently [
25,
29,
41]. Many basin-scale future drought studies use deterministic hydrological modelling, variable-threshold methods, or indices derived from physically based runoff simulations. These approaches are valuable but often computationally demanding and limited in ensemble size [
21,
22,
42]. In the Western Balkans, and specifically for the transboundary Sava River Basin—a major Danube tributary with significant roles in hydropower, navigation, and water supply—the literature does not provide a substantive record of machine learning-based SDI projections under RCP scenarios [
43]. This evidentiary gap aligns with broader calls for improved ML–hydrology integration, better uncertainty decomposition in drought projections, and enhanced drought early-warning systems that use impact-relevant streamflow diagnostics and scenario-consistent climate forcing in transboundary contexts [
5,
44,
45].
This study addresses the identified methodological and regional gap by applying a Random Forest framework to project the Streamflow Drought Index (SDI) in the Sava River Basin, using predictor indices (SPI, STI, and SPEI) derived from scenario-consistent climate projections under RCP2.6, RCP4.5, and RCP8.5 through 2050. Building on evidence that Random Forest and hybrid machine learning frameworks can capture nonlinear drought propagation and emulate multi-model behaviour [
32,
37], this contribution couples machine learning-based SDI forecasting with scenario-consistent GCM/RCM forcing to provide one of the first systematic machine learning assessments of future hydrological drought for this major transboundary basin. It explicitly examines low-flow extremes and uncertainty propagation from climate forcing to impact-oriented streamflow deficits [
5,
30,
36]. The approach leverages the computational efficiency and nonlinear modelling capacity of Random Forest while maintaining interpretability through standardized drought indices, supporting climate-informed water management and drought preparedness in a data-sparse, transboundary setting [
37,
45,
46]. By quantifying future hydrological drought trajectories under alternative emission pathways, this framework directly supports sustainable, climate-resilient water resources management and long-term adaptation planning in the Sava River Basin.
2. Data and Methods
We aimed to develop and apply a data-driven framework to forecast the Streamflow Drought Index (SDI) using meteorological and climatic drought indices derived under alternative climate change scenarios. Specifically, we used the SPI, STI, and SPEI, calculated from observed and projected temperatures and precipitation for three Representative Concentration Pathways (RCP2.6, RCP4.5, and RCP8.5), as predictors in a range of machine learning models to generate monthly SDI projections at the Sremska Mitrovica gauge. We then applied non-parametric trend and change-point tests to both historical and forecast SDI series (up to 2050) to identify statistically significant long-term trends and potential regime shifts, thus providing an integrated assessment of future drought hazards and their trajectories under contrasting climate scenarios.
We first assembled an observational hydrometeorological dataset. The regional climate model outputs from SMHI-RCA4 used in this study are publicly available through the EURO-CORDEX archive and the Earth System Grid Federation (ESGF) data portals. Station-based precipitation, temperature, and discharge data were obtained from the national hydrometeorological services of the Republic of Srpska, the Hydrometeorological Service of the Republic of Serbia, and the Croatian Meteorological and Hydrological Service. Monthly precipitation and air temperature data were collected from three meteorological stations: Sremska Mitrovica (Serbia), Slavonski Brod (Croatia), and Bijeljina (Bosnia and Herzegovina). Monthly river discharge data for the Sava River Basin were obtained for the Sremska Mitrovica hydrological station (
Figure 1).
For the period 1961–2020, we calculated four monthly drought indices: the Standardized Precipitation Index (SPI), the Standardized Temperature Index (STI), the Standardized Precipitation–Evapotranspiration Index (SPEI), and the Streamflow Drought Index (SDI). At a monthly resolution, this corresponds to 720 observations per index (January 1961–December 2020). For each index, we first aggregated the relevant variable over a chosen accumulation window, and then standardised it relative to its historical distribution. The Standardized Precipitation Index (SPI) is defined as:
$$\mathrm{SPI}_k = \frac{P_k - \mu_{P,k}}{\sigma_{P,k}} \qquad (1)$$
where $P_k$ is the $k$-month accumulated precipitation and $\mu_{P,k}$ and $\sigma_{P,k}$ are its long-term mean and standard deviation for that time scale [9]. The Standardized Temperature Index (STI) is defined analogously:
$$\mathrm{STI}_k = \frac{T_k - \mu_{T,k}}{\sigma_{T,k}} \qquad (2)$$
where $T_k$ is the aggregated air temperature anomaly, allowing us to isolate thermal effects on drought. The Standardized Precipitation–Evapotranspiration Index (SPEI) uses the climatic water balance:
$$D_k = P_k - \mathrm{PET}_k \qquad (3)$$
where $D_k$ represents moisture deficit/surplus; $P_k$ represents precipitation; and $\mathrm{PET}_k$ represents potential evapotranspiration. The index is then computed as:
$$\mathrm{SPEI}_k = \frac{D_k - \mu_{D,k}}{\sigma_{D,k}} \qquad (4)$$
after fitting an appropriate distribution to $D_k$ [11, 19]. The Streamflow Drought Index (SDI) is defined as:
$$\mathrm{SDI}_k = \frac{V_k - \mu_{V,k}}{\sigma_{V,k}} \qquad (5)$$
where $V_k$ is the cumulative streamflow volume over $k$ months and $\mu_{V,k}$ and $\sigma_{V,k}$ are its long-term mean and standard deviation [13]. As all indices are standardised, they share a common categorical interpretation based on thresholds along the standard normal scale, which facilitates direct comparison of drought severity and duration across meteorological, climatic, and hydrological variables [9]. We adopted the conventional classification shown in Table 1. These series provided a consistent basis for quantifying meteorological and hydrological drought conditions and exploring their temporal and spatial linkages within the study basin.
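The aggregation-then-standardisation step behind Eqs. (1)–(5) can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing chain used in the study: the function name `standardized_index`, the 3-month window, and the synthetic series are hypothetical, and the z-score form is the simplified one written above (operational SPI/SPEI implementations usually fit a distribution before transforming to the standard normal scale).

```python
import numpy as np
import pandas as pd


def standardized_index(series: pd.Series, k: int = 3) -> pd.Series:
    """Accumulate a monthly series over k months, then z-score it.

    Hypothetical helper: implements (X_k - mean) / std as in Eqs. (1)-(5),
    without any distribution fitting.
    """
    acc = series.rolling(window=k, min_periods=k).sum()
    return (acc - acc.mean()) / acc.std(ddof=0)


# Synthetic monthly data for illustration only (not the study's observations).
rng = np.random.default_rng(0)
months = pd.date_range("1961-01-01", "2020-12-01", freq="MS")
precip = pd.Series(rng.gamma(shape=2.0, scale=30.0, size=len(months)), index=months)
pet = pd.Series(rng.gamma(shape=4.0, scale=15.0, size=len(months)), index=months)
flow = pd.Series(rng.lognormal(mean=7.0, sigma=0.4, size=len(months)), index=months)

spi = standardized_index(precip, k=3)        # precipitation only
spei = standardized_index(precip - pet, k=3)  # climatic water balance D = P - PET
sdi = standardized_index(flow, k=3)           # cumulative streamflow volume

print(len(spi), int(spi.isna().sum()))
```

The first k − 1 months have no complete accumulation window and are left as missing values, which is why lagged or accumulated predictors shorten the usable record slightly.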
To analyse the influence of spatially distributed meteorological drought on discharge at the basin outlet, we developed a distance-weighted SPI referenced to the Sremska Mitrovica hydrological station. Using the geographical coordinates of each meteorological and hydrometric station, we first calculated great-circle (Haversine) distances. We then derived Inverse Distance Weighted (IDW) coefficients with a power parameter p = 2, normalised the weights to sum to one, and applied these coefficients to combine the individual station SPI series into a single composite index (SPI_IDW). The SPI_IDW thus represents the distance-weighted meteorological signal likely to influence discharge at Sremska Mitrovica. We recognise that weighting stations by their distance to the outlet overemphasises local precipitation near Sremska Mitrovica and does not accurately reflect rainfall distribution across the entire upstream catchment. Because only three long-term meteorological stations were available for 1961–2020, the SPI_IDW should be considered a first-order estimate of the basin-scale meteorological signal rather than a fully representative areal average, especially for remote headwater areas. In this study, it is therefore used solely as an exploratory diagnostic of how stations affect discharge, not as a predictor in the machine learning models, which instead depend on indices from gridded regional climate simulations. Creating basin-wide, area-weighted indices from more extensive observation networks or high-resolution gridded precipitation data will be a key future step. We quantified the strength and nature of the drought–discharge relationship by calculating Pearson and Spearman correlation coefficients between discharge and (i) each station’s SPI and (ii) SPI_IDW, including lagged correlations of up to three months to capture delayed hydrological responses. To assess the combined influence of all stations and address potential multicollinearity, we fitted a multiple linear regression model using the individual SPI series as predictors of discharge and evaluated model performance through the coefficient of determination (R²) and observed–predicted scatter plots. We further visualised spatial station influence via an IDW-based weight heatmap and archived a dataset containing the original SPI series and SPI_IDW. Similar IDW-based spatial weighting as well as SPI and streamflow analyses have been applied in hydrological studies [47, 48, 49].
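The Haversine distance and normalised inverse-distance weights described above can be sketched as below. The coordinates are rough placeholder values chosen for illustration, not station metadata from the study, and the helper names and the 1 km distance floor (which prevents a co-located station from receiving an unbounded weight) are assumptions.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def idw_weights(distances_km, power=2.0, min_distance_km=1.0):
    """Inverse-distance weights normalised to sum to one."""
    d = np.maximum(np.asarray(distances_km, dtype=float), min_distance_km)
    w = 1.0 / d ** power
    return w / w.sum()


# Placeholder coordinates (lat, lon), for illustration only.
gauge = (44.97, 19.61)
stations = {
    "Sremska Mitrovica": (44.98, 19.63),
    "Slavonski Brod": (45.16, 18.02),
    "Bijeljina": (44.76, 19.21),
}

distances = np.array([haversine_km(*gauge, *xy) for xy in stations.values()])
weights = idw_weights(distances, power=2.0)

# Synthetic station SPI series (rows = months, columns = stations).
rng = np.random.default_rng(0)
spi_stations = rng.normal(size=(720, len(stations)))
spi_idw = spi_stations @ weights  # composite distance-weighted SPI

for name, d, w in zip(stations, distances, weights):
    print(f"{name}: {d:.1f} km, weight {w:.3f}")
```

With p = 2 the nearest station dominates the composite, which illustrates numerically why the index overemphasises conditions close to the outlet.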
To relate meteorological and climatic drought indices to hydrological drought, we developed a data-driven modelling framework using the historical SPI, STI, SPEI, and SDI series. We first performed basic quality assurance and control by checking for missing values, calculating descriptive statistics, visually inspecting the time series, and ensuring strict chronological ordering. All predictors were processed with scikit-learn pipelines using z-score standardisation, while the target variable (SDI) was already standardised. To represent catchment memory and delayed hydrological response, we initially explored contemporaneous and lagged predictors for the SPI, STI, and SPEI with lags from 0 to 5 months and removed rows with lag-induced missing values. Exploratory cross-correlation analysis showed that SPI/STI/SPEI–SDI correlations peak within 0–3 months and become negligible beyond 5 months, with longer lags introducing redundancy and risk of overfitting [
50,
51]. For the final machine learning models, we therefore restricted the predictor set to lags of 0–2 months for each index and added harmonic seasonal terms (sine and cosine of calendar month) to allow a seasonally varying index–SDI relationship while avoiding information leakage, since these terms are deterministic functions of time. We quantified linear relationships between all lagged predictors and SDI using a correlation matrix, identified for each index family the lag with the highest absolute correlation with SDI, and visualised this dependence with a correlation heatmap. We then constructed a feature matrix X, comprising all lagged SPI, STI, and SPEI variables, and a target vector y for SDI. We split the series chronologically into training (approximately 80%) and test (approximately 20%) subsets to account for temporal dependence. Because the problem is formulated as continuous regression rather than classification, we did not perform any class-balancing or resampling; all models were trained on the full observed distribution of SDI values, including both wet and dry states.
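A minimal sketch of the feature construction described above (lags 0–2 for each index, harmonic month terms, removal of lag-induced missing rows, and a chronological 80/20 split) is given below. The column names, the helper `build_features`, and the synthetic input frame are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def build_features(df, index_cols=("SPI", "STI", "SPEI"), max_lag=2):
    """Return (X, y) with lagged indices and harmonic seasonal terms."""
    feats = {}
    for col in index_cols:
        for lag in range(max_lag + 1):
            feats[f"{col}_lag{lag}"] = df[col].shift(lag)
    X = pd.DataFrame(feats, index=df.index)

    # Deterministic functions of calendar month, so they leak no future data.
    month = df.index.month.to_numpy()
    X["month_sin"] = np.sin(2 * np.pi * month / 12)
    X["month_cos"] = np.cos(2 * np.pi * month / 12)

    # Drop the first rows, whose lagged values are undefined.
    data = X.join(df["SDI"]).dropna()
    return data.drop(columns="SDI"), data["SDI"]


# Synthetic standardised indices for illustration only.
rng = np.random.default_rng(0)
idx = pd.date_range("1961-01-01", "2020-12-01", freq="MS")
df = pd.DataFrame(rng.normal(size=(len(idx), 4)),
                  index=idx, columns=["SPI", "STI", "SPEI", "SDI"])

X, y = build_features(df, max_lag=2)

# Chronological split: no shuffling, the test block is strictly later in time.
n_train = int(0.8 * len(X))
X_train, X_test = X.iloc[:n_train], X.iloc[n_train:]
y_train, y_test = y.iloc[:n_train], y.iloc[n_train:]

print(X.shape, len(X_train), len(X_test))
```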
To avoid imposing a priori assumptions on the functional form of the SDI response, we evaluated a broad ensemble of machine learning models implemented within scikit-learn pipelines with standardisation. We considered: (i) ordinary and regularised linear models (LinearRegression, Ridge, and Lasso) to provide an interpretable baseline, accommodate multicollinearity, and perform embedded coefficient shrinkage; (ii) a kernel method (support vector regression, SVR) to capture smooth nonlinear responses in relatively small hydrological samples; (iii) tree-based ensemble methods (RandomForestRegressor) to model complex, higher-order interactions between indices while maintaining robustness to outliers and redundant predictors; (iv) a gradient boosting model (XGBRegressor) to exploit stage-wise additive trees, which are well-suited to representing nonlinearities and extremes; and (v) a feedforward multilayer perceptron (MLPRegressor) for flexible nonlinear function approximation. For these five machine learning models (Random Forest, XGBoost, Elastic Net, SVR, and MLP), we tuned the main complexity and regularisation hyperparameters using scikit-learn’s RandomizedSearchCV with five-fold time-series cross-validation on the training period, minimising the cross-validated mean squared error. Time-series cross-validation was implemented using scikit-learn’s TimeSeriesSplit, which preserves the chronological order of the data and uses a forward-chaining scheme: for each of the five folds, the model is trained on an expanding window of the historical record and validated on the immediately subsequent block, ensuring that validation always occurs on data that are later in time than the training data and avoiding any information leakage across folds. The tuned hyperparameters and their candidate values are summarised in
Table 2.
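The forward-chaining cross-validation and randomised search described above can be sketched with scikit-learn as follows. The data are synthetic, and the candidate values are deliberately small illustrative choices rather than the grid summarised in Table 2; only Random Forest is shown, although the same pattern applies to the other pipelines.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, chronologically ordered data (illustration only).
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 11))
y = 0.5 * X[:, 1] + 0.4 * X[:, 0] + rng.normal(scale=0.5, size=300)

# Expanding-window splits: each validation block follows its training block.
tscv = TimeSeriesSplit(n_splits=5)
folds = list(tscv.split(X))
for fold, (train_idx, val_idx) in enumerate(folds, start=1):
    print(f"fold {fold}: train 0-{train_idx.max()}, "
          f"validate {val_idx.min()}-{val_idx.max()}")

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("rf", RandomForestRegressor(random_state=0)),
])

# Illustrative candidate values (assumed, kept small so the sketch runs fast).
param_distributions = {
    "rf__n_estimators": [20, 40],
    "rf__max_depth": [None, 4, 6],
    "rf__min_samples_leaf": [1, 2, 4],
    "rf__max_features": ["sqrt", 0.5, 1.0],  # 1.0 = all features
}

search = RandomizedSearchCV(
    pipe,
    param_distributions,
    n_iter=4,
    cv=tscv,
    scoring="neg_mean_squared_error",  # maximising this minimises MSE
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, -search.best_score_)
```

Because the scaler sits inside the pipeline, its mean and variance are re-estimated on each training window, so the validation block never influences the standardisation.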
In this study, we used a Random Forest (RF) regression framework to model the relationship between meteorological drought indices and the Streamflow Drought Index (SDI). The Random Forest algorithm [
33] is an ensemble learning method that combines predictions from many decision trees, each grown on a bootstrap sample of the training data with random feature subsampling at each split. This structure enables RF to flexibly represent nonlinear relationships and higher-order interactions between predictors, while reducing overfitting through averaging, and it has been shown to perform robustly in hydrological forecasting applications [
52]. The predictors supplied to the RF model were the contemporaneous and lagged values (lags 0–2 months) of the SPI, STI, and SPEI, together with harmonic seasonal terms derived from the calendar month (month_sin and month_cos), which collectively describe short-term meteorological conditions and their seasonal modulation.
To achieve a parsimonious yet accurate configuration, we optimised key RF hyperparameters using a random search strategy with time-series cross-validation. Specifically, we considered the number of trees in the ensemble (n_estimators ∈ {200, 300, 500}), the maximum depth of each tree (max_depth ∈ {None, 4, 6, 8}), the minimum number of samples required at a leaf node (min_samples_leaf ∈ {1, 2, 4}), and the number of predictors considered at each split (max_features ∈ {‘auto’, ‘sqrt’, 0.5}). Hyperparameter combinations were evaluated using a five-fold TimeSeriesSplit scheme on the training period, with performance scored by the negative mean squared error to approximate minimisation of RMSE. The configuration yielding the lowest cross-validated error was then refitted on the full training dataset (1963–2009) to obtain the final global RF model. For interpretability, we also extracted feature importance rankings from the fitted RF model to assess the relative contribution of each lagged drought index and the seasonal terms to SDI prediction (reported in
Section 3). To address systematic differences between simulated and observed SDI, we applied a linear bias correction (Eq. 6) to each model’s output. During calibration, the mean and standard deviation of the simulated SDI were adjusted to match observations, and these parameters were then used to post-process hindcasts and future projections. Since the predictors are standardised indices (SPI, STI, and SPEI), which already reduce much of the mean and variance bias in climate forcing, this step primarily corrects residual statistical bias in the index–SDI relationship. For future scenarios, this bias is assumed to remain roughly constant over time, a limitation we recognise given strong non-stationarity. The linear bias correction equation is as follows:
$$\mathrm{SDI}_{\mathrm{bc}} = \mu_{Q} + \frac{\sigma_{Q}}{\sigma_{\hat{Q}}}\left(\mathrm{SDI}_{\mathrm{sim}} - \mu_{\hat{Q}}\right) \qquad (6)$$
where $\mathrm{SDI}_{\mathrm{sim}}$ and $\mathrm{SDI}_{\mathrm{bc}}$ are the raw and bias-corrected simulated SDI, and $\mu_{Q}$, $\sigma_{Q}$ and $\mu_{\hat{Q}}$, $\sigma_{\hat{Q}}$ are the mean and standard deviation of the observed and simulated SDIs, respectively. Model skill was quantified during the independent test period (2009–2020) using MAE, RMSE, the coefficient of determination (R²), the Nash–Sutcliffe efficiency (NSE), and the Kling–Gupta efficiency (KGE), providing complementary perspectives on error magnitude, variance explained, and hydrological realism. The bias-corrected RF model was then driven with projected SPI, STI, and SPEI time series for 2021–2050 to generate SDI forecasts for scenario analysis.
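The linear scaling step and two of the skill scores can be sketched as follows. This is an illustration on synthetic data under stated assumptions: the function names are hypothetical, the scaling parameters are estimated on a training block and reused on a later block, and KGE is written in its original 2009 form because the variant used in the study is not stated.

```python
import numpy as np


def fit_linear_scaling(obs, sim):
    """Estimate mean/std of observed and simulated series (calibration)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return {"mu_obs": obs.mean(), "sd_obs": obs.std(),
            "mu_sim": sim.mean(), "sd_sim": sim.std()}


def apply_linear_scaling(sim, p):
    """Shift and rescale simulations with previously fitted parameters."""
    sim = np.asarray(sim, float)
    return p["mu_obs"] + (p["sd_obs"] / p["sd_sim"]) * (sim - p["mu_sim"])


def nse(obs, sim):
    """Nash-Sutcliffe efficiency."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def kge(obs, sim):
    """Kling-Gupta efficiency (2009 form, assumed here).

    Note: beta = mean(sim) / mean(obs) is unstable when mean(obs) is near 0,
    as can happen for standardised indices.
    """
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Synthetic example: simulations are damped and offset relative to observations.
rng = np.random.default_rng(1)
obs = rng.normal(loc=0.2, scale=1.0, size=600)
sim = 0.3 + 0.5 * obs + rng.normal(scale=0.4, size=600)

obs_train, obs_test = obs[:480], obs[480:]
sim_train, sim_test = sim[:480], sim[480:]

params = fit_linear_scaling(obs_train, sim_train)
sim_train_bc = apply_linear_scaling(sim_train, params)
sim_test_bc = apply_linear_scaling(sim_test, params)

print("NSE raw/corrected:", nse(obs_test, sim_test), nse(obs_test, sim_test_bc))
print("KGE raw/corrected:", kge(obs_test, sim_test), kge(obs_test, sim_test_bc))
```

On the calibration block the corrected series matches the observed mean and standard deviation by construction; on the later block the match is only approximate, which is the stationarity assumption noted above.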
For the climate change component, we used the Swedish regional climate model, SMHI-RCA4, implemented on a 12.5 × 12.5 km grid, as the primary source of climate projections. Although this resolution is not the highest available, we selected RCA4 because it provides dynamically consistent simulations for all three forcing scenarios (RCP2.6, RCP4.5, and RCP8.5), ensuring methodological coherence in this initial evaluation. Relying on just one RCM means that uncertainty from the climate model structure and internal variability in climate forcing are not accounted for. As a result, the SDI projections shown should be viewed as conditional on the SMHI-RCA4 realization, rather than as a comprehensive ensemble of potential outcomes. Nevertheless, SMHI-RCA4 has been extensively applied and evaluated within the EURO-CORDEX and CORDEX frameworks, demonstrating robust performance for temperature and precipitation over Europe and other regions [
20,
53,
54,
55]. From these simulations, we derived future SPI, STI, and SPEI series for each RCP and used them as inputs to the selected bias-corrected model to produce scenario-specific SDI projections at Sremska Mitrovica up to 2050.
3. Results
This section first characterises the historical behaviour of the meteorological and hydrological drought indices, and then examines their statistical relationships. We next compare the skill of alternative modelling approaches, focusing on the optimised Random Forest model. Finally, we present projections of the Streamflow Drought Index under future climate scenarios and discuss associated uncertainties.
Figure 2 shows the temporal evolution of the standardized precipitation (SPI), temperature (STI), precipitation–evapotranspiration (SPEI), and streamflow drought (SDI) indices for 1961–2019. All four series display pronounced month-to-month variability and recurrent negative excursions, indicating frequent meteorological and hydrological drought episodes. Periods with strongly negative SPI and SPEI values generally coincide with negative SDI, suggesting clear propagation of meteorological deficits into streamflow drought. Conversely, positive SDI peaks often follow sustained wet anomalies in SPI and SPEI. STI shows both warm and cold extremes, but its correspondence with the SDI is visually weaker than that of the SPI and SPEI, foreshadowing the correlation and feature importance results presented below.
Table 3 shows that all indices are approximately standardised, with means near zero and standard deviations close to one. Quartiles are slightly negative, indicating a modest tendency toward drier-than-normal conditions. The relatively large positive maxima, particularly for the SPEI and SDI, highlight episodes of pronounced wetness and high flows superimposed on this generally variable regime.
The correlation analysis between the SDI and the lagged hydro-climatic indices shows a clear dominance of the SPI (
Figure 3). The SDI has the strongest linear association with the SPI at a one-month lag (SPI_lag1, r ≈ 0.50), followed by the contemporaneous SPI (SPI_lag0, r ≈ 0.44) and the two-month lag (SPI_lag2, r ≈ 0.24). In contrast, correlations with the STI and SPEI at all lags remain weak (|r| ≲ 0.05), indicating that temperature-based and combined indices contribute little to the linear variability of the SDI in this basin. Overall, the results suggest that short-term precipitation anomalies with a lag of 0–1 month are the primary linear drivers of streamflow drought.
To examine how the Random Forest exploits the different predictors, we computed feature importance rankings for all lagged SPI, STI, and SPEI variables, as well as for the seasonal harmonics. The resulting feature importance rankings (
Figure 4) confirm that the short-lag SPI, particularly the SPI at 1-month and contemporaneous lag, dominates the RF predictions, which is consistent with the correlation analysis. In contrast, the STI and SPEI lags contribute only marginal additional importance, and the seasonal terms play a secondary role by modulating the strength of the SPI–SDI linkage across the year.
Table 4 shows that before bias correction, all models achieve comparable MAE and RMSE, with Random Forest (RF) slightly outperforming the others and explaining the largest share of variance (R² ≈ 0.47). The key distinction emerges after bias correction is applied. RF attains the lowest errors (MAE = 0.62, RMSE = 0.83) and the highest skill (R²/NSE ≈ 0.49). Its KGE increases from 0.25 to 0.65, indicating a marked improvement in overall agreement, driven by the bias and variability components. XGBoost ranks second, with slightly higher errors and lower KGE (0.48). In contrast, Elastic Net, SVR, and MLP exhibit notably worse performance after bias correction, with reduced R² values and strongly negative KGE values, indicating poor reproduction of the observed distribution. The decline in KGE for Elastic Net, SVR, and MLP after bias correction reflects the design of the KGE metric and the error patterns of these models. Linear scaling aligns the mean and standard deviation of the simulated and observed SDI during training, thereby improving the variability and bias components of KGE for Random Forest, where these errors dominate. Conversely, Elastic Net, SVR, and MLP tend to have smaller mean/variance biases but more significant structural or timing errors. Because a linear rescaling with a positive slope leaves the correlation component of KGE unchanged, a transformation fitted on the training data cannot compensate for such errors in the independent test period and can instead degrade the bias and variability components, resulting in a lower overall KGE. On this basis, RF is the most suitable model for SDI forecasting in this study, providing the best combination of accuracy, explained variance, and realistic hydrological behaviour.
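For reference, the behaviour of KGE under linear scaling can be read from its decomposition. The form below is the original 2009 formulation, assumed here because the variant used is not stated; $s$ and $o$ denote the simulated and observed SDI.

```latex
\mathrm{KGE} = 1 - \sqrt{(r - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2},
\qquad
\alpha = \frac{\sigma_s}{\sigma_o},
\qquad
\beta = \frac{\mu_s}{\mu_o}
```

Here $r$ is the Pearson correlation between $s$ and $o$. Under this definition, a rescaling $s' = a + b\,s$ with $b > 0$ changes $\alpha$ and $\beta$ but not $r$. In addition, $\beta$ is a ratio of means, so when the observed mean is close to zero, as is typical for a standardised index, small offsets in the simulated mean can produce large values of $(\beta - 1)^2$ and hence strongly negative KGE.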
The results presented in Figure 5 confirm the quantitative ranking reported in Table 4. For all models, points cluster around the 1:1 line, indicating reasonable skill in reproducing SDI variability. However, the Random Forest (RF) plot shows the tightest cluster and the least systematic deviation from the line, especially in the range −1 ≤ SDI ≤ 2, which is consistent with its lowest MAE/RMSE and highest R²/NSE and KGE after bias correction. XGBoost performs slightly worse, with a broader scatter, while Elastic Net, SVR, and MLP exhibit greater dispersion and bias at higher magnitudes. Thus, the visual diagnostics corroborate RF as the best-performing SDI forecasting model.
Figure 6 compares observed and Random Forest-predicted SDI for the independent test period (2009–2020). The model reproduces the timing and sign of most wet and dry episodes well: positive and negative bars closely align each year, indicating that Random Forest correctly captures the onset and duration of drought and recovery phases. Amplitudes of moderate events are also reasonably matched, while some of the highest positive SDI peaks and deepest negative values are slightly underestimated, reflecting the remaining unexplained variance. Overall, the visual agreement is consistent with the quantitative metrics (NSE ≈ 0.49, KGE ≈ 0.65), confirming good predictive skill.
Based on the comparative performance evaluation of all candidate algorithms, the Random Forest (RF) model emerged as the most suitable approach for Streamflow Drought Index (SDI) forecasting and was therefore selected for all subsequent analyses. The final RF configuration, trained on the historical SPI, STI, and SPEI predictors, was combined with the previously estimated bias-correction parameters and then applied to the projected SPI, STI, and SPEI time series derived from climate scenarios RCP2.6, RCP4.5, and RCP8.5. This procedure yielded bias-adjusted monthly SDI projections for 2021–2050, ensuring internal consistency between the training and projection phases while reducing systematic errors inherited from the driving climate simulations. The resulting SDI time series provides a robust basis for characterizing the timing, persistence, and severity of future drought conditions under alternative emissions pathways, and for supporting climate-informed water resources planning and risk management.
The SDI projections for 2021–2050 show a clear intensification of hydrological drought with increasing radiative forcing from RCP2.6 to RCP8.5 (Figure 7). Under the low-emission RCP2.6 scenario, negative SDI values (SDI < 0) occur frequently, but most events remain within the mild to moderate drought range (−1 < SDI < 0), with fewer months crossing the SDI ≤ −1 threshold associated with moderate to severe hydrological drought. These deficits are typically interspersed with short recovery phases, indicating that under strong mitigation and a global warming limit of about 1.5–2 °C, the basin retains some resilience and the drought regime, while more variable than in a stationary climate, remains comparatively manageable. Under the intermediate stabilization pathway RCP4.5, drought characteristics change markedly: negative SDI values become more persistent, and excursions below −1 and −1.5 are more frequent and tend to cluster in multi-year sequences, suggesting longer recovery times and increasing cumulative flow deficits. The high-emission RCP8.5 pathway, representing unmitigated warming of around 5 °C by 2100, exhibits the most pronounced changes. Here, extended periods dominated by strongly negative SDI values appear, with repeated episodes below −1.5 indicating severe to extreme hydrological drought. Simultaneously, the frequency of high positive SDI peaks increases, suggesting a more volatile flow regime with amplified wet and dry extremes. These results demonstrate a strong link between rising greenhouse gas concentrations and drought hazard: as scenarios shift from RCP2.6 to RCP4.5 and RCP8.5, droughts in the basin become more frequent, severe, and persistent, with significant implications for water resources planning, storage operation, and drought risk management under future climate conditions.
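Statements about frequency, persistence, and severity of this kind can be quantified by extracting drought events from a monthly SDI series with a threshold. The sketch below is one possible convention, not the procedure used in the study: an event is an unbroken run of months with SDI at or below the threshold, duration is the run length, and severity is the cumulative deficit below the threshold. The function names and the example series are hypothetical, and the input is assumed to contain no missing values.

```python
import numpy as np


def _summarise(sdi, start, end, threshold):
    """Describe one run sdi[start:end] lying at or below the threshold."""
    run = sdi[start:end]
    return {
        "start": start,
        "duration": end - start,
        "severity": float(np.sum(threshold - run)),  # cumulative deficit
        "peak": float(run.min()),                    # most negative value
    }


def drought_events(sdi, threshold=-1.0):
    """Return one summary dict per consecutive run with sdi <= threshold."""
    sdi = np.asarray(sdi, dtype=float)
    below = sdi <= threshold
    events, start = [], None
    for i, flag in enumerate(below):
        if flag and start is None:
            start = i                                  # event begins
        elif not flag and start is not None:
            events.append(_summarise(sdi, start, i, threshold))
            start = None                               # event ends
    if start is not None:                              # run reaches series end
        events.append(_summarise(sdi, start, len(sdi), threshold))
    return events


# Short made-up series for illustration.
example = [0.2, -1.2, -1.6, -0.4, 0.5, -1.1, 0.3, -2.0, -1.5]
events = drought_events(example, threshold=-1.0)
for ev in events:
    print(ev)
```

Applying the same function to each scenario's projected series, with thresholds such as −1 and −1.5, would allow event counts, durations, and cumulative deficits to be compared on a common basis.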
Figure 8 presents the distribution of monthly SDI values forecast by the bias-corrected Random Forest model for 2021–2050 under the three RCP scenarios. Under RCP2.6, the density is centred slightly below zero, with a long right tail, indicating that mildly dry conditions dominate, but occasional wet anomalies are projected. RCP4.5 exhibits a broader spread and a more negative mode, suggesting a shift toward more frequent and intense dry states, while still allowing for intermittent wet periods. In contrast, RCP8.5 shows a marked shift of the distribution toward positive SDI values, with the highest densities between −1 and 1, which is consistent with generally drier future conditions under this high-emissions pathway. The tails in all cases indicate non-negligible probabilities of both severe drought and wet extremes. Overall, the figure highlights substantial scenario-dependent differences in the balance between dry and wet states, underscoring the importance of emissions trajectories for future hydrological drought risk.
Figure 9 presents hydrostripes of the Streamflow Drought Index (SDI) for 2021–2050 under RCP2.6, RCP4.5, and RCP8.5, with anomalies expressed relative to the WMO reference period, 1961–1990. The visualization reveals clear scenario-dependent differences in both the frequency and magnitude of streamflow anomalies. Under RCP2.6, the SDI values fluctuate around the historical mean, with alternating wet and dry years and no persistent dominance of negative anomalies, indicating a relatively stable low-flow regime despite occasional drought years. The RCP4.5 scenario shows a similar pattern but with a slightly higher frequency of positive SDI anomalies and fewer pronounced negative departures, suggesting comparatively wetter average conditions and reduced drought severity over the projection period. In contrast, RCP8.5 exhibits markedly different behaviour, with stronger amplitudes and a higher frequency of negative SDI anomalies, especially from the late 2020s onward. This pattern indicates increased occurrence and intensity of hydrological droughts, along with greater interannual variability. The predominance of negative SDI values under RCP8.5 aligns with stronger radiative forcing, which increases atmospheric evaporative demand and alters precipitation regimes, intensifying low-flow conditions. Overall, the hydrostripes indicate that among the scenarios assessed, RCP8.5 results in the driest future conditions, while RCP4.5 moderates drought risk relative to both RCP2.6 and RCP8.5. These findings highlight the sensitivity of future streamflow drought characteristics to emission pathways and emphasize the importance of incorporating scenario-based SDI projections into long-term water resources planning and climate adaptation strategies.
4. Discussion
The strong dependence of the Streamflow Drought Index (SDI) on short-lag SPI (0–1 months) reflects rapid meteorological-to-hydrological transmission in temperate, precipitation-driven catchments and the limited influence of longer-term thermal indices. Short lags correspond to rapid runoff response and limited catchment memory in many large European basins, where soil moisture and fast flow pathways dominate early streamflow deficits [56, 57, 58]. Several studies report that runoff often responds within weeks to precipitation anomalies and that only a small fraction of precipitation droughts develop into prolonged hydrological droughts, while temperature-driven evaporative demand influences drought persistence rather than immediate onset [59, 60, 61, 62]. Catchment attributes such as baseflow index and groundwater release determine memory effects and modulate short- versus long-lag relationships, explaining why the SPI at 0–1 months outperforms longer aggregation times and why indices that include potential evapotranspiration (SPEI, STI) play a secondary role in initial SDI formation in this region [19, 59, 63]. The RF feature importance analysis (Figure 4) reinforces these results: most predictive power comes from the SPI at lags 0–1 months, whereas the STI and SPEI contribute little additional skill. Including STI and SPEI in the predictor set therefore serves mainly to test for potential nonlinear or seasonal effects rather than to drive the forecasts. Given their limited importance, the SDI projections should be interpreted primarily as being driven by short-term precipitation anomalies (SPI), with temperature and combined indices playing at most a minor supplementary role.
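The short-lag dependence described above can be illustrated with a lagged Pearson correlation between a predictor index and the SDI. The sketch below is a simplified stand-in, not the Random Forest importance analysis of the study: the function names are hypothetical, and the synthetic series is constructed so that the SDI responds mainly to the SPI of the same and the preceding month.

```python
import random
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(predictor, target, lag):
    """Correlate target[t] with predictor[t - lag] for lag >= 0."""
    if lag == 0:
        return pearson(predictor, target)
    return pearson(predictor[:-lag], target[lag:])


# Synthetic 30-year monthly series (assumption: SDI depends on SPI at lags 0-1).
rng = random.Random(42)
n = 360
spi = [rng.gauss(0.0, 1.0) for _ in range(n)]
sdi = []
for t in range(n):
    previous = spi[t - 1] if t > 0 else 0.0
    sdi.append(0.6 * spi[t] + 0.3 * previous + rng.gauss(0.0, 0.5))

correlations = {lag: lagged_correlation(spi, sdi, lag) for lag in range(4)}
for lag, r in correlations.items():
    print(f"lag {lag} months: r = {r:.2f}")
```

In this constructed example the correlation falls off quickly beyond lag 1, which mirrors the pattern reported for the basin; with observed data, the same diagnostic would be applied to each candidate index before model fitting.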
Random Forest (RF) outperformed XGBoost, Elastic Net, SVR, and MLP for SDI forecasting, most likely because bagged tree ensembles capture complex nonlinearities and higher-order interactions without extensive feature engineering while remaining comparatively insensitive to hyperparameter choices on short records. RF’s bootstrap aggregation reduces variance, implicitly manages correlated predictors, and is robust to multicollinearity, which improves prediction when many lagged climate indices compete for explanatory power [33, 34]. Comparable applications across European and regional basins document RF’s consistent skill advantages for streamflow and drought metrics, often exceeding or matching gradient-boosted and neural methods, especially after careful bias correction and ensemble approaches [64, 65]. The achieved performance metrics align with contemporary machine learning drought studies, which report moderate explanatory power for monthly hydrological droughts in large basins, underscoring the value of tree ensembles for operational drought forecasting [30, 32, 62, 66].
The marked improvement in Kling–Gupta Efficiency after bias correction highlights the importance of removing systematic errors from climate drivers before inputting them into statistical models. Bias correction addresses distributional mismatches that would otherwise propagate through climate impact chains, reducing conditional biases that machine learning models can amplify and improving the representation of tails and seasonal cycles that influence drought onset and recovery [25, 67]. Recent impact-chain studies emphasise that bias correction or post-processing of GCM/RCM outputs materially improves hydrological low-flow representation and machine learning forecast reliability, making post-processing an essential step for credible drought projections and operational forecasting frameworks [25, 56, 57, 68].
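To make the sensitivity of the Kling–Gupta Efficiency (KGE) to systematic error concrete, the sketch below computes the NSE and the KGE in its commonly used 2009 form, KGE = 1 − sqrt((r − 1)² + (α − 1)² + (β − 1)²), with α the ratio of standard deviations and β the ratio of means. The exact KGE variant used in the study is not stated here, so this formulation is an assumption, as are the function names and the synthetic discharge-like values. Because β is a ratio of means, it becomes unstable when the observed mean is close to zero, as can happen for standardized indices.

```python
from math import sqrt
from statistics import mean, pstdev


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency."""
    mo = mean(obs)
    residual = sum((s - o) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mo) ** 2 for o in obs)
    return 1.0 - residual / variance


def kge(obs, sim):
    """Kling-Gupta Efficiency (2009 form, assumed here)."""
    mo, ms = mean(obs), mean(sim)
    so, ss = pstdev(obs), pstdev(sim)
    covariance = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / len(obs)
    r = covariance / (so * ss)   # linear correlation
    alpha = ss / so              # variability ratio
    beta = ms / mo               # bias ratio (unstable if mo is near zero)
    return 1.0 - sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


# Synthetic, strictly positive values (illustrative only).
obs = [12.0, 9.0, 15.0, 7.0, 11.0, 14.0]
sim_raw = [15.5, 13.2, 19.4, 10.6, 15.9, 17.1]   # carries a positive mean bias

# Remove the mean bias only; correlation and variability ratio are unchanged.
shift = mean(sim_raw) - mean(obs)
sim_corrected = [s - shift for s in sim_raw]

print(f"KGE raw: {kge(obs, sim_raw):.2f}, corrected: {kge(obs, sim_corrected):.2f}")
print(f"NSE raw: {nse(obs, sim_raw):.2f}, corrected: {nse(obs, sim_corrected):.2f}")
```

Because a pure mean shift changes only the β term, the gain in KGE in this example is attributable entirely to the removal of systematic bias.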
The scenario results showing intensified, more persistent, and more volatile SDI deficits from RCP2.6 to RCP8.5 are consistent with ensemble projections that amplify low-flow signals under higher warming. Higher emissions increase evaporative demand and shift precipitation regimes, promoting multi-year drought clustering, longer recovery times, and larger deficit volumes. This non-stationarity alters drought frequency, duration, and severity distributions, as documented across Europe and Central to Southeast Europe [23, 69, 70]. Several multi-model studies report a southwest–northeast contrast, with Mediterranean and southern basins experiencing amplified low flows and increased probability of multi-year events under RCP8.5, reinforcing scenario-dependent risk escalation and earlier onset of unprecedented drought conditions in vulnerable regions [23, 24, 27, 28]. The projected multi-year clustering is consistent with compound feedbacks between soil moisture depletion, reduced baseflow recharge, and persistent atmospheric blocking patterns, which become more frequent under higher radiative forcing [29, 41].
For transboundary basins such as the Sava, the results indicate a need to adapt reservoir operating rules, revise drought contingency plans, and strengthen international coordination to manage clustered multi-year deficits and tail risks [3]. However, key limitations persist: projections are driven by a single RCM (SMHI-RCA4), so climate forcing uncertainty and model spread are not explicitly quantified; the results therefore reflect one plausible realisation rather than a full ensemble envelope. In addition, projections depend on bias-correction choices, extreme SDI tails remain uncertain, and data-driven models cannot fully resolve process changes or unprecedented states [25, 56]. Moreover, our linear-scaling bias correction for SDI predictions presumes that the bias in the learned index–SDI relationship remains constant over time. If significant future changes occur in the hydrological regime, this assumption may not hold, meaning the bias-corrected SDI projections should be viewed as conditioned on the current calibration of the mapping, rather than as a correction that remains accurate under all future distributional changes.
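The constant-bias assumption can be illustrated with a minimal sketch of additive linear scaling, in which a single offset is estimated on a calibration period and then applied unchanged to later predictions. The additive form, the function names, and all numbers are assumptions for illustration; the study's exact linear-scaling variant may differ.

```python
from statistics import mean


def fit_linear_scaling(observed, simulated):
    """Additive offset estimated on the calibration period."""
    return mean(observed) - mean(simulated)


def apply_linear_scaling(simulated, offset):
    """Apply the calibration-period offset to any simulated series."""
    return [value + offset for value in simulated]


# Calibration period: observed SDI and raw model predictions (synthetic).
calib_obs = [-0.8, 0.2, -1.4, 0.6, -0.1]
calib_sim = [-0.3, 0.5, -0.7, 0.9, 0.3]
offset = fit_linear_scaling(calib_obs, calib_sim)
calib_corrected = apply_linear_scaling(calib_sim, offset)

# Hypothetical future period in which the true model bias has grown to +0.7.
future_truth = [-1.2, -0.5, -1.9]
future_sim = [value + 0.7 for value in future_truth]
future_corrected = apply_linear_scaling(future_sim, offset)

residual_bias = mean(future_corrected) - mean(future_truth)
print(f"offset = {offset:.2f}, residual future bias = {residual_bias:.2f}")
```

In the calibration period the corrected mean matches the observations exactly, whereas in the hypothetical future period part of the bias remains because the offset was fixed under past conditions. This is the sense in which the corrected projections are conditional on the current calibration.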
Furthermore, we did not include classical univariate time-series models like ARIMA or STL-based methods as primary benchmarks. Although these approaches are well-established and often competitive for short-term forecasts, they do not naturally integrate the exogenous, scenario-aligned drought indices (SPI, STI, and SPEI) that underpin our impact-chain projections. Instead, they tend to extrapolate SDI solely from its own past data. In our context, linear and regularized regression models with lagged indices, such as Elastic Net, already serve as simple, interpretable baselines that capture AR-like structures while remaining compatible with climate-scenario forcing. Nevertheless, incorporating explicit ARIMA-based benchmarks in future work could enhance the comparison and help quantify the added value of index-based machine learning models over purely autoregressive approaches.
Integrating process-based hydrological models, increasing ensemble sizes, characterising tail uncertainty through stochastic approaches, and coupling machine learning with physical constraints are priority research directions to reduce uncertainty and inform adaptive water allocation and ecosystem resilience strategies [18, 36, 46, 57, 71, 72].
5. Conclusions
This study developed and evaluated a data-driven framework for forecasting the Streamflow Drought Index (SDI) using meteorological drought indices. Analysis of the 1961–2020 record showed that all indices are approximately standardised, with frequent negative excursions indicating recurrent drought episodes. Correlation and feature-based diagnostics consistently identified short-term precipitation anomalies as the dominant driver of hydrological drought: the SPI at lags of 0–1 months exhibited the strongest relationships with the SDI, whereas temperature-based and combined indices (STI, SPEI) contributed little additional explanatory power.
A suite of machine learning models was compared, including Random Forest, XGBoost, Elastic Net, support vector regression, and a multilayer perceptron. After chronological train–test splitting and linear-scaling bias correction, Random Forest clearly outperformed the alternatives, achieving the lowest errors (MAE ≈ 0.62, RMSE ≈ 0.83) and the highest skill (NSE ≈ 0.49, KGE ≈ 0.65) on the independent test period. The Random Forest model reproduced the timing and sign of most wet and dry events and captured the magnitude of moderate droughts reasonably well, although the most extreme peaks were still underestimated. These results highlight the suitability of ensemble tree methods for hydrological drought prediction when only index-based predictors are available.
The optimised and bias-corrected RF model was subsequently driven with SPI, STI, and SPEI projections from three RCP scenarios to generate monthly SDI forecasts for 2021–2050. The projections show a clear scenario dependence: under RCP2.6, mild to moderate droughts remain frequent but short-lived; under RCP4.5, negative SDI values become more persistent and cluster in multi-year episodes; and under RCP8.5, severe and prolonged hydrological droughts dominate, with increased interannual variability. These findings highlight the sensitivity of future streamflow drought regimes to emission pathways and indicate heightened drought risk under high-warming scenarios.
Despite these advances, uncertainties persist due to the reliance on SPI, STI, and SPEI predictors, climate model and bias-correction choices, and limited information on unprecedented extremes. Future work should combine process-based hydrological models with machine learning, expand climate ensembles, and explicitly quantify tail uncertainty to better support adaptive reservoir operation and transboundary drought management. Taken together, these advances would further strengthen the use of SDI projections as a tool for sustainable water allocation, drought risk reduction, and implementation of climate-resilient management strategies in the basin.