1. Introduction
In recent years, the frequency and severity of extreme climate events such as droughts and floods have increased worldwide. This situation is causing serious harm to humans and is expected to continue in the future, along with climate change [1, 2]. Risks are rising in particular for the environment, the economy, and society, all of which depend on water resources.
Drought is defined as a prolonged condition of water scarcity caused by precipitation deficits and intensified atmospheric evaporative demand driven by elevated temperatures [3]. It is a natural phenomenon that adversely affects land, water resources, and production systems, leading to significant hydrological imbalances [4]. Drought monitoring is a critical process that provides early warning under adverse climate conditions such as reduced water resources, declines in agricultural production, ecosystem imbalance, and an increase in extreme weather events like floods, thereby helping to mitigate their impacts [5, 6, 7]. More than 20 drought indices have been developed in the literature to enable the monitoring and evaluation of different dimensions of drought (meteorological, hydrological, agricultural, and socioeconomic), some of which are commonly used at different stages of the hydrological cycle [1, 2]. Among these indices, the Standardized Precipitation Index (SPI), Standardized Precipitation Evapotranspiration Index (SPEI), Palmer Drought Severity Index (PDSI), and Standardized Runoff Index (SRI) are widely referenced and applied in regional and global studies [8, 9]. These four core indices provide complementary information on the identification of different types of droughts, forming an important scientific basis for regional water resource management, early warning systems, and drought risk planning.
Several widely used drought indices differ in terms of their input variables and the aspects of drought they represent. The SPI characterizes meteorological drought based solely on precipitation data, whereas the SPEI also incorporates evaporation and temperature effects, thus providing a more comprehensive reflection of the impacts of climate change [9, 10, 11]. The PDSI is effective in assessing the severity and duration of long-term droughts by relying on soil moisture balance [12]. The SRI, on the other hand, utilizes streamflow data to determine the severity and duration of hydrological drought [13]. Although SPI and SPEI offer valuable insights into short- and medium-term drought processes, the PDSI stands out as a more comprehensive tool because it integrates multiple climatic and hydrological variables. The PDSI relies not only on precipitation but also on temperature, potential evapotranspiration, and soil moisture, and in this respect offers a more holistic assessment than the other indices. Because it focuses on the soil water balance, the onset, duration, and severity of drought events can be determined in detail [14, 15]. It is also a strong indicator in long-term drought analyses and trend examinations. Additionally, because of its capacity to evaluate both meteorological and agricultural droughts [16], it is frequently preferred in studies on basin management, water resource planning, and modeling the impacts of climate change. Since drought depends on both temperature and precipitation, the PDSI is considered more suitable than other indices for assessing the potential impacts of climate change on future droughts [12, 15].
The Palmer Drought Severity Index (PDSI) is widely recognized for its strong physical basis and comprehensive representation of soil–water balance processes [8]. However, its operational implementation involves a multi-step water balance framework, several intermediate variables, and careful parameterization. Although improved versions such as the scPDSI have enhanced its climatological consistency [17, 18], the practical application of the Palmer framework in large datasets, multi-station analyses, and real-time or predictive contexts may still require substantial preprocessing, computational organization, and methodological expertise. These aspects can pose challenges when the PDSI is used as part of data-intensive modeling workflows. Recognizing these challenges, this study proposes machine learning-based approaches to complement the traditional Palmer framework by providing a data-driven approximation of the PDSI. Rather than replacing the physically based formulation, the proposed framework is intended to support PDSI-oriented analyses by offering a flexible and interpretable modeling alternative that can be integrated into data-intensive drought studies. In this sense, the focus is placed on methodological robustness and predictive reliability rather than on explicit reductions in computational burden. The proposed approach aims to facilitate the analysis of large, multi-station hydroclimatic datasets and to improve the practical usability of PDSI-based drought indicators in predictive applications without altering the underlying physical assumptions of the Palmer methodology.
The increasing impacts of global climate change have made the development of new, practical, and reliable approaches for evaluating drought conditions an urgent necessity [19, 20]. In this context, data-driven methods such as artificial neural networks (ANN), support vector machines (SVM), linear regression (LR), and decision trees (DT) have been intensively investigated for monitoring, assessing, and predicting droughts [21, 22]. Data-driven models have become increasingly common and effective tools for drought prediction owing to their ability to handle the nonlinear processes encountered in the calculation of indices such as the Palmer Drought Severity Index (PDSI) [22, 23].
In this study, various machine learning algorithms were applied to explore alternative modeling approaches that can better capture and predict the nonlinear structure of the PDSI [24]. In addition, this study presents a modeling framework that employs data-driven approaches to efficiently approximate drought conditions represented by the Palmer drought family. Here, one-month lead-time prediction refers to forecasting the Palmer Z-Index for the subsequent month, denoted as Z(t + 1), which constitutes the primary prediction target of the proposed framework. Based on long-term monthly hydrometeorological records, the proposed approach aims not only to reproduce current drought conditions but also to provide short-term predictive capability. The framework enables the evaluation of evolving drought conditions based on observed meteorological variability, supporting timely drought monitoring and early warning. Thus, the developed models contribute not only to monitoring past and current situations but also to short-term forecasting of drought dynamics, presenting significant potential for strengthening decision-support mechanisms in water resources planning [25, 26, 27]. Focusing on Çanakkale (1940–2024), we integrated seasonal encodings, lagged variables, and rolling aggregates with machine learning models to capture both short-term anomalies and multi-scale persistence in drought dynamics. This approach is intended to complement traditional, computation-heavy PDSI workflows and to enhance the operational applicability of the PDSI, thereby supporting timely monitoring, early warning, and climate-risk-aware water resource planning. The analysis utilizes long-term data from the Çanakkale Central and Biga meteorological stations, with the latter located near the Bakacak Dam catchment, thereby improving the spatial representativeness and robustness of regional drought characterization. By combining multi-station meteorological data with dam–catchment interactions, the proposed methodology provides a more reliable assessment of drought conditions, particularly in agriculturally managed regions.
2. Materials and Methods
2.1. Case Study
The province of Çanakkale is located in the northwest of Türkiye, between 39°27′–40°45′ north latitudes and 25°40′–27°30′ east longitudes, with an area of approximately 9933 km². The region lies within the Marmara transitional climate zone and is characterized by rainy winters and dry summers under the combined influence of the Mediterranean and Black Sea climate systems [28]. The topography of Çanakkale is heterogeneous. Elevation begins with the Kaz Mountains in the north (highest point: 1767 m) and gradually decreases toward the low plains, deltas, and coastal flats along the Aegean and Marmara Sea shorelines. This topographic diversity leads to notable differences in the spatial distribution of precipitation, evaporation, and hydrological responses. Thus, drought analyses in this region must consider both temporal and spatial variability. From a hydrological perspective, the province of Çanakkale has a complex water system comprising wetlands, streams, and dam reservoirs. The main streams are Karamenderes, Sarıçay, Tuzla, Umurbey and Kocabaş. The Atikhisar, Bakacak, Bayramdere, and Çokal dams are strategically important for drinking water supply, irrigation, and flood control. In particular, the Atikhisar Dam, with a storage capacity of 54 hm³ and an irrigation area of 3069 ha, provides most of the drinking water for Çanakkale. The Bakacak Dam, located on the Biga Plain, has a total storage capacity of approximately 136 hm³ and is used for the irrigation of approximately 9000 ha of agricultural land [28, 29]. Çanakkale province hosts an extensive network of wetlands that are inhabited by more than 317 bird species. The Kavak Delta, Suvla (Tuz) Lakes, Gökçeada Lagoon, Biga Stream, Çardak Lagoon, Sarıçay Delta, Umurbey Lagoon, and Kumkale Marshes are among the region’s most important wetland systems (Figure 1). These areas help regulate water levels, support ecological balance, and serve as natural buffers against hydrological changes [28].
Çanakkale is a sensitive region for drought analyses owing to its topographic diversity, numerous dams and wetlands, irrigation-based agricultural activities, and variable climate conditions. Long-term analyses of drought conditions showed that severe droughts occurred, notably in 1997, 2009, and 2020, and that moderate-to-severe drought events during the 2017–2018 period had a marked negative impact on reservoir levels and irrigation efficiency [29, 30]. Evaluations based on the Standardized Precipitation Index (SPI) reveal that irregularities in the precipitation regime directly affect both agricultural production and water resource management.
In this study, long-term data from the Çanakkale Central Meteorological Station and Biga Meteorological Station were used to examine the region’s drought dynamics more representatively. The main rationale for including the Biga Meteorological Station in the analysis was its spatial proximity to the Bakacak Dam Basin and the critical role of this dam in agricultural irrigation activities in the region. This approach aims to reduce the limitations of regional generalizations made using data from a single station and strengthen the representativeness of hydro-meteorological variables in different sub-basins. The use of multiple meteorological stations in this study and the consideration of dam–basin relationships contribute to making the findings more representative and interpretable at the regional scale.
2.2. Data
The meteorological data used in this study were obtained from the General Directorate of State Meteorological Services (TSMS) in Türkiye. Long-term monthly data from two meteorological stations were utilized in the analyses: Çanakkale Central Meteorological Station (ID: 17112) and Biga Meteorological Station (ID: 18084) (Figure 1).
The dataset from the Çanakkale Central Meteorological Station covers the period from January 1940 to December 2024, representing 85 years of uninterrupted monthly data. The data from the Biga Meteorological Station cover the period from January 1984 to December 2024, comprising a 41-year monthly observation series.
The meteorological parameters used for both stations included monthly total precipitation (mm), monthly average temperature (°C), relative humidity (%), atmospheric pressure (hPa), and wind speed (m/s). Additionally, the available soil water capacity (AWC, mm), which is required for Palmer-based drought calculations, was incorporated into the model. The predominant soil type in the Çanakkale and Biga regions is clay–loam, which is characterized by its high moisture retention capacity. Accordingly, the AWC was fixed at 100 mm, a value commonly accepted in the literature for fine-textured soils [31].
In this study, we adopted the Palmer drought framework and calculated the Palmer Z-Index using monthly water balance components. The main reason for selecting the Z-Index is its rapid response to short-term monthly moisture anomalies and its direct relationship with hydro-meteorological variables. Within this scope, model inputs were composed of the relevant month’s hydro-meteorological variables and Z(t) values; machine learning models were structured to predict the Palmer Z-Index value of the following month, Z(t + 1).
All datasets were harmonized into monthly time steps and quality-controlled for missing data, outliers, and temporal consistency. The resulting dataset enables a joint assessment of hydro-climatic variability in the province of Çanakkale and the Biga sub-basin and forms the basis for subsequent Palmer-based drought computations and the machine learning modeling framework.
2.3. Computation of Palmer Drought Indices
In this study, the Palmer drought framework was used as a basis, and the Palmer Z-Index was calculated using monthly water balance components. The Palmer approach is based on a water balance accounting system that considers the deviation of actual precipitation from climatically appropriate precipitation [32, 33]. In this system, total precipitation (P) and potential evapotranspiration (PE) are the main climatic inputs, whereas the available soil water capacity (AWC) represents the effective moisture-holding capacity of the soil.
Because the predominant soil texture around Çanakkale and Biga is clay–loam, the AWC was fixed at 100 mm, based on a value widely used in the literature for fine-textured soils [31, 32]. In the Palmer method, potential evapotranspiration (PE) was calculated using the temperature-based empirical formulation proposed by Thornthwaite [34], as it is suitable for situations where radiation data are limited and long-term temperature data are available:
$$PE = 16\,C\left(\frac{10\,T}{I}\right)^{a}$$
Here, T represents the monthly average temperature (°C), I is the annual heat index, a is the empirical exponent derived from I, and C is the latitude-daylength correction factor. In the Palmer algorithm, the soil profile is conceptualized as a two-layer structure (surface and subsoil), and the water storage in each layer is updated based on monthly precipitation, recharge, surface runoff, and evapotranspiration processes [35]. Within this framework, for each month, the components of recharge (PR), surface runoff (RO), loss (L), and actual evapotranspiration (ET) were calculated; using these terms, climatically appropriate precipitation (P_CAFEC), which is needed to sustain normal soil moisture under local climate conditions, was estimated [36]. The difference between the observed precipitation and this value is defined as the monthly moisture anomaly (d):
$$d = P - P_{\mathrm{CAFEC}}$$
Monthly moisture anomalies were standardized using the climatic weighting factor (K), which considers regional precipitation variability, to obtain the Palmer Z-Index (Z = K · d). The Z-Index is an anomaly indicator that reflects short-term (monthly) moisture surplus and deficit conditions and was preferred in this study because its structure can be directly linked to hydro-meteorological variables.
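The Thornthwaite step described above can be sketched in Python as follows. This is a minimal illustration and not the implementation used in the study (which relied on R): the polynomial for the exponent a and the heat-index sum are the standard Thornthwaite expressions, while the day-length approximation, the mid-month day numbers, the function names, and the example temperatures are assumptions made only for this sketch. The adjustment that Thornthwaite applies to very warm months (above roughly 26.5 °C) is omitted.

```python
import numpy as np

# Assumed helper constants (not from the paper): mean month lengths and mid-month day of year.
DAYS_IN_MONTH = np.array([31, 28.25, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
MID_MONTH_DOY = np.array([15, 46, 75, 105, 135, 166, 196, 227, 258, 288, 319, 349])


def daylength_hours(lat_deg, doy):
    """Approximate day length (hours) from latitude and day of year."""
    lat = np.radians(lat_deg)
    decl = 0.409 * np.sin(2.0 * np.pi * doy / 365.0 - 1.39)  # solar declination (rad)
    ws = np.arccos(np.clip(-np.tan(lat) * np.tan(decl), -1.0, 1.0))  # sunset hour angle
    return 24.0 / np.pi * ws


def annual_heat_index(normal_monthly_temp_c):
    """Heat index I: sum of (T/5)^1.514 over the 12 long-term monthly means above 0 deg C."""
    t = np.clip(np.asarray(normal_monthly_temp_c, dtype=float), 0.0, None)
    return float(np.sum((t / 5.0) ** 1.514))


def thornthwaite_pe(monthly_temp_c, lat_deg, heat_index):
    """Monthly PE (mm) for 12 calendar months: PE = 16 * C * (10 T / I)^a."""
    t = np.clip(np.asarray(monthly_temp_c, dtype=float), 0.0, None)  # PE is zero at T <= 0
    a = (6.75e-7 * heat_index**3 - 7.71e-5 * heat_index**2
         + 1.792e-2 * heat_index + 0.49239)
    # C corrects a 12-hour, 30-day standard month to the actual day length and month length.
    c = (daylength_hours(lat_deg, MID_MONTH_DOY) / 12.0) * (DAYS_IN_MONTH / 30.0)
    return 16.0 * c * (10.0 * t / heat_index) ** a


# Illustrative (made-up) monthly mean temperatures for a site near 40 deg N.
temps = [6, 7, 9, 13, 18, 23, 26, 26, 22, 17, 12, 8]
I = annual_heat_index(temps)
pe = thornthwaite_pe(temps, lat_deg=40.1, heat_index=I)
```

Computing the heat index once from long-term monthly means and reusing it for every year is one common convention; whether the study followed it is not stated in the text.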
In this study, only the Palmer Z-Index was used in the machine learning models. The model inputs were composed of hydro-meteorological variables and Z(t) values for the relevant month, and the models were structured to predict the Palmer Z-Index value for the following month, Z(t + 1).
All Palmer-based calculations were performed in the R software environment (R version 4.0.4, 2021) using the scPDSI package (version 0.1.3) published on CRAN. The package is based on the algorithmic framework developed by [32, 33], which allows the calculation of Palmer components from monthly precipitation and potential evapotranspiration data. Data processing, temporal alignment, and quality control steps were performed using customized R scripts.
2.4. Feature Engineering
In this study, feature engineering was designed to represent the temporal continuity, seasonality, and hydroclimatic memory characteristics of monthly hydrometeorological variables, and the problem was addressed within the framework of a one-month-ahead forecast Z(t + 1). All features were generated using only the information available up to time t. The feature space was created in five groups: basic meteorological variables, lagged features, moving window statistics, seasonality encodings, and forward-shifted target definitions. The structures of the feature sets used in the different modeling pipelines are summarized in Table 1.
Monthly total precipitation, average temperature, relative humidity, atmospheric pressure, and wind speed were used as primary inputs. Lags of 1, 3, 6, and 12 months were generated in all pipelines; additionally, a 2-month lag was included within the multi-model framework to reinforce the short-term continuity (Table 1). To represent cumulative hydroclimatic effects, moving window statistics of 3, 6, and 12 months were derived; in the tree-based pipeline with hyperparameter tuning, rolling features were calculated by applying shift(1) to prevent information leakage (Table 1). Seasonality was modeled using sine and cosine transformations of the month information (month_sin, month_cos); in the multi-model pipeline, the calendar year was also used to represent long-term variability (Table 1). In all pipelines, the target variable was defined as Z(t) → Z(t + 1). Scaling was applied only for models sensitive to scale (linear regression, Elastic Net, SVR) using StandardScaler implemented in the scikit-learn library (version 1.6.1); for tree-based methods, no scaling was performed (Table 1).
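The feature groups described above can be illustrated with a short pandas sketch. It is an assumption-laden example rather than the study's pipeline: the column names (`precip`, `temp`, `Z`), the function name `build_features`, and the choice of the rolling mean as the only window statistic are hypothetical, and shift(1) is applied to every rolling feature here, whereas the text reports it only for the tuned tree-based pipeline.

```python
import numpy as np
import pandas as pd


def build_features(df, lags=(1, 3, 6, 12), windows=(3, 6, 12)):
    """Build lag, rolling, and seasonal features plus the Z(t + 1) target.

    df: monthly DataFrame with a DatetimeIndex and a 'Z' column (hypothetical names).
    """
    out = df.copy()
    for col in df.columns:
        for k in lags:
            out[f"{col}_lag{k}"] = df[col].shift(k)  # value observed k months earlier
        for w in windows:
            # shift(1) first, so the window ends at t - 1 and never includes month t
            out[f"{col}_roll{w}_mean"] = df[col].shift(1).rolling(w).mean()
    month = out.index.month.to_numpy()
    out["month_sin"] = np.sin(2 * np.pi * month / 12)
    out["month_cos"] = np.cos(2 * np.pi * month / 12)
    out["target"] = df["Z"].shift(-1)  # next month's Z-Index, Z(t + 1)
    return out.dropna()


# Synthetic demonstration data (not observations from the study).
rng = np.random.default_rng(42)
idx = pd.date_range("2000-01-01", periods=36, freq="MS")
demo = pd.DataFrame(
    {"precip": rng.gamma(2.0, 25.0, 36), "temp": rng.normal(15, 7, 36), "Z": rng.normal(0, 1.5, 36)},
    index=idx,
)
feats = build_features(demo)
```

The first 12 rows are lost to the longest lag and window, and the last row is lost because its target lies beyond the end of the series.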
2.5. Model Selection
In this study, eight regression models, covering linear, regularized linear, and nonlinear tree-based ensemble methods, were selected to represent the relationships between meteorological variables and the Palmer Z-Index using different modeling approaches. Model selection was based on methods commonly used in the literature for modeling drought and hydroclimatic time series, as well as preliminary analyses.
The selected models were linear regression, Elastic Net, Support Vector Regression (SVR), Random Forest, Gradient Boosting Regressor (GBR), Extreme Gradient Boosting (XGBoost), CatBoost, and LightGBM. This set of models represents a wide range of approaches, from simple linear assumptions to advanced ensemble methods, that can capture complex nonlinear relationships.
2.5.1. Linear Regression (Ordinary Least Squares)
Linear regression is a fundamental supervised learning method that models the relationship between a dependent variable and one or more independent variables in a linear framework. The model optimizes its parameters using the Ordinary Least Squares (OLS) method, which minimizes the squared differences between the actual and predicted values.
The main advantages of linear regression are its simplicity and direct interpretability of model coefficients. However, its performance relies on the assumption of linear relationships and may be limited in the presence of complex or nonlinear patterns. In this study, linear regression was used to assess linear relationships between meteorological variables and the Palmer Z-Index and to serve as a baseline for comparison with more complex models [37, 38, 39].
2.5.2. Support Vector Regression (SVR)
Support Vector Regression (SVR) is a powerful regression method that can capture both linear and nonlinear relationships. By defining a specific error tolerance (ε), the SVR penalizes deviations outside this margin, thereby limiting overfitting. This structure provides an advantage, especially for generating generalizable predictions in noisy datasets [40, 41, 42].
SVR can model nonlinear relationships using kernel functions and demonstrate effective performance in high-dimensional feature spaces. In this study, SVR was preferred to capture the nonlinear patterns between meteorological variables and the Palmer Z-Index.
2.5.3. Elastic Net Regression
Elastic Net is a linear regression method that combines the Ridge and Lasso regularization approaches. This approach increases model stability while reducing the risk of overfitting, especially in datasets containing many highly correlated variables [12].
The Elastic Net establishes a more streamlined model structure by suppressing unnecessary variables and balancing the negative effects of multicollinearity among variables. In this study, we used the Elastic Net to more stably assess the relative importance of meteorological variables and to improve the generalization capability of linear models.
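For reference, the combination of the two penalties can be written explicitly. The form below follows the parameterization used by scikit-learn's ElasticNet, which is assumed here because the study reports using that library; the symbols α (overall penalty strength) and ρ (the L1 mixing ratio, `l1_ratio`) do not appear in the original text.

```latex
\hat{w} = \arg\min_{w}\;
  \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
  \;+\; \alpha\,\rho\,\lVert w \rVert_1
  \;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
```

Setting ρ = 1 recovers the Lasso penalty and ρ = 0 the Ridge penalty; intermediate values retain the variable-suppressing effect of L1 while the L2 term stabilizes coefficients among correlated predictors.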
2.5.4. Random Forest
Random Forest is an ensemble learning method in which a large number of decision trees are trained on random subsamples and feature subsets and then combined. This approach offers lower variance and higher generalization performance than individual decision trees. Random Forest is widely used in hydroclimatic problems because of its ability to capture nonlinear relationships, relative robustness to outliers, and automatic modeling of interactions between features. In this study, the Random Forest (RF) method was evaluated as a reference tree-based method for modeling nonlinear structures [43, 44].
2.5.5. Gradient Boosting Regressor (GBR)
Gradient Boosting is an ensemble learning method in which weak learners (usually shallow decision trees) are trained sequentially, with each new model focusing on learning the residuals of the previous models. This approach ensures high accuracy by gradually reducing the prediction error [45, 46, 47]. In this study, hyperparameters such as the learning rate, tree depth, number of trees, and subsampling ratio for GBR were tuned using 5-fold time series cross-validation and the grid search method. The resulting configuration provided a balanced performance between bias and variance.
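The tuning procedure described above can be sketched with scikit-learn's GridSearchCV and TimeSeriesSplit. The grid values, the scoring choice, and the synthetic data below are illustrative assumptions; the search space actually used in the study is not given in the text.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic stand-in for the chronologically ordered feature matrix and Z(t + 1) target.
rng = np.random.default_rng(0)
X = rng.normal(size=(240, 6))
y = 0.8 * X[:, 0] - 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.3, size=240)

# Hypothetical grid over the four hyperparameters named in the text.
param_grid = {
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
    "n_estimators": [50, 100],
    "subsample": [0.8, 1.0],
}

search = GridSearchCV(
    estimator=GradientBoostingRegressor(random_state=0),
    param_grid=param_grid,
    cv=TimeSeriesSplit(n_splits=5),          # each fold validates on a later time block
    scoring="neg_root_mean_squared_error",   # assumed criterion; higher (closer to 0) is better
)
search.fit(X, y)
best_params = search.best_params_
```

Passing the splitter through `cv` keeps every validation block strictly after its training block, so the selected configuration is not influenced by future observations.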
2.5.6. Extreme Gradient Boosting (XGBoost)
XGBoost is an optimized and regularized version of the Gradient Boosting algorithm. Owing to the inclusion of both L1 and L2 regularization terms, it can effectively control model complexity and reduce the risk of overfitting [25, 26, 27]. XGBoost was evaluated in this study because of its computational efficiency, parallel processing capability, and capacity to model complex nonlinear relationships.
2.5.7. CatBoost Regressor
CatBoost is a gradient boosting algorithm that was specifically developed for the effective handling of categorical variables. However, it can also deliver strong performance on datasets composed of continuous variables owing to its symmetric tree structure and ordered learning approach [48, 49, 50]. In this study, CatBoost was comparatively evaluated among tree-based ensemble methods because of its relatively low sensitivity to hyperparameters and its robust structure against overfitting.
2.5.8. LightGBM Regressor
LightGBM is a gradient boosting method that offers high computational efficiency through a histogram-based splitting strategy and leaf-wise tree growth approach. It is characterized by fast training times and strong prediction performance on large datasets [51, 52, 53]. In this study, LightGBM was evaluated among tree-based methods because of its fast training and high potential for accuracy.
2.6. Model Training and Evaluation
In this study, a five-fold time series cross-validation approach was used for the training and validation of the models. The TimeSeriesSplit method in the scikit-learn library allows training with earlier period data and validation in subsequent periods while preserving the chronological order of data. This structure prevents information leakage, which is critical in time series problems [54, 55].
To evaluate the applicability of the TimeSeriesSplit approach, an Augmented Dickey–Fuller (ADF) stationarity test was conducted for the target variable, the Palmer Z-Index time series. The test results showed that the Z-Index series does not contain a strong deterministic trend and exhibits largely stationary behavior at the monthly scale. This finding supports the methodological suitability of the time-based split cross-validation approach, which predicts future periods using past values of the series. Accordingly, a time series split-based training and validation strategy was preferred in this study.
In each fold, approximately 80% of the dataset was used for training, and 20% was used for validation. While the training set expands over time with each fold, the validation set is constructed to cover the immediately following time segment. This approach aims to evaluate the models’ ability to generate future forecasts using historical data.
Out-of-Fold (OOF) predictions were obtained for each model. In this setup, each observation in the validation set was predicted using a training set that contained only the preceding time steps. By combining the OOF predictions generated across all folds, an unbiased and realistic estimate of generalization performance was obtained for the entire dataset [56, 57, 58]. All machine learning analyses, feature engineering procedures, model training, evaluation processes, and visualization tasks were conducted using the Python programming language on the Google Colaboratory platform (Google LLC, Mountain View, CA, USA; https://colab.research.google.com, accessed on 3 February 2026). The computational environment was based on Python version 3.12.12 and included the following libraries: NumPy (version 2.0.2), pandas (version 2.2.2), scikit-learn (version 1.6.1), XGBoost (version 3.1.3), LightGBM (version 4.6.0), and SHAP. These tools were used for data preprocessing, model development, interpretability analysis, and result visualization.
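The out-of-fold procedure described above can be sketched as follows. The helper name `oof_evaluate`, the Random Forest settings, and the synthetic data are assumptions for illustration. With scikit-learn's default expanding-window TimeSeriesSplit, the earliest block serves only as training data, so it receives no out-of-fold prediction in this sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import TimeSeriesSplit


def oof_evaluate(model, X, y, n_splits=5):
    """Return out-of-fold predictions and per-fold R2 / RMSE / MAE."""
    oof = np.full(len(y), np.nan)  # NaN marks rows that are never in a validation block
    fold_scores = []
    for train_idx, val_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        model.fit(X[train_idx], y[train_idx])   # train only on earlier months
        pred = model.predict(X[val_idx])        # predict the block that follows
        oof[val_idx] = pred
        fold_scores.append({
            "r2": r2_score(y[val_idx], pred),
            "rmse": float(np.sqrt(mean_squared_error(y[val_idx], pred))),
            "mae": float(mean_absolute_error(y[val_idx], pred)),
        })
    return oof, fold_scores


# Synthetic, chronologically ordered demonstration data.
rng = np.random.default_rng(1)
X = rng.normal(size=(240, 5))
y = X[:, 0] + 0.5 * np.sin(X[:, 1]) + rng.normal(scale=0.2, size=240)

oof, fold_scores = oof_evaluate(RandomForestRegressor(n_estimators=50, random_state=0), X, y)
rmse_mean = np.mean([s["rmse"] for s in fold_scores])
rmse_std = np.std([s["rmse"] for s in fold_scores])
```

Reporting the fold-wise mean together with the standard deviation gives a simple view of how stable a model's skill is across successive periods.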
Evaluation Criteria
Three commonly used and complementary performance metrics were employed to assess the predictive performance of the models: the Coefficient of Determination (R²), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE).
The Coefficient of Determination (R²) indicates how much of the total variance in the target variable is explained by the model, with higher values implying better explanatory power.
The Root Mean Square Error (RMSE) is a metric based on the square of the prediction errors and gives greater weight to larger errors, expressing the magnitude of the error in the same units as the target variable.
The Mean Absolute Error (MAE) represents the average of the absolute values of the prediction errors. Compared to RMSE, it is less sensitive to outliers and offers a metric that is easier to interpret.
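For completeness, the three metrics can be written in their standard form, where y_i is the observed Z-Index, ŷ_i the predicted value, ȳ the mean of the observations, and n the number of samples in the evaluated fold (this notation is introduced here and is not taken from the original text):

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2},
\qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert
```

Because RMSE squares the errors before averaging, RMSE ≥ MAE always holds, and a large gap between the two points to a few large misses rather than uniformly moderate errors.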
These metrics were calculated separately for each fold and then averaged to represent the overall performance of the model. In addition to the mean values of R², RMSE, and MAE, model comparisons were conducted by considering the standard deviation of these metrics across folds to evaluate performance stability over time. Additionally, to contextually assess the hydrological relevance of the Palmer Z-Index, monthly storage volume data from the Atikhisar and Bakacak dams were utilized. These data were not included as inputs for the machine learning models. From the dam data, a three-month Standardized Reservoir Index (SRI₃), based on monthly volume anomalies and representing short-term anomalies in reservoir storage, was derived. The relationship between the Z-Index and SRI₃ was evaluated not for quantitative modeling purposes but to examine the temporal alignment of drought events. This analysis does not aim to validate the main model results but rather serves as a supportive assessment aimed at interpreting the hydrological consistency of the Z-Index in context [59, 60].
2.7. Model Interpretation and Advanced Validation Framework
In this study, we recognized that an approach based solely on classical regression performance metrics is insufficient for evaluating a phenomenon such as drought, which is threshold-based and related to physical processes. In this context, an integrated assessment framework was adopted that collectively addresses model interpretability, event detection skill within different drought severity classes, and consistency with hydroclimatic processes [61]. Within this framework, the contributions of input variables to the model were investigated using explainable machine learning methods, category-based validation metrics for drought events were calculated, and the temporal consistency between Palmer Z-Index predictions and hydrological storage anomalies was evaluated. In this way, the model performance was comprehensively analyzed not only in terms of statistical accuracy but also considering physical meaningfulness and decision support.
2.7.1. Model Interpretability Using Shapley Additive Explanations (SHAP)
In this study, the Shapley Additive Explanations (SHAP) approach, based on game theory, was used to render the decision mechanisms of machine learning models interpretable. SHAP quantitatively expresses the marginal contribution of each input variable to the model output using Shapley values calculated for all possible feature combinations [62, 63]. This method, particularly for tree-based ensemble models with nonlinear decision structures, enables a consistent evaluation of the relative importance of the variables.
SHAP values were evaluated in the original output space of the models, preserving the physical interpretability of the direction and magnitude of the variable contributions.
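The quantity described above can be stated with the classical Shapley value formula. In the notation assumed here (not taken from the original text), F is the full feature set, S is a subset that excludes feature i, and v(S) is the expected model output when only the features in S are known:

```latex
\phi_i(x) = \sum_{S \subseteq F \setminus \{i\}}
  \frac{\lvert S \rvert!\,\bigl(\lvert F \rvert - \lvert S \rvert - 1\bigr)!}{\lvert F \rvert!}
  \,\bigl[\, v(S \cup \{i\}) - v(S) \,\bigr],
\qquad
f(x) = \phi_0 + \sum_{i \in F} \phi_i(x)
```

The second identity is the additivity property: the contributions sum to the difference between the prediction f(x) and the baseline φ₀ (the mean model output), which is why SHAP values expressed in the model's output space carry the same units as the Z-Index.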
2.7.2. Assessing the Hydroclimatic Consistency of Z-Index with Reservoir Storage Anomalies
To assess the hydroclimatic response lag between the Palmer Z-Index and reservoir storage anomalies, an event-based lead–lag analysis was performed between the Z-Index and the Standardized Reservoir Index derived at different time scales (SRI₁, SRI₃, and SRI₆) and representing short- and medium-term anomalies in reservoir storage volumes. The Z-Index was used as the meteorological drought signal, and SRI series derived from the monthly storage volume data of the Atikhisar and Bakacak reservoirs for the period 2004–2024 were used as the hydrological response indicator (Figure 2).
The Z-Index and SRI series were combined on a common time axis, and the onset of a drought event was defined as the first time step at which a series reached its threshold (Z ≤ −1 for the Z-Index and SRI ≤ −1 for the SRI). Consecutive periods of continuous negative values were treated as a single event, and comparisons were made based on the event onset [64, 65].
For each Z-Index event, it was investigated whether an SRI event occurred within a maximum 12-month search window in the subsequent months; the difference between the onset of a Z-Index event and the matched SRI event was defined as the hydrological response delay (lead time). Events occurring within the same month were included in the analysis to evaluate the probability of simultaneous response [66, 67].
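The onset detection and matching steps described above can be sketched in a few lines of Python. This is a simplified illustration: the function names are hypothetical, an event is taken to be a run of consecutive months at or below the threshold (a narrower rule than the "continuous negative values" criterion in the text), and the series are short made-up examples.

```python
import numpy as np


def event_onsets(series, threshold=-1.0):
    """Return the indices at which a run of values <= threshold begins."""
    below = np.asarray(series, dtype=float) <= threshold
    prev = np.concatenate(([False], below[:-1]))  # state of the previous month
    return np.flatnonzero(below & ~prev)


def match_events(z_onsets, sri_onsets, max_lag=12):
    """For each Z onset, find the first SRI onset 0..max_lag months later.

    Returns one entry per Z event: the delay in months, or None if unmatched.
    """
    delays = []
    for z0 in z_onsets:
        candidates = [int(s - z0) for s in sri_onsets if 0 <= s - z0 <= max_lag]
        delays.append(min(candidates) if candidates else None)
    return delays


# Made-up monthly series on a common time axis.
z = [0.2, -1.2, -1.5, 0.3, 0.1, -1.1, 0.4] + [0.0] * 14 + [-1.3]
sri = [0.0, 0.1, 0.0, -1.3, -1.4, 0.2, 0.0] + [0.0] * 15

z_on = event_onsets(z)
sri_on = event_onsets(sri)
delays = match_events(z_on, sri_on)
match_rate = sum(d is not None for d in delays) / len(delays)
```

In this example the first meteorological event is followed by a storage event two months later, while the other two have no storage onset inside the 12-month window; a delay of 0 would represent a simultaneous response.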
To determine whether the observed event matches were coincidental, a Monte Carlo-based permutation approach that preserves the distributional characteristics of the time series was applied to the data. In this framework, for each SRI time scale, descriptive statistics for the number of matching events, match rate, and delay duration were calculated; the analysis was handled as a model-independent and complementary hydroclimatic validation method to examine at which time scales and with what lags the meteorological drought signal of the Z-Index is reflected in the reservoir storage dynamics.
2.7.3. Evaluation of Drought Detection Skill Across Severity Categories
To assess the ability of the Z-Index predictions to distinguish between drought severity categories and to detect event onsets, an event-based, category-specific validation approach was applied. Drought events were defined using the thresholds Z ≤ −1 and Z ≤ −2, and event onsets were determined with an event-merging approach that bridged short interruptions (maximum of 1 month). Events were validated by comparing the onset times of observed and predicted drought events: for each observed event, we examined whether a predicted event occurred within a maximum search window of 12 months. If the prediction occurred before the observed event, the difference was defined as the lead time (early warning period); simultaneous events were also included in the analysis. Within this framework, the Probability of Detection (POD), False Alarm Ratio (FAR), and Critical Success Index (CSI) were calculated to evaluate the event detection skill of the models across drought severity classes. In addition, at the event level, the representation of peak timing and severity was compared between the predicted and observed events [68,69,70].
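For reference, the three scores follow the standard contingency-table definitions. The sketch below computes them from counts of hits, misses, and false alarms; the function name and the counts in the example are hypothetical.

```python
from typing import Tuple


def detection_scores(hits: int, misses: int, false_alarms: int) -> Tuple[float, float, float]:
    """Return (POD, FAR, CSI) from event counts.

    hits:         observed events that were also predicted
    misses:       observed events with no matching prediction
    false_alarms: predicted events with no matching observation
    """
    nan = float("nan")
    observed = hits + misses
    predicted = hits + false_alarms
    total = hits + misses + false_alarms
    pod = hits / observed if observed else nan
    far = false_alarms / predicted if predicted else nan
    csi = hits / total if total else nan
    return pod, far, csi


# Hypothetical counts (not study data).
pod, far, csi = detection_scores(hits=30, misses=10, false_alarms=5)
print(round(pod, 3), round(far, 3), round(csi, 3))  # 0.75 0.143 0.667
```

POD rewards sensitivity, FAR penalizes over-warning, and CSI combines both by ignoring only the correct negatives, which are hard to count meaningfully for rare events.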
3. Results and Discussion
In this study, eight machine learning models, including linear, regularized, and tree-based ensemble approaches, were comparatively evaluated using time series-preserving five-fold cross-validation with out-of-fold (OOF) predictions. R², RMSE, and MAE were used as evaluation criteria. The evaluation is not limited to the consistency between the observed current-month Z-Index values calculated with the Palmer method and the model-predicted Z(t) values for the same period; it also examines, within a holistic framework, the models’ generalization capacity and their forward-looking predictive performance for the following month’s Palmer Z-Index, Z(t + 1).
3.1. Results of the Augmented Dickey–Fuller Test
Augmented Dickey–Fuller (ADF) test results indicate that the Z-Index series are stationary at both the Çanakkale and Biga stations (Table 2). The null hypothesis of a unit root is strongly rejected for the original series (Çanakkale: ADF = −30.674, p < 0.001; Biga: ADF = −20.832, p < 0.001), with test statistics exceeding all critical thresholds in magnitude. Consistently, the first-differenced series (ΔZ) are also stationary at both stations (p < 0.001). These findings suggest that the statistical properties of the Z-Index remain stable over time, reducing the risk of spurious modeling. Therefore, employing a time series cross-validation strategy (e.g., a 5-fold time series split), in which models are trained on past observations and tested on future observations, is methodologically appropriate and consistent with real-world forecasting settings.
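A forward-chaining split of the kind referred to here can be sketched without any external library. The layout below uses an expanding training window with equal-sized test folds, similar in spirit to scikit-learn's `TimeSeriesSplit`; whether the study used exactly this layout is an assumption, and the function name is hypothetical.

```python
from typing import Iterator, List, Tuple


def expanding_window_splits(
    n_samples: int, n_splits: int = 5
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists in which every test fold lies strictly
    after its training data, so no future information leaks into training."""
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("not enough samples for the requested number of splits")
    first_test = n_samples - n_splits * test_size
    for start in range(first_test, n_samples, test_size):
        yield list(range(start)), list(range(start, start + test_size))


# 24 months of data, five folds: print training size and test range.
folds = list(expanding_window_splits(24, n_splits=5))
for train, test in folds:
    print(len(train), test[0], test[-1])
```

Collecting the predictions made on each test fold gives the out-of-fold series; with this layout the earliest block of observations is used only for training and therefore has no out-of-fold prediction.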
3.2. Comparative Evaluation of the Prediction Performance of Machine Learning Models for the Palmer Z-Index
The results indicate that increased model flexibility and the ability to capture nonlinear structures substantially enhance predictive accuracy and temporal generalization. Tree-based ensemble methods outperform linear and regularized models in representing complex, multi-scale hydroclimatic interactions underlying monthly drought anomalies.
Table 3 summarizes the OOF performance results for the current-month Z(t) and next-month Z(t + 1) predictions of the Palmer Z-Index at the Çanakkale Merkez (17112) and Biga (18084) meteorological stations. In general, significant performance differences were observed among the models at both stations; these differences depend on both the target time step and the hydroclimatic characteristics of the station (Figure 3).
For the Çanakkale Central station, Gradient Boosting produced the highest OOF R² value (0.841) in Z(t) predictions and achieved the lowest error metrics (RMSE = 0.731, MAE = 0.451). This finding indicates that the nonlinear structure of the Z-Index in Çanakkale Central can be more effectively represented by ensemble methods. A similar pattern was observed in Z(t + 1) predictions as well; Gradient Boosting provided the highest explanatory power with an OOF R² of 0.828.
However, the model behavior at the Biga station was somewhat different. For Z(t) predictions, Elastic Net achieved the highest OOF R² value (0.806), followed by Gradient Boosting, XGBoost, and CatBoost, with only slight differences in performance. This suggests that in Biga, the relationship between meteorological variables and the Z-Index may contain more regular and linear components. For Z(t + 1) predictions, the Elastic Net, Gradient Boosting, and linear regression models produced very similar results, and the advantage of ensemble methods was more limited than at Çanakkale Central. This difference may be explained by Biga’s shorter observation period (1984–2024) and the relatively low complexity of the local hydroclimatic dynamics.
The findings align closely with the literature, which emphasizes that nonlinear methods are more successful than traditional linear models in modeling drought indices. In Türkiye, ANN has been reported to provide higher accuracy (R ≈ 0.98) than linear regression, SVM, and decision trees in Z-Index prediction; likewise, the superior performance of wavelet–fuzzy hybrid models has been demonstrated in Northwestern Türkiye [18]. Similar results have also been observed internationally: XGBoost combined with signal decomposition achieved Nash–Sutcliffe efficiencies of nearly 0.98 in short-term scPDSI predictions in semi-arid regions, and ensemble tree-based methods exhibited more robust performance than deep learning under data-limited conditions [71,72].
In this study, the superiority of Gradient Boosting can be explained by its ability to capture high-order, nonlinear dependencies between meteorological inputs and drought response by iteratively reducing errors. The observed weak pairwise correlations support the notion that purely additive linear models are limited in their ability to represent the complex structure of the Z-Index. In contrast, ensemble tree models reduce bias by adaptively partitioning the feature space, which increases their generalization power.
Methodologically, time series cross-validation preserves temporal dependencies, which prevents information leakage and provides more reliable generalization metrics. Although some studies report higher R² values using a single train–test split [18], the stricter validation scheme followed in this study offers a more realistic performance assessment. A low variance across folds indicates that the model remained stable across different periods. Moreover, lagged and moving window-based features increased the predictive power by capturing persistence effects, which is consistent with the literature [71]. In terms of application, model outputs can be used to improve irrigation allocation plans and strengthen early warning systems, underscoring the decision-support potential of ensemble-based approaches in drought management [73].
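The lagged and moving-window features mentioned above can be illustrated with a short sketch. The feature names (`lag_k`, `roll_mean_w`), the chosen lags, and the window length are assumptions for illustration only and do not reproduce the study's feature set. Each row uses only values from before the target month, in keeping with the leakage concern raised in this paragraph.

```python
from typing import Dict, List, Sequence


def lag_and_rolling_features(
    series: Sequence[float], lags: Sequence[int] = (1, 2), window: int = 3
) -> List[Dict[str, float]]:
    """Build one feature row per month t from strictly earlier values."""
    rows = []
    start = max(max(lags), window)  # first month with a full history
    for t in range(start, len(series)):
        row = {f"lag_{k}": series[t - k] for k in lags}
        past = series[t - window:t]  # excludes month t itself
        row[f"roll_mean_{window}"] = sum(past) / window
        row["target"] = series[t]
        rows.append(row)
    return rows


# Hypothetical monthly precipitation totals in mm (not study data).
precip = [10.0, 20.0, 30.0, 40.0, 50.0]
rows = lag_and_rolling_features(precip)
for row in rows:
    print(row)
```

For the fourth month in this toy series, the row holds the two preceding values (30 and 20) and the mean of the three preceding months (20), with 40 as the target.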
3.3. Explaining the Decision Mechanisms of Models with SHAP
The SHAP findings obtained for the Çanakkale Central and Biga meteorological stations revealed that the model output at both stations was predominantly determined by precipitation variables.
In the current time step and one-step-ahead predictions alike, the instantaneous and lagged precipitation components produced the highest SHAP contributions, which clearly demonstrated that the target variable depends strongly on hydrological processes. The main difference between the stations was the relative importance of the seasonality and temperature components. The prominence of the month_sin and month_cos variables at the Biga station indicates that the model output is sensitive to a distinct annual cycle, whereas the seasonality effect was more limited at the Çanakkale Central station, where the model responded more to short-term meteorological conditions. Temperature variables played a secondary role at both stations; however, delayed effects were more prominent at Çanakkale Central, whereas instantaneous and short-term lagged effects were more prominent at Biga (
Figure 4). According to the SHAP analysis, the highest feature contributions were from meteorological inputs, especially variables related to precipitation and soil moisture. This finding aligns with previous studies that revealed that variables directly tied to the water budget play a dominant role in drought prediction. Indeed, Ref. [
74], using TerraClimate data for PDSI prediction, reported that soil moisture and precipitation variables were the most influential inputs in the model output. Similarly, Ref. [
75] showed that, in an XGBoost-based hydrological drought (streamflow classification) prediction across China, SPI—a precipitation-based indicator—had the highest SHAP scores, and that this effect was further enhanced by soil moisture and potential evapotranspiration variables depending on seasonal conditions. Additionally, in their work addressing groundwater drought with explainable artificial intelligence (XAI) and SHAP analysis, Ref. [
76] demonstrated that the duration of precipitation-deficit-driven meteorological drought and the intensity of temperature-induced meteorological drought play critical roles. The SHAP distributions obtained in this study also clearly show that precipitation deficiency is a decisive factor in the decline of the Palmer Z-Index.
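The month_sin and month_cos variables discussed above are presumably the usual cyclical encoding of the calendar month. The sketch below shows that encoding under this assumption; the exact formula and phase used in the study are not given in the text, and the function name is hypothetical.

```python
import math
from typing import Tuple


def month_cyclical(month: int) -> Tuple[float, float]:
    """Map a calendar month (1..12) to a point on the unit circle so that
    December and January end up as neighbours."""
    angle = 2.0 * math.pi * (month - 1) / 12.0
    return math.sin(angle), math.cos(angle)


def distance(a: int, b: int) -> float:
    """Euclidean distance between two months in the encoded space."""
    (s1, c1), (s2, c2) = month_cyclical(a), month_cyclical(b)
    return math.hypot(s1 - s2, c1 - c2)


print(month_cyclical(1))  # (0.0, 1.0)
print(round(distance(12, 1), 4), round(distance(1, 2), 4))
```

With a raw month number, December (12) and January (1) would appear eleven units apart; in the encoded space they are exactly as close as any other pair of adjacent months, which lets a model learn a smooth annual cycle.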
3.4. Analysis of the Relationship Between Palmer Z-Index Estimates and SRI
The temporal relationship between the Palmer Z-Index and SRI3, a mid-term hydrological drought indicator, was examined to evaluate the transfer of drought signals from the meteorological to the hydrological stage. The results presented in Table 4 show that the Palmer Z-Index consistently signals SRI3-based hydrological drought conditions months in advance in both basins. In the Merkez–Atikhisar Basin, the average and median lead times for the Z–SRI3 relationship were 5.19 and 5 months, respectively. An advance capture rate of 81.3% indicates that the Palmer Z-Index can provide a significant early warning of the development of mid-term hydrological droughts. In the Biga–Bakacak Basin, the average lead time reached 6.39 months and the advance capture rate was 95.7%, indicating that meteorological drought signals are reflected more distinctly in the hydrological system, with a longer delay.
These results reveal that, owing to the structure of the Palmer Z-Index based on the meteorological water balance, it systematically provides an early signal for medium-term hydrological drought processes represented by SRI3. Therefore, it can be concluded that the Palmer Z-Index values predicted by machine learning can be used as an effective indicator in operational early warning systems for the early detection and monitoring of medium-term hydrological droughts. The time-lagged relationship between the Z-Index and SRI supports the notion that the meteorological drought signal is the primary triggering mechanism for initiating hydrological droughts.
The observed lag findings are consistent with the literature. It is widely known that hydrological drought emerges a few months after meteorological drought; Ref. [
7] emphasized this process within a cause-and-effect framework, showing this in the context of Türkiye. Ref. [
77] also found high correlations between meteorological (SPI/SPEI) and hydrological (SRI) indices in a similar basin and showed that indices with the same time scale, in particular, produced stronger relationships. In this study, the Z–SRI relationship was strongest a few months after the meteorological drought signal. The magnitude of the lag may nonetheless vary with regional climate conditions and soil–water relationships, as supported by [78], who drew attention to the process by which meteorological deficits manifest in streamflow in the Rio Godavari Basin. Studies that directly address the relationship between indicators representing short-term moisture anomalies, such as the Palmer Z-Index, and reservoir-based hydrological drought indices, such as the SRI, remain relatively limited. Nevertheless, Ref. [
79] found that the Palmer Z-Index strongly reflects short-term soil moisture anomalies and exhibits statistically significant, albeit delayed, relationships with reservoir-based drought indicators. This suggests that the Z-Index should be considered a precursor indicator representing the early stages of stress on the hydrological system rather than a direct descriptor of hydrological drought. The case of Brazil is noteworthy in this context; studies conducted for the Jucazinho Reservoir reported that indices such as SPI, SPEI, and SRI were insufficient to fully capture fluctuations in the reservoir water level [
80]. Taken together, these findings indicate that, although the lag structure of the Z-Index’s precursory relationship to hydrological drought may vary between basins, the index provides a strong early warning link between meteorological and hydrological systems and may play an important role, especially in short-term drought monitoring and forecasting studies.
3.5. Event-Based Drought Analysis: The Ability of Models to Capture Drought Periods
The performance of machine learning-based Palmer Z-Index predictions in capturing drought events was evaluated through a combined analysis of time series comparisons (Figure 5) and category-based validation metrics (Table 5). The time series results presented in Figure 5 indicate that both the current-month Z(t) and one-month-ahead Z(t + 1) predictions generally follow the temporal dynamics of the observed Palmer Z-Index with a high degree of consistency. In particular, during periods when the index dropped below the drought threshold (Z ≤ −1), the predicted series were able to capture the timing of threshold crossings largely synchronously with observations, which is essential for identifying the onset of drought events. These visual findings are further supported by the quantitative validation results summarized in Table 5, demonstrating that the proposed approach has operational relevance not only in terms of overall predictive accuracy but also when evaluated from a threshold-based, event-oriented perspective [81].
Under mild drought conditions (Z ≤ −1), the current-month forecasts Z(t) exhibited strong detection skill at both stations. At the Çanakkale Central station, a POD value of 0.777 and a CSI value of 0.685 were obtained, while FAR remained low at 0.148, indicating a favorable balance between sensitivity and reliability. Similarly, at the Biga station, POD and CSI values of 0.740 and 0.655, respectively, were achieved. The close agreement of performance metrics between the two stations suggests that the proposed framework yields consistent event-detection capability across different locations, supporting its potential spatial generalizability. These results highlight that, for drought monitoring purposes, models must not only exhibit high sensitivity (high POD) but must also maintain controlled false alarm rates (low FAR) to ensure operational usefulness.
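As a consistency check on the Çanakkale values reported above, the three scores can be related through the standard contingency-table definitions. Writing H for hits, M for misses, and F for false alarms (symbols introduced here for illustration), CSI follows from POD and FAR whenever the same hit count enters both scores, which is an assumption for an event-matching scheme:

```latex
\[
\mathrm{POD}=\frac{H}{H+M},\qquad
\mathrm{FAR}=\frac{F}{H+F},\qquad
\mathrm{CSI}=\frac{H}{H+M+F}
\]
\[
\frac{1}{\mathrm{CSI}}=\frac{1}{\mathrm{POD}}+\frac{1}{1-\mathrm{FAR}}-1
\]
\[
\frac{1}{0.777}+\frac{1}{1-0.148}-1\approx 1.2870+1.1737-1=1.4607
\quad\Rightarrow\quad \mathrm{CSI}\approx 0.685
\]
```

The result agrees with the reported CSI of 0.685, so the three Çanakkale Z(t) scores are mutually consistent under this assumption.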
For the one-month-ahead forecasts Z(t + 1), POD and CSI generally decrease moderately for mild droughts; at the Biga station, however, CSI rises to 0.669, indicating that short-term forecasts remain effective in tracking the persistence and evolution of ongoing drought conditions. This finding suggests that the model is capable of representing not only the initiation of drought events but also their short-term continuation, which is particularly relevant for monitoring applications.
In the severe drought category (Z ≤ −2), the performance metrics exhibit greater variability, primarily reflecting the relatively low frequency of such extreme events. Nevertheless, the Z(t + 1) forecasts consistently outperform Z(t) in terms of POD and CSI, underscoring the added value of short-term prediction for early warning. At the Çanakkale station, CSI increased from 0.493 for Z(t) to 0.536 for Z(t + 1), while at the Biga station, POD increased from 0.600 to 0.680, demonstrating that one-month-ahead forecasts can effectively capture the development of severe drought conditions. However, the elevated FAR values observed for severe droughts, particularly at the Biga station (FAR = 0.452 for Z(t + 1)), indicate an unavoidable trade-off between early warning capability and forecast reliability when predicting rare, high-impact events. This behavior is consistent with the drought literature, which emphasizes that detection skill for extreme events must be evaluated jointly with the cost of false alarms [82].
The combined evaluation of time series-based visual analysis (Figure 5) and category-based validation metrics (Table 5) demonstrates that the proposed modeling framework provides robust performance for mild drought monitoring and meaningful early warning potential for severe drought conditions. Forecasting the Palmer Z-Index for both the current and subsequent months using machine learning thus emerges as an effective tool for detecting both the onset and short-term evolution of drought events. The contribution of this study lies not only in modeling the Palmer Z-Index as a continuous time series but also in operationalizing it through an event-based verification framework based on POD–FAR–CSI metrics. While event-based approaches in drought research have traditionally been applied to frequency–duration–severity analyses using meteorological indices [83], machine learning studies have more often focused on forward classification of drought categories. For instance, DroughtCast has demonstrated skill in predicting USDM categories at lead times of 1–12 weeks [84], and Ref. [85] reported high F1 scores by framing threshold-defined drought events as a binary classification problem. In contrast, the present study integrates continuous index-based forecasting with event-based evaluation within a unified framework, enabling simultaneous real-time monitoring via Z(t) and short-term early warning via Z(t + 1) at two meteorological stations. The strong consistency between the time series behavior (Figure 5) and the event-based validation metrics (Table 5) therefore reinforces the applicability of the proposed approach for operational drought monitoring and early warning systems.
4. Conclusions
In this study, we present a machine learning-based integrated framework for short-term meteorological drought prediction using the Palmer Z-Index. The analysis is based on long-term monthly data from two meteorological stations in northwestern Türkiye (Çanakkale Central and Biga) that have different hydroclimatic characteristics. Current-month Z(t) and one-month-ahead Z(t + 1) drought predictions were evaluated using validation strategies that preserve the time series structure.
The findings indicate that, owing to the Palmer Z-Index’s high sensitivity to short-term moisture anomalies, drought can be predicted with high accuracy when appropriate feature engineering and time-aware validation approaches are employed. Tree-based ensemble models, particularly Gradient Boosting, XGBoost, and CatBoost, generally offered high explanatory power and low error values, with the clearest advantage over linear and regularized linear models at the Çanakkale Central station. This result suggests that short-term drought dynamics are shaped by nonlinear interactions and multi-scale hydroclimatic continuity processes, and that these structures cannot always be fully represented by linear models.
Among the models, the Gradient Boosting algorithm exhibited the highest and most consistent generalization performance for Z(t + 1) at the Çanakkale Central station (OOF R² ≈ 0.83), where hydroclimatic variability and nonlinear interactions are more pronounced. In contrast, at the Biga station, which has a shorter data period and a relatively more regular hydroclimatic structure, the Elastic Net and linear regression models performed competitively with ensemble methods. This shows that model performance depends not only on algorithmic complexity but also on the station-specific data structure and climatic dynamics. This finding highlights the methodological necessity of a comparative, multi-model approach rather than reliance on a single universal model for drought prediction.
In this study, an advanced validation framework that prioritizes physical consistency was applied, going beyond classical regression performance metrics. The SHAP analysis results indicated that at both stations, the main determinants of the model outputs were precipitation and its lagged components, whereas temperature and seasonality variables contributed at a secondary level, and variables such as relative humidity, wind speed, and atmospheric pressure played more limited yet complementary roles. These findings reveal that machine learning models, through the Palmer water balance approach, develop physically consistent decision-making mechanisms.
The hydrological relevance of the Palmer Z-Index values predicted by machine learning was further evaluated using reservoir storage anomalies from the Atikhisar and Bakacak dams. Event-based time lag analyses showed that the Z-Index, as a meteorological drought signal, systematically led medium-term hydrological drought conditions, represented by SRI3, by approximately 5–6 months in both basins. Advance capture rates exceeding 80% in the Merkez–Atikhisar Basin and 95% in the Biga–Bakacak Basin demonstrate the strong potential of machine learning-based Palmer Z-Index predictions for the early detection of hydrological droughts.
Category-based validation analyses support the proposed approach’s ability to distinguish between different drought severity levels and capture event onsets. For mild drought conditions (Z ≤ −1), high probabilities of detection (POD ≈ 0.74–0.78) were achieved for both current and forward predictions, whereas for rarer severe drought events (Z ≤ −2), forecasts one month ahead offered significant early warning capacity. The relatively high false alarm rates for severe events reflect the inevitable sensitivity–reliability trade-off in early warning systems.
In conclusion, this study demonstrates that machine learning-based nowcasting and short-term forecasting of the Palmer Z-Index can produce results that are not only statistically robust but also consistent with hydroclimatic processes and operationally meaningful. The proposed framework, with its computational efficiency, interpretability, and adaptability to different hydroclimatic conditions, provides a powerful decision-support tool for irrigation planning, reservoir management, and drought early warning systems. In future studies, evaluating multi-step forecasting horizons, testing spatial generalization with gridded datasets, and integrating large-scale climate oscillations into the model could further enhance the predictive capability and operational value of this method.