1. Introduction
Dam reservoirs are elevated, open-air storage areas created by constructing dams that retain and regulate water flow [1]. Dams and dam reservoirs play a critical role in the infrastructure of modern societies due to their multifaceted functions, such as water resource management, hydroelectric power generation, irrigation, flood control, and industrial and urban water supply [2,3]. These structures are strategic water management systems that serve the public good. Increasing populations and environmental pressures, such as climate change and drought, necessitate more efficient and predictable operation of these structures [4]. The global scarcity of freshwater requires developing solutions to balance water supply and demand, especially locally [5]. In this context, accurate and timely estimation of dam storage or reservoir water levels is vital for sustainable and effective management of water resources [6,7,8].
The main problem addressed in this study is the difficulty of reliably forecasting dam water levels under increasing climate variability and anthropogenic pressures. Ensuring the sustainable management of water resources requires models that can capture nonlinear relationships and provide accurate predictions using data commonly available at dam sites.
Urbanization, industrialization, land use changes resulting from agricultural production activities, and population growth are the main anthropogenic factors that significantly increase water scarcity [9,10]. These factors, together with the impacts of climate change, cause unpredictable fluctuations in surface and groundwater regimes and threaten the sustainable use of water resources [11,12]. Multi-purpose uses of dams, such as hydropower and water supply, and operational activities carried out outside of seasonal cycles, can significantly affect water quality and aquatic ecosystems through fluctuations in water levels [13]. Particularly in semi-arid regions, dam water level data are considered one of the fundamental inputs in flood forecasting studies [14,15]. Flood protection is one of the most important functions performed by dams [1]. Numerous studies have shown that the frequency of floods caused by heavy rainfall has increased in recent years [16,17]. Moreover, previous studies and scenario-based projections [18,19] suggest that the trend of increasing flood frequency and severity will continue, with climate change identified as its primary driver [20,21,22].
Despite the importance of forecasting reservoir levels, traditional statistical and rule-based methods struggle to capture nonlinear and multidimensional dynamics. For years, dam water levels have been estimated with limited tools such as operator-based rule curves, historical flow analyses, and linear mathematical approaches, which fail to adequately capture the complex and dynamic nature of the system [23]. The interdependencies among many components, including meteorological variables, streamflows, and evaporation rates, further constrain the performance of traditional models.
Machine learning (ML) and deep learning (DL) approaches have emerged as promising alternatives in recent decades. Recent studies have shown that ML methods, particularly ensemble models such as random forest (RF) and extreme gradient boosting (XGBoost), achieve significantly higher accuracy in forecasting reservoir water levels compared with traditional approaches and can effectively learn high-dimensional, nonlinear patterns [24,25]. Extensive reviews in the field of hydrology reveal that various ML techniques, ranging from artificial neural networks (ANN) and support vector machines (SVM) to deep learning architectures such as long short-term memory (LSTM) and convolutional neural networks (CNN), have been widely applied to water level prediction in dams, rivers, and lakes [26]. These methods consistently demonstrate higher accuracy and greater flexibility than traditional statistical models (e.g., autoregressive integrated moving average (ARIMA), regression) and are also superior in quantifying and communicating uncertainties.
However, existing studies still reveal several research gaps. First, there is a lack of comprehensive comparison between classical regression and advanced ML techniques using long-term daily datasets. Second, many previous works have not incorporated bias-oriented or hydrological efficiency measures (e.g., Nash–Sutcliffe Efficiency (NSE), Kling–Gupta Efficiency (KGE)) in evaluating performance. Third, limited research focuses specifically on dams in Türkiye, despite their critical role in regional water management.
This study focuses explicitly on Karaçomak Dam in the Western Black Sea Region of Türkiye, which provides irrigation, drinking water, and flood control for Kastamonu Province. The dam’s importance for regional socio-economic activities and vulnerability to climatic fluctuations justify its selection as the study area. While some regional studies on water resources and hydrological planning—such as the Karaçomak Dam Reservoir Protection Plan [27], artificial neural network–based water quality modeling for Karaçomak as a drinking water source in Kastamonu [28], and the Kastamonu Province Drought Action Plan [29]—have been conducted, comprehensive machine learning–based forecasting applications for reservoir water levels remain scarce in this region.
Data-driven modeling uses ML techniques to create models for a specific system from existing data [1]. In recent years, these models have been integrated as complements to, or in some cases replacements for, physics-based models (e.g., hydrodynamic models) [30]. ML techniques have demonstrated significant advantages in modeling complex and multivariable systems, yielding more accurate and reliable results [10,31].
Therefore, this study aims to evaluate and compare the predictive performance of linear regression and multiple ML algorithms (decision tree, random forest, and XGBoost) for reservoir water level forecasting at Karaçomak Dam. Specifically, it seeks to identify models that can achieve high accuracy with a limited but routinely available dataset (precipitation; maximum, minimum, and average temperature; reservoir level; and volume) while also assessing bias and hydrological efficiency metrics (mean bias error (MBE), NSE, KGE). The novelty of this work lies in its integration of cost-effective, data-efficient modeling with advanced ML algorithms, providing a practical and scalable decision support tool for sustainable dam management in Türkiye.
3. Results
Within the study’s scope, the dataset’s descriptive statistical parameters were initially analyzed. The number of observations, mean, minimum and maximum values, 25%, 50% (median), and 75% percentiles, and standard deviation were calculated for each variable. The results are summarized in Table 2.
When Table 2 is analyzed, the elevation variable ranges from 804.02 m to 889.99 m, with an average of 883.42 m. The volume variable ranges from 3.096 hm³ to 25.143 hm³, with a mean of 15.42 hm³. Regarding temperature variables, the maximum temperature varies between −9.2 °C and 41.6 °C, while the minimum temperature ranges from −20.2 °C to 22.9 °C. The average temperature ranges from −13.9 °C to 29.8 °C, with a mean of 10.43 °C. The total precipitation variable ranges from 0 to 82.6 mm.
Prior to model training, we conducted multicollinearity and assumption diagnostics. The correlation heatmap (Figure 2) shows strong pairwise correlations among temperature-related predictors (|r| ≳ 0.85–0.95). Consistently, VIF analysis (Table 3) indicates severe multicollinearity for daily_avg_temp (VIF ≈ 95.27), daily_max_temp (≈31.79), and daily_min_temp (≈27.82), while season (≈1.57), daily_precipitation (≈1.12), and elevation (≈1.12) remain acceptable. Guided by these findings, an AIC-based reduction retained only the most informative predictors for the OLS models.
In addition, residual diagnostics (Figure 3) revealed deviations from normality (Shapiro–Wilk p < 0.001), heteroskedasticity (Breusch–Pagan p < 0.001), and potential autocorrelation (Durbin–Watson = 1.47), which together motivated the comparative use of tree-based ensemble methods.
3.1. Model Implementation and Optimal Model Selection
Models designed to estimate dam reservoir storage levels were tested using a dataset of six variables recorded daily over 17 years (5964 days), totaling 35,784 observations.
Table 4 presents the MAE, MSE, RMSE, and R² values obtained for the four models before hyperparameter tuning and cross-validation.
As seen in Table 4, the R² values of all tree-based models (decision tree, random forest, and XGBoost) in the training dataset are 1.000, and the error values are close to zero, indicating that these models perfectly fit the training data. In the test dataset, the R² values remain in the range of 0.982–0.983, demonstrating that the models preserve their high predictive accuracy. However, the decision tree model exhibits relatively higher test errors.
Although the linear regression model achieved an R² of 0.983 on the training dataset, this value dropped to 0.574 on the test dataset. Consequently, the test RMSE value (2.898) was significantly higher than that of the other models. This reduction in generalization performance is consistent with the OLS assumption violations identified in Figure 3 and the multicollinearity evidenced by Figure 2 and Table 3.
When comparing tree-based models, random forest and XGBoost yielded the lowest error values on the test data (MAE ≈ 0.046–0.076, RMSE ≈ 0.584–0.585), and their generalization performances were quite similar. While the test MAE (0.046) of the decision tree model was comparable, its RMSE (0.590) and MSE (0.348) were slightly higher than those of the other tree-based models.
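The error metrics used throughout (MAE, MSE, RMSE, R²) follow their standard definitions; a minimal scikit-learn sketch on hypothetical level values (not the study’s data):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical observed and predicted reservoir levels (m)
y_true = np.array([883.1, 884.0, 885.2, 886.5, 887.0])
y_pred = np.array([883.0, 884.2, 885.0, 886.7, 887.1])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)            # RMSE is the square root of MSE
r2 = r2_score(y_true, y_pred)  # fraction of variance explained
print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}")
```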
Hyperparameter optimization and cross-validation were applied to improve model performance. The models were then retrained using the optimal parameters obtained, and their performance was re-evaluated. Table 5 presents the results after hyperparameter optimization and cross-validation.
An examination of Table 5 demonstrates that the tree-based models (decision tree, random forest, XGBoost) achieved R² values ranging from 0.999 to 1.000 on the training data, accompanied by minimal error levels, thereby confirming their strong fitting capacity. On the test data, these models maintained an R² value of 0.983, indicating high predictive accuracy and enhanced robustness and generalization ability as a result of the applied hyperparameter optimization. These gains are consistent with the expectation that tree-based ensembles are less sensitive to multicollinearity and non-Gaussian, heteroskedastic residual structures than OLS.
To strengthen model validation, we additionally computed the symmetric mean absolute percentage error (SMAPE), explained variance score (EVS), and median absolute error (MedAE) (Table 6). The extended evaluation corroborated the ensemble models’ superiority: random forest achieved the lowest test SMAPE (0.371%) and MedAE (0.003) while maintaining a high EVS (0.983), closely followed by decision tree (test SMAPE ≈ 0.409%; MedAE ≈ 0.008; EVS ≈ 0.983) and XGBoost (test SMAPE ≈ 0.601%; MedAE ≈ 0.031; EVS ≈ 0.983). In contrast, linear regression yielded substantially higher errors (test SMAPE ≈ 4.40%; MedAE ≈ 0.525) and a markedly lower explained variance, confirming its inferior generalization.
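SMAPE is not built into scikit-learn and has several variants; the sketch below uses one common definition, 100 · mean(2|ŷ − y| / (|y| + |ŷ|)), alongside the library’s MedAE and EVS (values hypothetical):

```python
import numpy as np
from sklearn.metrics import median_absolute_error, explained_variance_score

def smape(y_true, y_pred):
    """Symmetric MAPE in percent: 100 * mean(2|e| / (|y| + |yhat|))."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    denom = np.abs(y_true) + np.abs(y_pred)
    return 100.0 * np.mean(2.0 * np.abs(y_pred - y_true) / denom)

# Hypothetical observed and predicted values
y_true = np.array([100.0, 102.0, 98.0, 101.0])
y_pred = np.array([101.0, 101.5, 98.5, 100.0])
print(f"SMAPE = {smape(y_true, y_pred):.3f}%")
print(f"MedAE = {median_absolute_error(y_true, y_pred):.3f}")
print(f"EVS   = {explained_variance_score(y_true, y_pred):.3f}")
```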
In the decision tree model, the test MAE changed only marginally after optimization (from 0.049 to 0.046), while the RMSE (0.587) and MSE (0.345) values remained essentially stable. The random forest model performed consistently, maintaining the lowest MSE (0.342) and RMSE (0.585) values on the test set. The XGBoost model, in contrast, achieved one of the most favorable outcomes in terms of error metrics, further reducing the test RMSE to 0.580.
However, the linear regression model’s performance remained unchanged after optimization. The test R² value stayed at 0.574, while the RMSE (2.898) continued to be significantly higher than that of the tree-based models, confirming the inadequacy of the linear approach in capturing the complex data relationships. Together with the diagnostics in Figure 3, this result indicates that OLS predictions are adversely affected by assumption violations in this dataset.
The optimal parameter combinations obtained for each model following hyperparameter optimization are presented in Table 7.
Examining Table 7, the decision tree model achieved the best performance with a maximum depth of 20, a minimum of one sample per leaf, and a minimum of 10 samples required for splitting. The optimal parameters for the random forest model were a maximum depth of 10, a minimum of one sample per leaf, a minimum of two samples for splitting, and 100 trees. In the XGBoost model, the optimal configuration included a column subsampling ratio of 1.0, a learning rate of 0.1, a maximum depth of 3, 100 trees, and a subsampling ratio of 0.8. For the linear regression model, the only selected setting was fit_intercept = True.
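As an illustration of the tuning procedure, a scikit-learn GridSearchCV sketch over a reduced random forest grid; the grid values echo Table 7, but the data are synthetic, so the selected combination will generally differ from the study’s:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data standing in for the hydro-meteorological features
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.1, 200)

# Reduced grid echoing the tuned parameters in Table 7 (illustrative only)
param_grid = {
    "n_estimators": [100],
    "max_depth": [5, 10],
    "min_samples_leaf": [1, 2],
    "min_samples_split": [2, 10],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=5,                                    # 5-fold cross-validation
    scoring="neg_root_mean_squared_error",   # select by lowest CV RMSE
)
search.fit(X, y)
print("best params:", search.best_params_)
print("best CV RMSE:", -search.best_score_)
```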
The relative ranking method proposed by Poudel and Cao [52] was then applied to identify the best-performing model based on test results from the evaluation metrics. The outcomes of this ranking are presented in Table 8.
Based on this evaluation, the random forest model achieved the highest relative score, indicating its superior predictive performance compared to the other models. Consistent with these additional criteria (SMAPE, EVS, MedAE), random forest was identified as the optimal model and is, therefore, used as the reference model in subsequent analyses and interpretation.
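The exact scoring of the relative ranking method [52] is not reproduced here; a generic rank-sum aggregation across metrics conveys the idea (metric values hypothetical, patterned loosely after the reported results):

```python
import pandas as pd

# Hypothetical test-set metrics for four models (lower is better, except R2)
metrics = pd.DataFrame({
    "MAE":  [0.525, 0.046, 0.043, 0.052],
    "RMSE": [2.898, 0.587, 0.585, 0.580],
    "R2":   [0.574, 0.983, 0.983, 0.983],
}, index=["linear_regression", "decision_tree", "random_forest", "xgboost"])

# Rank each metric (1 = best); for R2, higher is better, so rank descending
ranks = metrics.copy()
for col in ["MAE", "RMSE"]:
    ranks[col] = metrics[col].rank(method="min")
ranks["R2"] = metrics["R2"].rank(method="min", ascending=False)

total = ranks.sum(axis=1).sort_values()  # lowest total rank ~ best overall
print(total)
```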
In addition to conventional error metrics, bias and hydrological efficiency measures were also evaluated [45,46,47,48,53]. As presented in Table 9, linear regression exhibited relatively poor performance, while the decision tree achieved high accuracy. Nevertheless, the ensemble models (random forest and XGBoost) demonstrated the most robust and reliable performance, with very low MBE values indicating the absence of systematic bias and NSE/KGE values close to 1 confirming their accuracy and generalization ability.
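These measures follow their standard hydrological definitions: MBE = mean(sim − obs), the Nash–Sutcliffe efficiency, and the 2009 form of the Kling–Gupta efficiency. A self-contained sketch (data hypothetical):

```python
import numpy as np

def mbe(obs, sim):
    """Mean bias error: positive values indicate systematic overestimation."""
    return float(np.mean(np.asarray(sim, float) - np.asarray(obs, float)))

def nse(obs, sim):
    """Nash–Sutcliffe efficiency: 1 is perfect; 0 matches the mean of obs."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(obs, sim):
    """Kling–Gupta efficiency (2009): correlation, variability, and bias terms."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.std() / obs.std()    # variability ratio
    beta = sim.mean() / obs.mean()   # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# Hypothetical observed and simulated reservoir levels (m)
obs = np.array([883.1, 884.0, 885.2, 886.5, 887.0])
sim = np.array([883.0, 884.2, 885.0, 886.7, 887.1])
print(f"MBE={mbe(obs, sim):+.3f}  NSE={nse(obs, sim):.3f}  KGE={kge(obs, sim):.3f}")
```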
3.2. Evaluating the Models’ Prediction Capabilities
Figure 4, Figure 5, Figure 6 and Figure 7 present the prediction performances of the linear regression, decision tree, random forest, and XGBoost models, allowing readers to compare the models’ differences visually. In these graphs, the black line represents the actual values, while the bubbles denote the model predictions; the size of each bubble indicates the magnitude of the prediction error, and the color scale reflects the error level (blue: low error, red: high error). To complement these visualizations, Table 10 provides the exact numerical values of the measurements taken on the 1st, 7th, 15th, and 21st of each month from 1 May 2024 to 21 July 2025, alongside the predicted values of the four models and their corresponding errors. This combined presentation ensures that both visual interpretation and detailed quantitative verification are possible. The results of the paired-samples t-test conducted between the predicted and actual values are provided in Table 11.
The t-test results indicate that there is no statistically significant difference (p > 0.05) between the predicted values of any model and the actual values (Table 11). This finding suggests that the model predictions do not significantly deviate from the actual values.
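A paired-samples t-test of predictions against observations can be run with scipy; a sketch on hypothetical paired series (not the study’s measurements):

```python
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical paired series: observed levels and near-unbiased predictions
rng = np.random.default_rng(1)
actual = rng.normal(885.0, 1.5, 30)
predicted = actual + rng.normal(0.0, 0.05, 30)  # small, unbiased errors

t_stat, p_value = ttest_rel(predicted, actual)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
# A p-value above 0.05 would indicate no significant mean difference
```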
According to the comparison results obtained from the graphs and Table 10, the difference values in the linear regression model are generally negative and have high absolute values (e.g., around −0.9). This pattern indicates a systematic tendency toward overestimation. Furthermore, the long-term error is considerably higher, demonstrating poorer performance than the other models. These outcomes are consistent with the model’s weak statistical performance metrics and coherent with the OLS residual diagnostics, which showed non-normal and heteroskedastic residuals.
The differences in the decision tree model results are generally within ±0.02–0.03, indicating a low error rate; the differences reach 0.5 or greater only on a few dates. Overall, the predictions are very close to the observed values, although this closeness partly reflects the model’s tendency to overfit the training data.
In the random forest model results, the differences are similar to those of the decision tree model, but generally smaller. Although larger deviations, such as −0.19, occasionally occur, the overall error rate remains low and consistent. This finding supports the model’s low error rates and high R² values observed in the test dataset.
The differences in the XGBoost model results are predominantly within the range of 0.00 to ±0.05. Although slightly larger deviations, such as 0.09 or −0.14, appear on specific dates, these cases are infrequent. The deviations are balanced in direction, and the mean difference remains minimal. These results demonstrate that the XGBoost model predictions are highly consistent with the actual values. This outcome aligns with the model’s low error rates and high R² scores on the test data, confirming its strong generalization capacity and high predictive accuracy.
4. Discussion
The study results show that tree-based ensemble models, particularly the XGBoost and random forest algorithms, provide the most accurate and balanced results in predicting reservoir water levels in Türkiye, whereas the linear regression model could not adequately capture the complex nonlinear relationships between hydrological and meteorological variables. R² values above 0.98 and low error metrics (MAE ≈ 0.046–0.071, RMSE ≈ 0.58) indicate that these models demonstrate strong robustness in dealing with multidimensional and nonlinear datasets. The results obtained from this study are consistent with previous studies, confirming that machine learning-based ensemble methods outperform traditional regression and time series approaches [54,55,56]. This superiority is consistent with our diagnostic findings: residual analyses indicated deviations from normality and heteroskedasticity, and the Durbin–Watson statistic suggested autocorrelation, while strong collinearity among temperature predictors was confirmed by the heatmap and VIF results, together undermining OLS generalization performance.
Consistent with the present study, Khai et al. [55] demonstrated that random forest and gradient boosting methods provide significantly higher accuracy than multiple linear regression in dam water level prediction. Asare et al. [57] highlighted the limitations of statistical methods such as principal component regression (PCR) and ARIMA in capturing sudden fluctuations, whereas machine learning methods can model nonlinear relationships more successfully. Comparative studies on hydrology in the literature [58,59,60] also indicate that ensemble methods excel in both accuracy and generalization capacity on long-term and multivariate datasets. For example, Özdoğan et al. [61] used hybrid models, including random forest and ridge regression, on the Loskop Dam in South Africa to accurately estimate dam volume (R² ≈ 0.99; RMSE = 4.88 MCM). Similarly, the authors of [1,3] obtained successful results in outlet discharge and water level predictions for dams in Spain using advanced time series models, such as artificial neural networks (ANN) and NARX. The authors of [6] evaluated monthly flow forecasts for the Hirakud Dam in India by comparing machine learning (relevance vector machine) and statistical downscaling. Similarly, Ref. [62] used artificial neural network (ANN) models to predict the water level of the Yalova Gökçe Dam, which performed better than traditional regression models. In another study on Keban Dam, water level changes were successfully modeled using ANFIS and support vector machines (SVM) [37]. In our analyses, we also applied an AIC-based variable reduction to remove redundant temperature predictors; nevertheless, key OLS assumptions remained violated, further justifying the use of ensemble models for operational forecasting.
The overfitting tendency in the decision tree model observed in our study represents a well-documented limitation of single-tree-based algorithms. While these models successfully represent complex structures, they often suffer from limited generalization ability on unseen data [41]. In contrast, random forest reduces variance by combining multiple trees, and XGBoost achieves a more effective balance between bias and variance through boosting and regularization [63]. The slightly better performance of XGBoost in our study aligns with the widely reported effectiveness of gradient boosting methods in water resource management [1,64]. Notably, both random forest and XGBoost preserved high accuracy on the test set despite multicollinearity and non-Gaussian, heteroskedastic residuals, highlighting their robustness to the violations that degraded OLS.
A significant contribution of this research is the application of advanced machine learning models using a long-term (17 years, 5964 daily observations) dataset on dam reservoirs in Türkiye. Recent reviews by Azad et al. [26] reveal that previous regional studies have relied primarily on regression-based or time-series approaches such as ARIMA [65], which are insufficient to capture the nonlinear dependencies between climatic variables and reservoir storage dynamics and provide only limited insight into the interactions among hydro-meteorological variables. This study addresses a methodological gap by systematically comparing four different models and demonstrates that modern ensemble approaches deserve prioritization in operational water management in Türkiye. It is among the first regional applications to report a full suite of regression diagnostics (Q–Q, Shapiro–Wilk, Breusch–Pagan, and Durbin–Watson tests) alongside ML benchmarking for daily reservoir levels in Türkiye.
This study also revealed a systematic overestimation tendency in the linear regression model, with errors surpassing 0.9 hm³ in some periods. This finding demonstrates that models relying solely on linear assumptions prove inadequate in representing complex hydrological processes, such as precipitation, evaporation, and temperature changes. Previous studies [36,56] have highlighted that linear models have been remarkably ineffective in capturing extreme events and seasonal variability, while ensemble learning methods demonstrate consistent performance across different conditions. This systematic bias is visible in the error-bubble visuals and is consistent with the positive MBE observed for linear regression.
Although random forest was determined to be the best model based on metrics, XGBoost showed the highest consistency between predictions and actual values. These results demonstrate that in such studies, relying solely on conventional evaluation metrics is insufficient and that rigorous validation of results is essential. Accordingly, we complemented conventional errors with hydrological skill (NSE, KGE), bias (MBE), and statistical significance tests to ensure that the conclusions are not an artifact of a single metric.
In particular, this study provided an additional validation layer, including bias and hydrological efficiency measures (MBE, NSE, and KGE). Very low MBE values confirmed the absence of systematic bias, while NSE values close to 1 [45] indicated strong predictive skill. Similarly, KGE values near unity [46,53] demonstrated high agreement between observed and simulated values, reflecting accuracy and reliability in representing hydrological dynamics. The combined use of these indices, which has been widely recommended in hydrological model evaluation [47,48], reinforces the superiority of ensemble methods in terms of both conventional error metrics and hydrological performance indicators. These outcomes, the multicollinearity evidence, and the residual diagnostics provide a coherent rationale for preferring ensemble models over OLS in this application.
Beyond methodological contributions, the findings of this study are directly linked to the United Nations Sustainable Development Goals (SDGs). Accurate reservoir water level forecasting supports SDG 6 (Clean Water and Sanitation), particularly Indicator 6.4.2 (Level of water stress) and Indicator 6.5.1 (Integrated water resource management implementation) by enhancing efficient and sustainable allocation of freshwater resources. Moreover, by strengthening the capacity to anticipate hydrological variability, the study contributes to SDG 13 (Climate Action), especially Target 13.1 (Resilience and adaptive capacity to climate-related hazards). Finally, the potential application of ML-based forecasts to optimize irrigation and reduce ecological pressures aligns with SDG 15 (Life on Land), underscoring the ecological co-benefits of sustainable reservoir operation.
Recent studies suggest that SDG indicators can be operationalized through quantitative metrics. For example, Marinelli et al. [66] assessed SDG 6.4.2 by calculating the ratio of freshwater withdrawals to renewable resources, reporting that this value varied across regions from below 0.1 (very low stress) to above 1 (extreme stress) and that over 90% of non-renewable groundwater withdrawals were concentrated in just seven countries. Similarly, Tinoco et al. [67] assessed SDG 6.5.1 using the UNEP-DHI IWRM Data Portal, reporting implementation scores on a scale of 0–100, with Mexico scoring 49 (medium-low), Brazil 51 (medium-high), and Chile 22.6 (low). This study is not only conceptually aligned with SDGs 6, 13, and 15 but also contributes to their advancement. Future research could follow these approaches to calculate dam-specific SDG 6.4.2 water stress ratios and incorporate SDG 6.5.1 implementation indices, thus providing stronger comparability of reservoir management in Türkiye with international standards.
Although water level measurements are relatively simple and cost-effective, mechanistic models that rely solely on reservoir morphometry (e.g., elevation–area–volume relationships) have inherent limitations. High-resolution bathymetric data are not always available, and such models cannot adequately represent nonlinear dependencies between hydro-meteorological variables such as precipitation, evaporation, and temperature. Machine learning approaches, on the other hand, can capture these complex relationships by learning directly from long-term observational datasets, thereby providing more accurate and reliable forecasts. For this reason, the present study adopts machine learning as a complementary alternative to traditional mechanistic modeling.
From a practical perspective, the high accuracy provided by XGBoost and random forest highlights the potential for integrating these models into dam reservoir management decision support systems in Türkiye. Reliable short- and medium-term water level forecasts can enable more effective water allocation among agricultural, urban, and ecological demands, and support adaptation to climate-related uncertainties. Future research should address residual autocorrelation by incorporating lagged predictors and/or hybridizing with GLS or ARIMAX-type models, consider penalized linear methods (ridge, lasso, elastic net) to retain interpretability under multicollinearity, and enrich features with seasonality and hydrological memory (e.g., rolling precipitation/level aggregates) and, where available, operational variables such as inflows/outflows and release schedules. Future studies could contribute to improving both resilience and interpretability through the integration of satellite-based precipitation and evaporation data or by hybridizing physical hydrological models with machine learning approaches [68].
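The lagged and rolling "hydrological-memory" features suggested above can be constructed with pandas shift/rolling operations; a minimal sketch on synthetic daily data (column names hypothetical):

```python
import numpy as np
import pandas as pd

# Synthetic daily series standing in for precipitation and reservoir level
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=60, freq="D"),
    "precipitation": rng.gamma(1.0, 2.0, 60),
    "level": 883 + np.cumsum(rng.normal(0, 0.05, 60)),
}).set_index("date")

# Lagged levels and a rolling precipitation sum as memory features
df["level_lag1"] = df["level"].shift(1)
df["level_lag7"] = df["level"].shift(7)
df["precip_7d_sum"] = df["precipitation"].rolling(7).sum()
df = df.dropna()  # drop the initial rows without a full lag/rolling history
print(df.head(3))
```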
This study used specific hyperparameter ranges (e.g., learning rate = 0.01–0.1, max_depth = 3–10) for model optimization. Experimenting with wider ranges and different parameter combinations could provide a more comprehensive evaluation of the models’ generalizability and performance; however, this lies beyond the scope of the current study and is recommended as an important direction for future research.
In conclusion, machine learning-enabled models successfully address the limitations of classical hydrological modeling and offer practical tools for operational support to decision-makers. This study confirms that ensemble machine learning methods constitute a substantial methodological advancement in hydrological prediction compared to classical regression and single-model approaches. In Türkiye’s water resource management context, these results provide theoretical validation and a practical roadmap for sustainable dam reservoir management under climate uncertainty. Overall, the convergence of assumption diagnostics, multicollinearity assessments, and superior ensemble performance forms a consistent evidence base supporting the adoption of random forest/XGBoost for daily reservoir level forecasting at Karaçomak Dam.