Article

An Innovative Approach for Forecasting Hydroelectricity Generation by Benchmarking Tree-Based Machine Learning Models

by
Bektaş Aykut Atalay
and
Kasım Zor
*
Department of Electrical and Electronic Engineering, Graduate School, Adana Alparslan Türkeş Science and Technology University, 01250 Adana, Türkiye
*
Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(19), 10514; https://doi.org/10.3390/app151910514
Submission received: 28 August 2025 / Revised: 15 September 2025 / Accepted: 24 September 2025 / Published: 28 September 2025

Abstract

Hydroelectricity, one of the oldest and most potent forms of renewable energy, not only provides low-cost electricity for the grid but also preserves nature through flood control and irrigation support. Forecasting hydroelectricity generation is vital for utilizing water resources effectively, optimizing energy production, and ensuring sustainability. This paper provides an innovative approach to hydroelectricity generation forecasting (HGF) for a 138 MW hydroelectric power plant (HPP) in the Eastern Mediterranean by taking into account the electricity production of the remaining upstream HPPs on the Ceyhan River within the same basin, unlike prior research focusing on individual HPPs. In light of tuning hyperparameters such as the number of trees and the learning rate, this paper presents a thorough benchmark of state-of-the-art tree-based machine learning models, namely categorical boosting (CatBoost), extreme gradient boosting (XGBoost), and light gradient boosting machines (LightGBM). The comprehensive data set includes historical hydroelectricity generation, meteorological conditions, market pricing, and calendar variables acquired at hourly resolution from the transparency platform of the Energy Exchange Istanbul (EXIST) and NASA's MERRA-2 reanalysis. Although all three models performed well, LightGBM emerged as the most accurate and efficient model, outperforming the others with the highest coefficient of determination (R2) (97.07%), the lowest root mean squared scaled error (RMSSE) (0.1217), and the shortest computational time (1.24 s). Consequently, the proposed methodology demonstrates significant potential for advancing HGF and is expected to contribute to the operation of existing HPPs and the improvement of power dispatch planning.

1. Introduction

Hydroelectric power remains a key element of renewable energy with its high efficiency and low costs [1]. In 2023, hydroelectricity’s contribution of 14.17% to total electricity generation and 47.17% to renewable electricity production underscored its significant role in the global energy mix and its dominant position within the renewable sector [2]. Its capability to balance supply and demand sets hydroelectricity apart from other renewable technologies, making it a critical stabilizer in increasingly decentralized and uncontrollable electric grids [3]. Additionally, hydroelectricity enhances grid stability, offers long-duration energy storage, and plays a crucial role in integrating intermittent renewable sources like wind and solar into energy systems aiming for net-zero emissions [4,5]. Moreover, the environmental benefits of hydroelectricity are well documented in the literature. Studies have shown that hydroelectricity substantially lessens carbon emissions compared to fossil fuels [6,7]. As the electricity demand grows and the integration of variable renewable sources accelerates, the need for more accurate hydroelectricity forecasts becomes crucial [8]. Forecasting is vital for effective power dispatch planning, sustainable water resource management, and grid reliability [9,10,11].
Non-linear dynamics, including hydrological and meteorological variations and operational aspects, affect water availability and hydroelectricity generation. This variability poses noteworthy challenges for classic forecasting approaches relying on linear statistical models. Moreover, the ecological impacts of HPPs on river ecosystems highlight the need for equilibrium between energy production and ecosystem preservation [12].
Precise and trustworthy hydroelectricity forecasting is important for many reasons, such as enabling efficient energy management, reservoir operations, and grid integration of this variable renewable resource. Meticulous forecasts support the effective allocation of resources such as water, the minimization of energy shortages, and improvements to the overall stability of the grid. Inaccurate forecasts may result in grid imbalances, economic losses, and low operational efficiency. Machine learning (ML)-based algorithms can capture complex patterns and relationships from large data sets, making them well suited for dealing with the challenges encountered in HGF. Tree-based ML models such as random forest (RF), gradient boosted decision trees (GBDT), CatBoost, LightGBM, and XGBoost have shown particular promise in handling data with non-linear characteristics and have demonstrated elevated performance in various forecasting tasks. Despite their potential, the application of tree-based methods in hydroelectricity forecasting remains relatively unexplored, and a rigorous comparative analysis is lacking [13].
The Ceyhan River is one of the essential waterways in the Eastern Mediterranean as shown in Figure 1. Aslantaş HPP and other HPPs in the same basin harness the energy of the Ceyhan River to produce electricity. With an installed capacity of 138 MW, Aslantaş HPP aims to generate 569 GWh of electricity annually. Behind the HPP, the vast reservoir, holding 1150 million cubic meters of water on average, plays a paramount role in irrigating a substantial land area of about 149,849 hectares. Beyond generating electricity, Aslantaş HPP also plays a crucial role in managing floods along the Ceyhan River [14]. Consequently, the management of hydroelectricity generation requires accurate forecasting to make efficient use of water resources and maintain grid stability, empowering the caretakers responsible for electricity generation and water management to make informed decisions for a more sustainable future. Table 1 presents the HPPs installed on the Ceyhan River in detail.
This study proposes a novel method that benchmarks tree-based ML techniques by using basin hydroelectricity generation data at the Aslantaş HPP to bridge a critical deficiency in HGF. By utilizing historical power generation data of all HPPs on the Ceyhan River in the same basin, meteorological measures, market prices, and categorized date–time records as input parameters, this study aims to develop robust and efficient forecasting models. Therefore, this study addresses an obvious research gap by proposing an innovative approach to forecast hydroelectricity generation. Specifically, this paper employs CatBoost, LightGBM, and XGBoost in forecasting hourly hydroelectricity generation at the Aslantaş HPP in the Eastern Mediterranean.
Furthermore, this study investigates the comparative accuracy and reliability of these models using a variety of input parameters, including historical electricity generation, temperature, humidity, wind speed, shortwave flux, and calendar data.
The performance of the models is rigorously evaluated by calculating the R2 and RMSSE metrics.
Below are the original contributions of this research:
  • First and foremost, Python, an open-source programming language, is used in this paper on a publicly available data set to present reproducible work for other researchers studying the same field and to bring reproducibility to the fore in scientific writing.
  • One of the main contributions of this study is to propose an innovative approach for forecasting the hydroelectricity generation of an HPP by taking into account the electricity production of the other upstream HPPs on the same river (or within the same basin), alongside a variety of explanatory features comprising meteorological, market, calendar, and historical hydroelectricity generation data. The proposed methodology distinguishes this paper from other studies in the literature that focus on a single HPP and offers a more comprehensive perspective on basin-wide hydrological and operational dynamics for future studies. Furthermore, the HGF literature remains immature in terms of covering studies with real-time data in the short-term horizon, and this paper is expected to bridge the highlighted gap and reinforce the current literature.
  • For the first time in the literature, this paper carries out a thorough benchmark of state-of-the-art tree-based machine learning models, namely XGBoost, LightGBM, and CatBoost, by taking the tuning of hyperparameters such as the number of trees and the learning rate into consideration. To the best of the authors’ knowledge, no previous research has conducted a direct head-to-head comparison of these algorithms in forecasting hydroelectricity generation under identical constraints with the same performance and error metrics.
The rest of this study is organized as follows: Section 2 reviews the relevant literature on HGF, with a specific focus on machine learning applications. Section 3 details the data and methodological approaches used in this study. Section 4 thoroughly presents the results and analyses obtained and discussed. Finally, Section 5 summarizes the study’s findings and outlines future research directions.

2. Related Work

This section reviews the related works on HGF. While various methods and approaches have been explored in this domain, a focused and systematic review targeting HGF for various plant types and operating conditions is currently lacking. The existing body of work is presented chronologically so as to offer valuable insights and identify the crucial gaps that motivate the present study.
Ref. [13] explicitly identified this research gap, emphasizing the limited application of HGF compared to other renewable energy sources. This gap underscores the need for targeted reviews focusing on HGF to bridge this divide and advance the state of the field.
Ref. [16] systematically reviewed ML models in energy systems but mentioned HGF only briefly concerning renewable energy systems. Ref. [17] examined ANN applications in energy and reliability prediction across solar, wind, and hydraulic energy sources but provided limited coverage of hydroelectricity. The literature contains comprehensive reviews highlighting advancements in renewable energy forecasting, particularly leveraging ML and DL techniques. These reviews have extensively explored forecasting models for wind, solar, and other renewable energy sources, focusing on integrating data-driven methodologies, hybrid models, and optimization algorithms to enhance forecasting accuracy and reliability [16,17,18,19].
These studies collectively emphasize the role of ML techniques in advancing renewable energy forecasting systems and optimizing energy grid operations. However, many studies have centered on wind and solar energy forecasting, leaving hydroelectricity comparatively under-explored; HGF remains a relatively underrepresented area in the literature.
To provide a coherent overview of the field, the reviewed HGF studies are grouped according to their methodological foundations.

2.1. Statistical Models

Regression-based approaches have been widely applied for HGF, ranging from multiple linear regression with stepwise selection to climate-informed regression using large-scale predictors, while more recent studies benchmarked Gaussian processes and support vector regression against traditional formulations, with kernel-based methods showing improved accuracy [20,21,22].
ARIMA and its extensions (ARIMAX and SARIMA) have been widely applied in HGF, linking generation with precipitation, capturing seasonal fluctuations, and supporting medium-term planning across diverse regions. Comparative analyses highlighted Holt–Winters as effective for seasonal variability, while evaluations in Brazil showed that even simple seasonal naïve baselines can provide competitive references. Applications in Vietnam, Malaysia, Ghana, and Rwanda further underscored the suitability of ARIMA/SARIMA approaches in data-limited settings [23,24,25,26,27,28,29].
Grey models, seasonal and data-grouping extensions, and more recent fractional-order formulations with buffer operators and metaheuristic optimization have been applied to monthly and quarterly hydroelectricity generation. These approaches are particularly useful when historical data are limited, offering robust alternatives to conventional statistical methods [30,31,32]. Short-term generation has been modeled with precipitation, demand, and past production using statistical bias correction. Flow–duration curves and reference flows have also supported feasibility assessments of small hydroelectric power plants [33].

2.2. Neural Networks-Based Models

Neural networks have become central to HGF by providing non-linear mapping capabilities [34,35,36,37], and applications in Türkiye showed their value for estimating generation potential in irrigation dams [38].
Applications include the artificial bee colony (ABC) algorithm for Türkiye’s national generation [39], particle swarm optimization in BP-ANNs for small HPPs [40], the bat algorithm for Malaysian reservoirs [41], and firefly optimization for small HPP forecasting [42,43]. Brazilian case studies demonstrated the potential of deep neural networks [44], while LSTM and ELM improved temporal modeling of small hydroelectricity generation [45,46]. Additional architectures such as GMDH were tested against MLPs in the Amazon basin [47], and ANN models were adapted to water–energy interactions in Malaysia [48].
Recent studies introduced specialized neural architectures, including Transformer-enhanced LSTMs, temporal convolutional hybrids, and extensions to interconnected systems such as HPPs integrated with water distribution systems (WDS), which lowered forecasting errors and improved robustness in large-scale applications [49,50,51,52].

2.3. Tree-Based Models

Tree-based ensemble methods have recently gained traction for HGF. RF has been shown to provide stable baselines for reservoir generation prediction [53], while GBDT variants achieved superior accuracy in Turkish case studies [54]. Optimized GBDT and CatBoost models further improved predictive efficiency under diverse inflow conditions [55].
CatBoost excelled in handling categorical inputs [56] compared to XGBoost and LightGBM, while broader benchmarks confirmed its generalization performance [57]. XGBoost was also adapted with meteorological features to improve forecasts [58].
GBDT outperformed ANN variants for SHP forecasting in Poland [59]. XGBoost was validated for hydroelectricity generation [60]. XGBoost and CatBoost coupled with metaheuristics (SMA, AO, and GWO) pushed accuracy further [61,62].

2.4. Hybrid and Other Models

Hybrid and alternative models have been widely developed by combining statistical, neural, and heuristic techniques for HGF. Early studies explored fuzzy systems and evolutionary algorithms, showing that evolving fuzzy inference models could achieve daily inflow predictions comparable to hydrological baselines [63,64]. Genetic algorithms were also applied to reservoir operation planning and long-term inflow modeling, demonstrating their value in optimization-driven forecasting [65,66].
Neuro-fuzzy approaches extended this line of research. ANFIS models optimized with grey wolf algorithms or cascaded structures consistently outperformed classical machine learning alternatives [67,68]. Beyond this, hybrid architectures have emerged that integrate deep learning. Zhou et al. proposed the DeepHydro framework [69], a latent recurrent neural network model, while other studies benchmarked LSTM against XGBoost and SVR [70] and developed ensembles combining LSTM with Conv1D for Cameroon’s Songloulou HPP [71]. LSTM–SVR hybrids were also evaluated in Türkiye [72].
Aksoy [73] evaluated multiple machine learning techniques including kNN, SVR, RF, GA, DNN, RNN, and autoencoders for hourly forecasting.
EEMD–GRU and wavelet–LSTM–RF formulations improved the treatment of nonstationary inputs [74,75], and the HYPE–ANN framework yielded robust forecasts for run-of-river schemes [76].
A Developed Crow Search Algorithm was incorporated into ANN training under climate change conditions in China [77], and ABC was combined with ELM for Turkish SHPs [78]. Hybrid formulations also gained traction, with ANN–GA and PSO–ANN variants improving forecast accuracy in Laos [79,80]. LSTM was applied to Malawian plants [81], LSTM was benchmarked against ANFIS for Turkish run-of-river schemes [82], and LWNRBF networks were adapted for next-day capacity prediction [83].
After reviewing all references related to HGF, it is evident that a significant number of studies have employed tree-based methods for HGF. These studies, as summarized in Table 2, showcase the diversity of approaches and evaluation metrics used in the field. The methods include widely used algorithms such as RF, GBDT, and XGBoost among others.

3. Material and Methods

3.1. Material

The material of this study is a data set that includes a variety of categories classified for energy, weather, market, and calendar variables. The data set covers the period from 1 July 2020 to 31 October 2024. The modeling framework in this study is based on the input variables summarized in Table 3, which include historical hydroelectricity generation, reservoir levels, inflow data, precipitation, temperature, and other relevant meteorological and hydrological indicators. These variables were consistently used as predictors across all modeling experiments and evaluation stages, ensuring a standardized input configuration for all algorithms.
Energy variables including hydroelectricity generation of Aslantaş HPP (indicated in Figure 2) and other HPPs on the Ceyhan River are illustrated in Figure 3. The energy variables also contain the lagging values of Aslantaş HPP’s 1-hour, 1-day, and 1-week lags as well.
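The lagged energy variables described above can be reproduced with a few lines of pandas; the sketch below is illustrative, and the column name `aslantas_mwh` is a placeholder rather than a field from the original data set.

```python
import pandas as pd

def add_generation_lags(df, col="aslantas_mwh"):
    """Append 1-hour, 1-day, and 1-week lags of an hourly series.

    The column name is hypothetical; with hourly data, shifts of 1, 24,
    and 168 rows correspond exactly to the three lags used in the paper.
    """
    out = df.copy()
    for label, hours in [("lag_1h", 1), ("lag_1d", 24), ("lag_1w", 168)]:
        out[label] = out[col].shift(hours)
    return out

# Toy hourly series to demonstrate the shifts
idx = pd.date_range("2020-07-01", periods=200, freq="h")
df = pd.DataFrame({"aslantas_mwh": range(200)}, index=idx)
lagged = add_generation_lags(df)
```

Rows before the first available lag naturally become missing values and would be dropped or imputed before training.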
Calendar data were extracted from date and time data. Calendar variables consist of the year, month of year, week of year, day of month, hour of day, day of week, and type of day (0 for weekdays and 1 for weekends).
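As an illustration, the calendar variables listed above can be derived from a datetime index as follows (a sketch; the feature names are our own, since the paper does not specify them):

```python
import pandas as pd

def calendar_features(index: pd.DatetimeIndex) -> pd.DataFrame:
    """Derive the calendar variables described in the text from a datetime index."""
    return pd.DataFrame({
        "year": index.year,
        "month_of_year": index.month,
        "week_of_year": index.isocalendar().week.to_numpy(),
        "day_of_month": index.day,
        "hour_of_day": index.hour,
        "day_of_week": index.dayofweek,                      # Monday = 0
        "type_of_day": (index.dayofweek >= 5).astype(int),   # 1 for weekends
    }, index=index)

idx = pd.date_range("2020-07-01", periods=48, freq="h")
cal = calendar_features(idx)
```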
The meteorological data contain air temperature, humidity, wind speed, and shortwave flux, obtained from NASA’s MERRA-2 reanalysis. Three MERRA-2 collections were used: one associated with atmospheric measurements at various altitudes, one pertaining to flux measurements such as energy or heat flux, and one focusing on radiation data, including radiative flux or cosmic radiation [85,86,87]. The choice of these specific meteorological variables was driven by their direct impact on water resources and hydrological processes. Temperature influences evaporation rates, which in turn affect water availability; higher temperatures can increase evaporation, reduce reservoir water levels, and impact the water supply. Humidity influences the rate of evaporation and transpiration from water bodies and vegetation; higher humidity can reduce evaporation rates and support water conservation. Wind speed can also influence the evaporation process, as higher wind speeds increase the evaporation rate and can lead to a decrease in water levels. Shortwave flux is related to solar radiation and affects temperature and evaporation rates, since increased solar radiation causes higher temperatures and faster evaporation.
Market data regarding Turkish electricity spot markets were derived from day-ahead, intraday, and balancing power markets’ prices, namely market clearing price (MCP), weighted average price (WAP), and system marginal price (SMP), respectively. Both energy and market data were acquired from the EXIST transparency platform [88].
The raw data set, obtained from publicly accessible sources, exhibited a limited proportion of missing entries and noisy measurements. To ensure data quality, a systematic preprocessing pipeline was implemented: (i) detection and removal of outliers based on statistical thresholds, (ii) correction or exclusion of erroneous records, and (iii) imputation of missing values using linear interpolation. These measures were adopted to mitigate the potential adverse effects of data quality issues on model training and forecasting performance.
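A minimal sketch of this three-step pipeline is shown below. The IQR-based fence is an assumption for illustration, since the paper does not state the exact statistical thresholds used:

```python
import numpy as np
import pandas as pd

def preprocess(series: pd.Series, k: float = 3.0) -> pd.Series:
    """Sketch of the pipeline: (i) flag outliers outside an IQR fence,
    (ii) exclude them as erroneous records, and (iii) fill the gaps by
    linear interpolation. The fence factor k = 3.0 is illustrative."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    # steps (i)+(ii): values beyond the fence become NaN
    cleaned = series.mask((series < q1 - k * iqr) | (series > q3 + k * iqr))
    # step (iii): linear interpolation over the remaining gaps
    return cleaned.interpolate(method="linear")

raw = pd.Series([10.0, 11.0, 10.5, 500.0, 11.5, np.nan, 10.8])
clean = preprocess(raw)
```

Here the spurious reading of 500.0 is removed and both gaps are filled from their neighbors.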
In addition to those, Figure 4 shows the Winsorized Pearson’s correlation map of the exogenous variables of the data set. A Winsorized form of Pearson’s correlation is a robust measure of correlation to evaluate the linear relationship between two independent variables while reducing the impact of outliers at the same time [89]. Within this context, the Winsorized Pearson’s correlation analysis was employed to systematically assess and illustrate the linear associations among variables under the reduced influence of extreme values, thereby offering readers a more reliable understanding of the data set’s structure and the potential relevance of predictors.
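For readers unfamiliar with the measure, the following sketch computes a Winsorized Pearson's correlation with NumPy. The clip-based Winsorization and the 5% tail limits are simplifying assumptions of ours; the paper does not report the limits used:

```python
import numpy as np

def winsorize(a, limits=(0.05, 0.05)):
    """Clip the lower and upper tails to the corresponding quantiles
    (a simplified, clip-based stand-in for full Winsorization)."""
    lo, hi = np.quantile(a, [limits[0], 1.0 - limits[1]])
    return np.clip(a, lo, hi)

def winsorized_pearson(x, y, limits=(0.05, 0.05)):
    """Pearson's r computed on Winsorized copies of both variables,
    which reduces the leverage of extreme values."""
    xw = winsorize(np.asarray(x, dtype=float), limits)
    yw = winsorize(np.asarray(y, dtype=float), limits)
    return float(np.corrcoef(xw, yw)[0, 1])

rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = 0.8 * x + 0.3 * rng.normal(size=300)
y[0] += 40.0                 # one extreme outlier
r = winsorized_pearson(x, y)
```

Because the outlier is clipped before the correlation is computed, the Winsorized estimate recovers the underlying linear association that a plain Pearson's r would understate.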

3.2. Methods

This study employed several gradient boosting algorithms, namely XGBoost, LightGBM, and CatBoost, to model the relationships within the data set, as shown in Figure 5. Gradient boosting is an ensemble learning technique that iteratively builds a strong model from a combination of weaker learners, typically decision trees. These algorithms were selected for their superior predictive performance, efficient computation, and robust hyperparameter optimization capabilities, which make them well suited for benchmarking HGF tasks in the recent literature. This section details the core mechanisms of each algorithm.

3.2.1. XGBoost

XGBoost is a gradient boosting algorithm proposed by Ref. [91]. Each tree aims to correct the errors or residuals left by its predecessors, and the final prediction is obtained by combining the outputs of all trees [92]. One of the strengths of XGBoost is its ability to effectively process tabular data and its transparency in model interpretation. The XGBoost mechanism is illustrated in Figure 6.
Recognized for its effectiveness across various predictive modeling tasks, XGBoost is a highly scalable machine learning system for tree boosting. It is a prominent implementation of gradient boosting machines (GBM), known for its superior performance in supervised learning tasks. It is suitable for both regression and classification problems [91].
XGBoost offers an open-source implementation of gradient boosting, optimized for high performance, flexibility, and portability. This library implements machine learning algorithms within the Gradient Boosting framework. Leveraging parallel tree boosting (commonly referred to as GBDT or GBM), XGBoost can effectively and rapidly address numerous data science challenges [94].
XGBoost constructs an additive expansion of the objective function by minimizing a loss function. Since decision trees are the sole base learners in XGBoost, a modified loss function is employed to regulate tree complexity.
The predicted value $\hat{y}_i$ is the cumulative sum of the outputs of all decision trees, as expressed in Equation (1):

$$\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in \mathcal{F} \quad (1)$$

where $\mathcal{F}$ denotes the collection of decision trees, $f_k(x_i)$ represents the output generated by the $k$-th tree for the instance $x_i$, and $\hat{y}_i$ is the predicted value for the $i$-th instance $x_i$.
The algorithm progressively minimizes the objective function presented in Equation (2):

$$\mathcal{L}(\phi) = \sum_{i=1}^{n} l(\hat{y}_i, y_i) + \sum_{k=1}^{K} \Omega(f_k) \quad (2)$$

where $l$ denotes a differentiable convex loss function measuring the difference between the predicted value $\hat{y}_i$ and the actual target $y_i$. The second term, $\Omega$, imposes a penalty on the model’s complexity [95].
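The additive structure of Equation (1) and the residual-fitting idea behind it can be illustrated with a deliberately reduced implementation: depth-1 trees (stumps), squared-error loss, and no $\Omega$ penalty. This is a didactic sketch of gradient boosting, not the full XGBoost algorithm:

```python
import numpy as np

def fit_stump(x, r):
    """Least-squares regression stump (depth-1 tree) on residuals r."""
    best = None
    for d in np.unique(x)[:-1]:                      # candidate thresholds
        left, right = r[x <= d], r[x > d]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, d, left.mean(), right.mean())
    _, d, vl, vr = best
    return lambda q: np.where(q <= d, vl, vr)

def boost(x, y, n_trees=200, eta=0.1):
    """Additive expansion in the spirit of Equation (1): each stump is fit
    to the residuals of its predecessors, and the prediction is the
    shrunken sum of all stump outputs (squared-error loss, no penalty)."""
    pred, trees = np.zeros_like(y, dtype=float), []
    for _ in range(n_trees):
        f = fit_stump(x, y - pred)                   # fit current residuals
        pred = pred + eta * f(x)                     # shrink by learning rate
        trees.append(f)
    return lambda q: eta * sum(f(q) for f in trees)

x = np.linspace(0.0, 10.0, 100)
y = np.sin(x) + 0.5 * x
model = boost(x, y)
```

Each boosting round strictly reduces the squared training error, which is why the sum of many weak stumps can approximate the smooth target closely.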

3.2.2. LightGBM

LightGBM, introduced by Ref. [96], was designed to address the challenges of reduced accuracy and efficiency in GBDT when handling large-scale data sets [97]. This approach integrates Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB) techniques into the GBDT framework. GOSS retains samples with large gradient values while randomly selecting samples with small gradient values and assigning them constant weights. By doing so, GOSS prioritizes undertrained samples while preserving the original data distribution, as illustrated in Figure 7 [98,99].
The GOSS procedure proposed by Ref. [96] can be summarized as follows:
1. Rank all training instances by the absolute values of their gradients in descending order.
2. Retain the top $a \times 100\%$ of instances with the largest gradients to form subset $A$.
3. From the remaining $(1-a) \times 100\%$ of instances with smaller gradients, randomly sample $b \times |A^c|$ instances to create subset $B$, where $A^c$ is the complement of $A$.
4. Determine the optimal split by evaluating the variance gain $\tilde{V}_j(d)$ over the combined set $A \cup B$.
The variance gain $\tilde{V}_j(d)$ is defined as follows:

$$\tilde{V}_j(d) = \frac{1}{n} \left[ \frac{\left( \sum_{x_i \in A_l} g_i + \frac{1-a}{b} \sum_{x_i \in B_l} g_i \right)^2}{n_l^j(d)} + \frac{\left( \sum_{x_i \in A_r} g_i + \frac{1-a}{b} \sum_{x_i \in B_r} g_i \right)^2}{n_r^j(d)} \right]$$

where
  • $A_l = \{x_i \in A : x_{ij} \le d\}$ and $A_r = \{x_i \in A : x_{ij} > d\}$ are the subsets of $A$ split by threshold $d$;
  • $B_l = \{x_i \in B : x_{ij} \le d\}$ and $B_r = \{x_i \in B : x_{ij} > d\}$ are the subsets of $B$ split similarly.
The coefficient $\frac{1-a}{b}$ is introduced to normalize the sum of gradients over $B$ back to the original size of $A^c$, ensuring that the smaller gradients in $B$ are properly scaled when calculating $\tilde{V}_j(d)$.
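Steps 1–3 of the GOSS procedure, together with the $(1-a)/b$ scaling of the small-gradient subset, can be sketched as follows. This is an illustrative reduction, not LightGBM's internal implementation:

```python
import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, rng=None):
    """Keep the top a*100% of instances by |gradient| (set A), randomly
    draw b*|A^c| of the rest (set B), and return both index sets with
    the (1-a)/b weight applied to B's gradients in the variance gain."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))          # step 1: rank by |g|
    top_k = int(a * n)
    A = order[:top_k]                               # step 2: large-gradient set
    rest = order[top_k:]                            # A^c, the remainder
    B = rng.choice(rest, size=int(b * len(rest)), replace=False)  # step 3
    weight = (1 - a) / b                            # scale factor for B
    return A, B, weight
```

With the default `a = 0.2` and `b = 0.1`, only 28% of the instances are visited per split while the weighted gradient sums remain unbiased estimates of the full-data sums.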
LightGBM is widely applied to energy demand forecasting, solar energy forecasting, the optimization of energy distribution, and the development of effective planning strategies [100,101,102,103].

3.2.3. CatBoost

CatBoost is a gradient boosting algorithm developed by Ref. [104]. It is specifically designed to handle categorical data effectively while delivering high predictive performance. Each tree in CatBoost is trained to reduce residual errors left by its predecessors and combine their outputs to generate the final prediction. A unique strength of CatBoost lies in its ability to natively process categorical features without requiring preprocessing such as one-hot encoding.
CatBoost has been broadly applied to various energy prediction tasks including estimating building energy consumption, forecasting solar electricity generation, and predicting wind power output through hybrid models [105,106].
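CatBoost's native categorical handling relies on ordered target statistics: each row's encoding uses only the target values of rows that appear earlier in a random permutation, which avoids the target leakage of plain mean encoding. The sketch below is a simplified illustration of that idea, not CatBoost's exact scheme:

```python
import numpy as np

def ordered_target_stats(cats, target, prior=0.5, rng=None):
    """Simplified ordered target statistics: visit rows in a random
    permutation and encode each row's category using only the targets
    seen so far. `prior` acts as a smoothing pseudo-observation."""
    if rng is None:
        rng = np.random.default_rng(42)
    n = len(cats)
    enc = np.empty(n)
    sums, counts = {}, {}
    for i in rng.permutation(n):
        c = cats[i]
        s, k = sums.get(c, 0.0), counts.get(c, 0)
        enc[i] = (s + prior) / (k + 1)   # statistics from earlier rows only
        sums[c] = s + target[i]
        counts[c] = k + 1
    return enc

# Hypothetical hydrological categories, for illustration only
cats = np.array(["dry", "wet", "dry", "wet", "dry", "wet"])
y = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
enc = ordered_target_stats(cats, y)
```

Categories associated with high targets receive encodings above the prior and vice versa, yet no row ever sees its own target.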

3.2.4. Model Implementation

All selected ML models were implemented in Python 3.12.0 using the XGBoost 4.1.0, CatBoost 1.2.7, and LightGBM 2.1.3 packages, through their XGBRegressor [94], LGBMRegressor [107], and CatBoostRegressor [108] interfaces.

4. Results and Discussions

All computations in this study were performed on a computer running Windows 11 (version 23H2). The system featured an Intel i7-10870H processor at 2.20 GHz, 64 GB of RAM, and an NVIDIA RTX 2070 GPU with 8 GB of GDDR6 memory (256-bit). Jupyter Notebook (Version 7.4.5) was employed as the integrated development environment for Python, a widely used language for statistical analysis, data processing, and producing high-quality visualizations [109].
R2 and RMSSE were employed as evaluation metrics in this study. The coefficient of determination R2 is given by
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

In this expression, $i$ is the index running over observations from 1 to $n$, $y_i$ is the actual value for the $i$-th observation, $\hat{y}_i$ is the predicted value for the $i$-th observation, $\bar{y}$ is the mean of all $y_i$, and $n$ is the total number of observations [110].
RMSSE is defined as

$$\mathrm{RMSSE} = \sqrt{ \frac{ \frac{1}{h} \sum_{i=n+1}^{n+h} (y_i - \hat{y}_i)^2 }{ \frac{1}{n-1} \sum_{i=2}^{n} (y_i - y_{i-1})^2 } }$$

Here, the index $i$ runs through time from 1 to $n+h$, where $n$ is the last time point in the training set and $h$ is the forecast horizon. The numerator $\frac{1}{h} \sum_{i=n+1}^{n+h} (y_i - \hat{y}_i)^2$ measures the average squared error of the predictions from $i = n+1$ to $i = n+h$, while the denominator $\frac{1}{n-1} \sum_{i=2}^{n} (y_i - y_{i-1})^2$ is the average squared difference between consecutive actual values in the training period [111].
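Both evaluation metrics translate directly into NumPy. The sketch below follows the definitions given above, with `y_train` supplying the one-step naive-error denominator of the RMSSE:

```python
import numpy as np

def r2(y, yhat):
    """Coefficient of determination, as defined above."""
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmsse(y_train, y_test, yhat):
    """Root mean squared scaled error: forecast MSE over the horizon h,
    scaled by the mean squared one-step naive error on the training set."""
    num = np.mean((y_test - yhat) ** 2)        # (1/h) * sum of squared errors
    den = np.mean(np.diff(y_train) ** 2)       # (1/(n-1)) * sum of squared diffs
    return float(np.sqrt(num / den))
```

An RMSSE of 1.0 means the forecast errs, on average, as much as a one-step naive forecast did on the training period; values below 1.0 indicate an improvement over that baseline.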
For model testing and evaluation, a random sampling method was applied to XGBoost, LightGBM, and CatBoost to select 80% of the data for training and 20% for testing [112].
This paper focused on the optimization of two crucial hyperparameters, the number of trees and the learning rate (η), to improve the performance and generalization capabilities of the three boosting methods. Because these two factors directly influence model complexity and learning dynamics, selecting the right combinations can lead to substantial performance gains while mitigating risks such as overfitting or underfitting. The number of trees was varied from 100 to 1000 in steps of 100 to capture various degrees of model expressiveness and computational cost.
Learning rates were likewise examined from 0.05 to 0.35 in steps of 0.05, recognizing that smaller values generally promote stable learning but require more iterations, whereas larger values can accelerate training yet increase the likelihood of overshooting during optimization. To comprehensively assess model performance, R2 and RMSSE were calculated during the prediction process to compare the obtained results.
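The resulting 10 × 7 grid (70 configurations per model) can be swept exhaustively. In the sketch below, `train_and_score` is a hypothetical callable standing in for fitting a boosting model and returning its RMSSE on the test split; the toy score surface is purely illustrative:

```python
import itertools

TREES = range(100, 1001, 100)                        # 100 to 1000 by 100
RATES = [round(0.05 * k, 2) for k in range(1, 8)]    # 0.05 to 0.35 by 0.05

def grid_search(train_and_score):
    """Exhaustive sweep over the hyperparameter grid used in the paper.
    `train_and_score(n_trees, lr)` must return a test-set RMSSE; model
    fitting itself is omitted here."""
    results = [(n, lr, train_and_score(n, lr))
               for n, lr in itertools.product(TREES, RATES)]
    return min(results, key=lambda t: t[2])          # best = lowest RMSSE

# Illustrative score surface favoring many trees and a learning rate near 0.10
best = grid_search(lambda n, lr: 0.12 + (1000 - n) * 1e-5 + abs(lr - 0.10))
```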
As seen in Figure 8, Figure 9 and Figure 10, the three tree-based ML models consistently showed that adding more trees (typically up to around 700–800) drove strong gains in R2 and lowered RMSSE, reflecting an improved balance between underfitting and overfitting. Beyond that range, improvements in predictive performance began to plateau, highlighting diminishing returns when trading training time for slight increases in model accuracy.
The learning rate ( η ) likewise played a pivotal role, with smaller values in the 0.05–0.10 range delivering more stable and ultimately higher R2 at the cost of longer training times. Higher rates, usually 0.20–0.25 or above, converged more quickly but risked overshooting the loss surface and ending up in suboptimal regimes, resulting in flatter or even declining gains in R2.
Among the three models tested, LightGBM emerged as the fastest, requiring only 1–2.5 s to reach 700–1000 trees, all while achieving the top R2 (around 97.0–97.1%) and the lowest RMSSE (about 0.120–0.121). CatBoost and XGBoost tended to take longer (up to 4–5 s), yet still attained competitive R2 scores near 96.8–96.9% and RMSSE around 0.122–0.127. Overall, these results underscore that choosing a moderate learning rate alongside 700–800 trees typically strikes the best compromise between model accuracy and computational cost, although the final decision should be guided by the specific time and performance needs of the application.
Table 4 presents the top five results sorted by RMSSE for each model: LightGBM, CatBoost, and XGBoost. Among the three, LightGBM achieves the best overall performance with an RMSSE of 0.1217 and an R2 of 97.07%, using 1000 trees and a learning rate of 0.10. The next four LightGBM configurations, ranging from 900 to 600 trees with the same learning rate, maintain low RMSSE values while exhibiting a gradual decline in performance.
CatBoost’s top configurations start in the sixth position overall, with its best result (1000 trees and a learning rate of 0.15) achieving an RMSSE of 0.1242 and an R2 of 96.94%. The remaining top CatBoost configurations show slightly higher RMSSE values but stay competitive with LightGBM, showcasing the importance of its hyperparameter tuning.
XGBoost’s top results rank lower, starting at the 11th position. Its best configuration (900 trees and a learning rate of 0.15) achieves an RMSSE of 0.1273 and an R2 of 96.79%. While its performance is respectable, XGBoost lags behind both LightGBM and CatBoost in this comparison. The remaining XGBoost configurations exhibit similar RMSSE values but fail to match the performance of the other two models.
These results emphasize the importance of hyperparameter optimization in determining model performance. LightGBM emerges as the most effective model in this evaluation, particularly for minimizing RMSSE.
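Since the model ranking here hinges on RMSSE, it may help to recall how the metric is computed. The sketch below uses the common definition, the forecast RMSE scaled by the RMSE of a one-step naive forecast on the training series (as popularized by the M5 forecasting competition); the paper's exact variant is assumed rather than quoted, so treat this as illustrative:

```python
import math


def rmsse(y_train, y_true, y_pred):
    """Root mean squared scaled error: forecast MSE divided by the mean
    squared one-step difference of the training series, then square-rooted."""
    mse = sum((a - f) ** 2 for a, f in zip(y_true, y_pred)) / len(y_true)
    scale = sum((y_train[i] - y_train[i - 1]) ** 2
                for i in range(1, len(y_train))) / (len(y_train) - 1)
    return math.sqrt(mse / scale)
```

A perfect forecast yields 0, and matching the naive one-step baseline yields 1, so values such as 0.1217 indicate errors well below the natural hour-to-hour variability of the series.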
From the results in Table 4, the configuration with a learning rate of 0.10 and 1000 trees was selected for each model (LightGBM, CatBoost, and XGBoost). The predictions generated by these models are plotted against the actual energy values for the period between 21 April 2022 and 25 April 2022, as shown in Figure 11.
In Figure 11, all three models closely follow the actual energy values, demonstrating their ability to capture the general trends in the data set. However, there are slight deviations in specific regions where the energy values exhibit sharp changes. LightGBM predictions align almost perfectly with the actual data across the time range. Its ability to accurately capture both stable and transitional periods makes it the most robust model in this comparison.
CatBoost predictions also closely follow the actual values but show small deviations in certain periods, particularly during sharp transitions. This could indicate slight sensitivity to rapid changes in the data.
XGBoost predictions demonstrate good performance but exhibit slightly larger deviations during sharp transitions compared to LightGBM and CatBoost. This is consistent with its ranking in terms of RMSSE in Table 4.
LightGBM emerges as the most accurate model based on the visual analysis of predictions and its superior RMSSE performance. CatBoost and XGBoost also perform well but are slightly less precise during transitions. This highlights the importance of hyperparameter optimization and model selection in improving prediction accuracy for time-series data.
Although these results are specific to the Ceyhan River basin, the proposed framework is not inherently site-dependent. Because it integrates hydrological, meteorological, and market variables, it has potential applicability to other river systems. However, transferring the framework to a new basin would require retraining the model with basin-specific data, as well as independent testing and validation to ensure robustness under different hydrological regimes. Performance may vary depending on the basin characteristics, and additional cross-basin experiments would be necessary to confirm its generalizability.
Another notable aspect of the proposed approach is its suitability for real-time forecasting. The boosting models employed offer fast inference, so forecasts can be updated rapidly as new process, meteorological, or market data becomes available. Retraining on a rolling window, combined with drift monitoring, would further help maintain forecast accuracy in an operational setting without significant computational overhead. Libraries such as XGBoost and LightGBM support incremental updates, enabling the model to incorporate recent data quickly, while periodic full retraining maintains long-term stability.
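As a deliberately model-agnostic illustration of the rolling-window and drift-monitoring idea, the sketch below keeps only the most recent observations and retrains whenever the latest absolute error far exceeds its recent average. The `fit` callable, window length, and drift threshold are illustrative assumptions, not part of the paper's pipeline:

```python
from collections import deque
import statistics


class RollingForecaster:
    """Rolling-window retraining with simple drift monitoring (sketch).

    `fit(xs, ys)` is any training routine returning a predict(x) callable,
    e.g. a boosted-tree fit; only the retraining logic is shown here.
    """

    def __init__(self, fit, window=720, drift_factor=2.0):
        self.fit = fit
        self.xs = deque(maxlen=window)      # most recent `window` samples
        self.ys = deque(maxlen=window)
        self.errors = deque(maxlen=window)  # recent absolute errors
        self.drift_factor = drift_factor
        self.model = None

    def update(self, x, y):
        """Record a new (features, generation) pair; retrain on drift."""
        if self.model is not None:
            self.errors.append(abs(self.model(x) - y))
        self.xs.append(x)
        self.ys.append(y)
        drifted = (
            len(self.errors) > 1
            and self.errors[-1] > self.drift_factor * statistics.mean(self.errors)
        )
        if self.model is None or drifted:
            self.model = self.fit(list(self.xs), list(self.ys))

    def predict(self, x):
        return self.model(x)
```

In practice, the retraining step could reuse an already fitted booster rather than start from scratch: LightGBM's `lgb.train` accepts an `init_model` and XGBoost's `xgb.train` an `xgb_model` argument for continued training, which is the incremental-update mechanism referred to above.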

5. Conclusions

Given the increasing role of hydroelectricity in achieving sustainable energy goals, this study benchmarks advanced tree-based machine learning models—XGBoost, LightGBM, and CatBoost—to enhance HGF accuracy. Unlike traditional methods, the proposed approach incorporates basin-wide hydrological and meteorological data, offering a comprehensive view of the factors influencing energy production.
The findings highlight the robustness of the selected models in managing the complexities of forecasting in a dynamic environment. Evaluated through R2 and RMSSE metrics, the results demonstrate the models’ ability to deliver accurate and reliable predictions. Among the three, LightGBM emerged as the most accurate model, achieving the lowest RMSSE (0.1217) and the highest R2 (97.07%), followed closely by CatBoost and XGBoost. The predictions plotted against actual energy values show all models effectively capturing overall trends, though minor deviations are observed during sharp transitions, with LightGBM consistently outperforming the other models in both accuracy and stability.
This study is among the first to integrate upstream hydrological data into hydroelectricity forecasting, representing a significant contribution to the field. The proposed framework provides a strong foundation for optimizing energy dispatch, improving water resource management, and maintaining grid stability. Additionally, the model framework can be applied to other basins with similar operational characteristics, advancing both theoretical and practical applications in sustainable energy management.
The insights gained pave the way for further exploration of hybrid methodologies, real-time implementation strategies, and the integration of additional data sources to enhance predictive accuracy and operational efficiency.
It is important to acknowledge the potential risks of over-reliance on ML-based forecasting in hydroelectric dispatching. Forecast errors during extreme inflow events may cause operational issues such as spillage, inefficient reservoir management, or ecological flow violations. To mitigate these risks, the forecasting models should be applied as a decision-support tool rather than a fully automated dispatch system, complemented by rule curves, operational constraints, and prediction intervals. Incorporating hybrid approaches that combine machine learning with hydrological knowledge represents a promising direction for future work.

Author Contributions

Conceptualization, B.A.A. and K.Z.; Data Curation: B.A.A.; Formal Analysis: B.A.A. and K.Z.; Investigation: B.A.A.; Methodology: B.A.A. and K.Z.; Project Administration: K.Z.; Resources: B.A.A. and K.Z.; Software: B.A.A. and K.Z.; Supervision: K.Z.; Validation: B.A.A.; Visualization: B.A.A. and K.Z.; Writing—Original Draft: B.A.A.; and Writing—Review and Editing: K.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

To foreground reproducibility, the analyses in this paper were carried out with the Python programming language on a publicly available data set. The data set can be accessed by sending an e-mail to the corresponding author.

Acknowledgments

The authors thank the anonymous reviewers for their valuable comments and suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ABC: Artificial Bee Colony
ABDT: Adaptive Boosting Decision Trees
ABLR: Adaptive Boosting Linear Regression
AE: Autoencoder
AI: Artificial Intelligence
ANFIS: Adaptive Neuro-Fuzzy Inference System
ANN: Artificial Neural Network
ARIMA: Auto-Regressive Integrated Moving Average
AWT: Adaptive Wavelet Transform
CatBoost: Categorical Boosting
DL: Deep Learning
DNN: Deep Neural Network
EEMD: Ensemble Empirical Mode Decomposition
ELM: Extreme Learning Machines
EXIST: The Energy Exchange Istanbul
GA: Genetic Algorithm
GBDT: Gradient Boosted Decision Trees
GBM: Gradient Boosting Machine
GOSS: Gradient-based One Side Sampling
GPR: Gaussian Process Regression
GWO: Grey Wolf Optimization
HGF: Hydroelectricity Generation Forecasting
HPP: Hydroelectric Power Plant
kNN: K-Nearest Neighbor
LightGBM: Light Gradient Boosting Machine
LSTM: Long Short-Term Memory
LWNRBF: Linear Weighted Normalized Radial Basis Function
MAE: Mean Absolute Error
MAPE: Mean Absolute Percentage Error
MERRA-2: Modern-Era Retrospective Analysis for Research and Applications, Version 2
ML: Machine Learning
MLP: Multilayer Perceptron
MLR: Multiple Linear Regression
MSE: Mean Squared Error
NSE: Nash–Sutcliffe Efficiency
R2: Coefficient of Determination
RBF: Radial Basis Function
RF: Random Forest
RMSE: Root Mean Squared Error
RMSPE: Root Mean Squared Percentage Error
RMSSE: Root Mean Squared Scaled Error
RNN: Recurrent Neural Networks
SARIMA: Seasonal ARIMA
SVM: Support Vector Machine
SVR: Support Vector Regression
WDS: Water Distribution Systems
XGBoost: Extreme Gradient Boosting

  109. What is Python Used For? 8 Real-Life Python Uses. 2024. Available online: https://www.datacamp.com/blog/what-is-python-used-for (accessed on 23 September 2025).
  110. Timur, O.; Zor, K.; Çelik, Ö.; Teke, A.; İbrikçi, T. Application of Statistical and Artificial Intelligence Techniques for Medium-Term Electrical Energy Forecasting: A Case Study for a Regional Hospital. J. Sustain. Dev. Energy Water Environ. Syst. 2020, 8, 520–536. [Google Scholar] [CrossRef]
  111. Makridakis, S.; Spiliotis, E.; Assimakopoulos, V. M5 accuracy competition: Results, findings, and conclusions. Int. J. Forecast. 2022, 38, 1346–1364. [Google Scholar] [CrossRef]
  112. Cebeci, C.; Zor, K. Electricity Demand Forecasting Using Deep Polynomial Neural Networks and Gene Expression Programming During COVID-19 Pandemic. Appl. Sci. 2025, 15, 2843. [Google Scholar] [CrossRef]
Figure 1. Geographical representation of the HPPs in Table 1 (source: Google Earth).
Figure 1. Geographical representation of the HPPs in Table 1 (source: Google Earth).
Applsci 15 10514 g001
Figure 2. Hourly generated energy of Aslantaş HPP.
Figure 2. Hourly generated energy of Aslantaş HPP.
Applsci 15 10514 g002
Figure 3. Weekly aggregated generation data of all HPPs within the Ceyhan basin.
Figure 3. Weekly aggregated generation data of all HPPs within the Ceyhan basin.
Applsci 15 10514 g003
Figure 4. Winsorized Pearson’s correlation map of input variables [90].
Figure 4. Winsorized Pearson’s correlation map of input variables [90].
Applsci 15 10514 g004
Figure 5. Simplified flowchart of the applied methodology.
Figure 5. Simplified flowchart of the applied methodology.
Applsci 15 10514 g005
Figure 6. Illustration of the XGBoost concept [93].
Figure 6. Illustration of the XGBoost concept [93].
Applsci 15 10514 g006
Figure 7. Demonstration of the LightGBM concept [98].
Figure 7. Demonstration of the LightGBM concept [98].
Applsci 15 10514 g007
Figure 8. Visualization of hyperparameter tuning for XGBoost.
Figure 8. Visualization of hyperparameter tuning for XGBoost.
Applsci 15 10514 g008
Figure 9. Demonstration of hyperparameter tuning for LightGBM.
Figure 9. Demonstration of hyperparameter tuning for LightGBM.
Applsci 15 10514 g009
Figure 10. Illustration of hyperparameter tuning for CatBoost.
Figure 10. Illustration of hyperparameter tuning for CatBoost.
Applsci 15 10514 g010
Figure 11. Prediction comparison of models.
Figure 11. Prediction comparison of models.
Applsci 15 10514 g011
Table 1. HPPs installed on the Ceyhan River [15].
Table 1. HPPs installed on the Ceyhan River [15].
OwnerAltitudeInstalled PowerCF *
NumberNameStatus(m)(MW)(%)
1Dağdelen HPPPrivate11118.0037.7
2Kandil HPPPrivate1087207.9226.7
3Sarıgüzel HPPPrivate870103.0030.9
4Hacınınoğlu HPPPrivate749140.0025.4
5Menzelet HPPPrivate560124.0044.1
6Kılavuzlu HPPPrivate48954.0038.6
7Sır HPPState420283.5020.4
8Berke HPPState340510.0027.6
9Aslantaş HPPState145138.0036.6
* Capacity factor is selected as the highest value obtained between 2021 and 2023.
Table 2. Summary of studies on HPP modeling using ML-based methods.
Table 2. Summary of studies on HPP modeling using ML-based methods.
YearRef.LocationCapacityMethodsOutputMetrics
2020[53]Tarbela HPP, Pakistan4.88 MWMLR, kNN, SVR, RF, LSTMDaily2.47 kWh (MAE), 3.98 kWh (RMSE)
2021[54]Almus HPP, Türkiye27 MWDT, GBDT, RF, GLMonthly0.717 GBDT (Corr.)
2021[73]Dinar 2 HPP, Türkiye3 MWkNN, SVR, RF, GA, DNN, RNN, AEHourly1.904 kWh (MAE), 2.841 kWh (RMSE)
2021[75]Mahabad HPP, Iran6 MWAWT, LSTM, RFDaily2.154 kWh (MAE), 5.261 kWh (RMSE), 98.7% (R2)
2022[84]Gorno-Badakhshan HPPs, TajikistanN/ALR, kNN, ABDT, ABLR, RF, XGBoost, MLPDaily5.23% (MAPE)
2023[58]Yunnan, ChinaN/AXGBoost, GMQuarter Hourly97.14% (Acc.)
2024[59]Skawa HPP, Poland760 kWRF, GBDT, MLP, RBFDaily10.96 kWh (MAE), 3.41% (MAPE)
Table 3. Features of the data set.
Table 3. Features of the data set.
CategoryFeatureDescriptionUnits
EnergyEnergyLag1hHourly generation lagged by 1 hMWh
EnergyEnergyLag1dHourly generation lagged by 1 dayMWh
EnergyEnergyLag1wHourly generation lagged by 1 weekMWh
EnergyDağdelen HPPHourly generationMWh
EnergyKandil HPPHourly generationMWh
EnergySarıgüzel HPPHourly generationMWh
EnergyHacınınoğlu HPPHourly generationMWh
EnergyMenzelet HPPHourly generationMWh
EnergyKılavuzlu HPPHourly generationMWh
EnergySır HPPHourly generationMWh
EnergyBerke HPPHourly generationMWh
WeatherQV2MSpecific humidity at 2 mkg/kg
WeatherU2MEast–west wind components at 2 mm/s
WeatherV2MNorth–south wind components at 2 mm/s
WeatherT2MTemperature at 2 mC
WeatherTQITotal column ice water contentkg/m2
WeatherTQLTotal column liquid water contentkg/m2
WeatherTQVTotal column vapor contentkg/m2
WeatherSWTDNTOA incoming shortwave fluxW/m2
WeatherSWGDNSurface incoming shortwave fluxW/m2
WeatherPRECTOTTotal precipitationmm
WeatherPREVTOTTotal column re-evap of precipitationmm
WeatherPRECSNOSnowfall precipitationmm
MarketMCPMarket clearing priceTRY
MarketWAPWeighted average priceTRY
MarketSMPSystem marginal priceTRY
Table 4. Five best results of the proposed models.
Table 4. Five best results of the proposed models.
TreeLearningR2 Computational
ModelSizeRate(%)RMSSETime (s)
1LightGBM10000.1097.070.12171.240
2LightGBM9000.1097.060.12191.192
3LightGBM8000.1097.050.12201.066
4LightGBM7000.1097.040.12210.894
5LightGBM6000.1097.030.12230.768
6CatBoost10000.1596.940.12424.832
7CatBoost9000.1596.930.12454.316
8CatBoost10000.2096.910.12494.971
9CatBoost8000.2096.900.12503.328
10CatBoost9000.2096.900.12504.049
11XGBoost9000.1596.790.12732.007
12XGBoost6000.1596.780.12741.349
13XGBoost7000.1596.780.12741.600
14XGBoost8000.1596.780.12741.797
15XGBoost10000.1596.780.12742.274
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Atalay, B.A.; Zor, K. An Innovative Approach for Forecasting Hydroelectricity Generation by Benchmarking Tree-Based Machine Learning Models. Appl. Sci. 2025, 15, 10514. https://doi.org/10.3390/app151910514

AMA Style

Atalay BA, Zor K. An Innovative Approach for Forecasting Hydroelectricity Generation by Benchmarking Tree-Based Machine Learning Models. Applied Sciences. 2025; 15(19):10514. https://doi.org/10.3390/app151910514

Chicago/Turabian Style

Atalay, Bektaş Aykut, and Kasım Zor. 2025. "An Innovative Approach for Forecasting Hydroelectricity Generation by Benchmarking Tree-Based Machine Learning Models" Applied Sciences 15, no. 19: 10514. https://doi.org/10.3390/app151910514

APA Style

Atalay, B. A., & Zor, K. (2025). An Innovative Approach for Forecasting Hydroelectricity Generation by Benchmarking Tree-Based Machine Learning Models. Applied Sciences, 15(19), 10514. https://doi.org/10.3390/app151910514

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop