1. Introduction
Hydropower accounts for 75% of the renewable share of the world electricity mix [1]. Optimizing hydropower generation is therefore of utmost importance from both an economic and an environmental point of view. An optimized hydropower operation has many benefits, such as better use of water resources, increased renewable energy production, help in meeting the growing energy demand, reduced equipment losses, and extended equipment useful life. However, hydropower optimization is not an easy task: a good result requires comprehensive monitoring and knowledge of all energy transformation processes in a hydropower plant.
The hydropower generation forecast consists of estimating the power available to the grid. This forecast relies on hydrological and climatic data, and thus an accurate inflow prediction is fundamental. The process considers the availability of the primary source and the gross head, from which are deducted the hydraulic losses in the water intake, the efficiency losses of the turbine-generator set, the internal consumption of auxiliary services, and the electrical losses to the grid.
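The chain of deductions described above can be sketched with the standard hydropower relation P = ρ·g·Q·H·η. This is a minimal illustration only: the function name, the parameter names, and every numeric value are assumptions chosen for the example, not data from the reviewed works.

```python
RHO = 1000.0  # water density in kg/m^3 (assumed constant)
G = 9.81      # gravitational acceleration in m/s^2


def net_power_mw(flow_m3s, gross_head_m, hydraulic_loss_m,
                 unit_efficiency, aux_consumption_mw, grid_loss_fraction):
    """Estimate the power delivered to the grid, in MW (hypothetical helper)."""
    # Net head: gross head minus hydraulic losses in the water intake.
    net_head_m = gross_head_m - hydraulic_loss_m
    # Output of the turbine-generator set, converted from W to MW.
    generated_mw = RHO * G * flow_m3s * net_head_m * unit_efficiency / 1e6
    # Deduct the internal consumption of auxiliary services.
    after_aux_mw = generated_mw - aux_consumption_mw
    # Deduct the electrical losses between the plant and the grid.
    return after_aux_mw * (1.0 - grid_loss_fraction)


# Made-up operating point, for illustration only.
power = net_power_mw(flow_m3s=100.0, gross_head_m=50.0, hydraulic_loss_m=2.0,
                     unit_efficiency=0.9, aux_consumption_mw=0.5,
                     grid_loss_fraction=0.02)
print(f"{power:.2f} MW")
```

In a forecasting setting, the flow and head inputs would come from the inflow prediction, which is why the accuracy of that prediction carries through to the power estimate.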
A more accurate prediction allows a higher quality optimization process to determine the best configuration and parameters for using water resources efficiently. The great challenge of energy-efficient power generation is taking optimal advantage of the variables of the power generation process. In this scenario, the optimized dispatch of generating units becomes an important tool for keeping the units operating at their best efficiency, and it necessarily requires adopting a performance criterion.
A hydropower plant’s operation can be broadly analyzed in a watercourse either separately or in a cascade, with the system operation classified as run-of-river [2] or storage reservoir [3]. The energy generation process is divided into three stages: pre-operation, real-time, and post-operation. The pre-operation steps can be separated into long-term, medium-term, short-term, and real-time scheduling [4].
The literature presents different approaches to optimizing hydropower plants. For example, in terms of real-time operation, ref. [5] presented a system for performance evaluation and energy optimization of a hydropower plant’s real-time operation, using data collected from sensors and meters together with calculated unit variables such as turbine outflows, heads, losses, and efficiencies.
In another example, an optimization algorithm was proposed to determine energy production and maximize the economic value of the investment in a power plant [6]. The method proposed in [7] focuses on the most efficient operation of the turbine, aiming to maximize pressure at the end of the penstock, consequently reducing the input flow and increasing the overall hydropower plant efficiency.
Operation and maintenance in hydropower plants can be optimized, with cost reduction, by using advanced performance monitoring analysis [4]. However, a significant challenge is efficiently collecting and analyzing data from all equipment and processes within a plant so as to take full advantage of the information they contain.
The application of Machine Learning (ML) techniques has proved to be a suitable methodology to tackle these challenges. For example, operation optimization usually involves complex models and multi-objective functions, especially in hydropower plants with multiple-purpose reservoirs. ML is well suited to these complex problems, as different techniques have been developed to find the global optimum faster and with higher likelihood. Moreover, the uncertainty of the predicted variables, a common problem that makes operation planning harder, can also be reduced with ML, which usually produces more reliable estimates.
Therefore, the literature presents many ML applications in different areas of hydropower operation. The operation of reservoirs is one of the most common. For example, the authors of [8] applied ML to reduce the shortage index in two reservoirs in Taiwan. Run-of-river hydropower plants also present opportunities for these techniques: ref. [9] applied them to improve the weekly forecast accuracy of Sava River low-flow in Slovenia.
There are also applications regarding the operation of the hydropower plants themselves. For example, in [10], three different regression techniques are used in a predictive maintenance monitoring system. Moreover, ref. [11] presents a hydropower plant management system using neural networks over an extensive dataset from plant monitoring systems.
The last decade brought significant developments regarding these data challenges. New technologies and techniques have surfaced, improving data collection and enabling better analysis. In addition, the amount of data collected in hydropower operations has increased significantly, presenting even more opportunities for applying ML techniques in the sector.
Hence, this work’s primary goal is to perform a systematic review to study and analyze the state-of-the-art ML techniques used to optimize energy production from hydropower plants. The analysis used criteria that influence energy generation forecasts, plant (or unit) operating policies, and plant (or unit) performance evaluation.
The remaining sections of this paper are organized as follows. First, in Section 2, we present the methodology applied to perform the review, our research questions, the inclusion, exclusion, and quality criteria for the works, and the search string and databases used. Next, in Section 3 and Section 4, we discuss the review process and our findings. Finally, in Section 5, we elaborate on our findings and discuss the opportunities we envision for ML application in hydropower operation and optimization.
3. Result Analysis
This section presents the outcomes of all analyzed works and answers the four research questions in order to better understand what is being applied to hydropower optimization. All articles considered and discussed in this section meet the inclusion, exclusion, and quality criteria.
First of all, considering the quality criterion for the period of publication (QC4), from 2011 to 2021, Figure 2 shows the distribution of papers by year. Among the 73 chosen papers, 68 works (93%) were published in the last six years. The high percentage of recent articles shows that ML applied to hydropower operation is a current topic targeted by many researchers, which supports the relevance of this systematic review.
In Table 6, articles were stratified by year of publication and by the database where the article was published.
Regarding RQ1, “Which ML techniques are mostly used for power generation optimization?”, the answers revealed a wide variety of ML techniques, many of them applied only once. Nevertheless, some techniques were applied more often, such as artificial neural networks (ANN), extreme learning machines, support vector machines, particle swarm optimization, variational mode decomposition, Bayesian techniques, Gaussian regression, and genetic algorithms.
ANNs appear in 19 papers, making them the most applied ML technique. Their applications vary widely, however, and together illustrate the range of possible strategies for hydropower optimization.
For example, the papers [27,31,69,81] apply ANNs to multi-reservoir operation optimization. Ref. [27] uses ML to overcome the difficulty of deriving complex models, as occurs in multipurpose multi-reservoir systems. An ANN is applied to derive the optimized reservoir release by solving a multi-objective function: minimizing water demand deficits and reservoir spills as convex functions while maximizing hydropower energy production as a nonconvex function. Ref. [31], in turn, investigates the impacts of average annual inflow volume (AAIV) variations on the long-term operation of a multi-hydropower-reservoir system. An ANN is used to derive the adaptive operation rule, capturing the nonlinear relationships between decision variables (inflow volume at the current period and water storage volume at the beginning of the current period) and decision-making factors (water storage volume at the end of the current period).
In [69], the authors state that Bellman stochastic dynamic programming is the best-known approach to multi-reservoir operation optimization. However, in these applications, the computational effort increases exponentially with the number of reservoirs, so in some cases the problem becomes intractable. For this scenario, the authors propose an implicit stochastic optimization in which ANNs derive the release rules for the Nile River basin. Thus, an open-loop method approximates the release rules to the optimal policy.
Many papers focus mainly on providing accurate predictions of river flow and inflow, given their importance to hydropower plant and reservoir operation, and ANNs are the most applied tool to achieve this goal [50,58,59,60,86,88].
Among the papers cited above, ref. [88] is the only one that develops river flow prediction for run-of-river power plants. For these plants, the impossibility of storing water for an extended period (annual, seasonal, or monthly) makes hourly river flow prediction vital for planning the operation. The paper therefore uses ANNs to produce hourly inflow forecasts for a run-of-river hydropower plant. The authors used three-layer feed-forward ANNs trained with the Levenberg–Marquardt backpropagation algorithm. In addition, they tested different types of ANN inputs, such as temperature, precipitation, and historical water inflow.
Paper [50] developed a hybrid model for monthly streamflow forecasting (LTS) aimed at flood risk mitigation. The hybrid models are designed by combining artificial intelligence models (feedforward backpropagation and radial basis function networks) with decomposition methods. Ref. [59] applies ANNs to forecast reservoir inflow with a lead time of seven days to improve the reservoir STS. In the work of Jose, the authors predicted reservoir inflows with different static and dynamic ANN models (static feed-forward neural networks, nonlinear autoregressive, and nonlinear autoregressive with exogenous inputs). The models are trained using inflow discharge and precipitation data with different time delays.
Furthermore, to assess the effect of periodicity, a time index is added to the input data (indicating the month number, from 1 to 12). In [60], the authors forecast reservoir inflow with an ANN to feed the multi-objective numerical optimization of hydropower production, which is solved by a novel combined Pareto multi-objective differential evolution. In [86], the monthly flow of a river is predicted by two recurrent neural network techniques: Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). Monthly flow prediction is important to the long-term generation schedule of the Brazilian electrical power system, for example, to decide whether thermoelectric power plants should begin operation.
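The input construction mentioned above (inflow and precipitation with time delays, plus a month index from 1 to 12) can be sketched as follows. This is an illustrative assumption about how such inputs might be assembled, not the procedure of any reviewed paper; the helper name, the number of lags, and the data are made up.

```python
def build_samples(inflow, precip, first_month, n_lags):
    """Build (features, target) pairs from monthly series (hypothetical helper).

    Each feature vector holds the previous n_lags inflows, the previous
    n_lags precipitation values, and the calendar month (1..12) of the target.
    """
    samples = []
    for t in range(n_lags, len(inflow)):
        lagged_inflow = inflow[t - n_lags:t]
        lagged_precip = precip[t - n_lags:t]
        # Periodicity index: calendar month of the value being predicted.
        month = (first_month - 1 + t) % 12 + 1
        samples.append((lagged_inflow + lagged_precip + [month], inflow[t]))
    return samples


inflow = [10.0, 12.0, 15.0, 11.0, 9.0]    # made-up monthly inflows
precip = [80.0, 95.0, 120.0, 70.0, 60.0]  # made-up monthly precipitation
samples = build_samples(inflow, precip, first_month=11, n_lags=2)
print(samples[0])  # ([10.0, 12.0, 80.0, 95.0, 1], 15.0)
```

Any regression model, such as a feed-forward ANN, could then be fitted on these pairs; a cyclic encoding of the month is a common alternative to the raw 1 to 12 index.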
Paper [53] adopted an alternative strategy: the authors use ANNs to predict the downstream water level in real time, rather than the river flow prediction widely found in the review. The objective was to develop an accurate forecast of downstream water levels because this parameter strongly affects the economic operation of re-regulating hydropower stations. The ANNs were trained using historical measurements of input parameters such as power generation, upstream level, river flow, and downstream water level. The results show real-time downstream water level predictions with stable results and greater accuracy.
ANNs are also applied to the operation optimization of cascade reservoirs. In such systems, the operation of one reservoir affects the operating parameters of the others. In [70], this correlation is represented by coupling models used for joint operation optimization. A backpropagation neural network calculates the downstream reservoir’s inflow and the upstream reservoir’s tailwater. Its accuracy in capturing water flow hysteresis and the aftereffect of tailwater level variation significantly improves the coupling model’s accuracy.
Papers [85,87] also address cascade reservoir operation optimization, but with a financial focus. Paper [85] presents a multi-objective optimization whose primary objective is profit maximization, with the additional sub-objective of reducing generator startups and shutdowns. The goal of paper [87] is to maximize the time-average revenue. Both papers use stochastic optimization algorithms to solve the problems and ANNs to predict energy prices and water inflow.
Extreme learning machines were used in several works [17,29,40,42,46]. Because of the high diversity of ML techniques found in the review, we decided to organize the methods according to Figure 3. In other words, ML techniques are classified into three big groups: Supervised Learning, Unsupervised Learning, and Reinforcement Learning.
Supervised learning techniques are the most frequent in our review, with 53 articles, or 72.6% of applications overall. Within supervised learning, we found 43 applications of regression techniques and ten applications of classification techniques. The regression techniques are applied for several purposes, from prediction (inflow, pricing) to deriving the optimal operation rule for a single reservoir, multiple reservoirs, or cascade reservoirs. The principal regression algorithm applied is the ANN. Classification techniques are concentrated on deriving better operation rules for hydropower release and/or decision-making for the operation of reservoirs and hydropower plants [19,21,47], that is, on hydropower and reservoir operation optimization. In addition, decision tree modeling is often found among the classification algorithms in these papers, such as in [47,49,65].
Unsupervised learning is the second most applied group, with 20 applications, or 27.4% overall. Among these are ten applications of clustering techniques, nine of density estimation techniques, and one of dimensionality reduction. It is essential to point out that many papers applied more than one ML algorithm. In most cases, the authors combined different unsupervised learning algorithms to make the most of their features and reach better results. On the other hand, refs. [45,57] applied unsupervised learning algorithms to better tune a regression algorithm.
Reinforcement learning techniques were the least applied in the reviewed papers, with eight applications, or 11.0% overall. Four applications each of policy-iteration and value-iteration were found. Both algorithms share the same working principle but take different approaches to finding the optimal policy: in policy-iteration, policy evaluation and policy improvement are repeated iteratively until the policy converges, while in value-iteration, the algorithm iterates until it finds an optimal value function, from which the optimal policy is then derived. Reinforcement learning techniques are used primarily for optimal hydropower and reservoir operation [18,28,30,37,52].
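To make the value-iteration idea concrete, the sketch below applies it to a toy single-reservoir problem. Everything in it is an assumption made for illustration: the discrete storage levels, the inflow probabilities, the reward function, and the discount factor are invented and do not come from the reviewed papers.

```python
S_MAX = 3                           # storage levels 0..S_MAX (assumed units)
INFLOWS = {0: 0.3, 1: 0.5, 2: 0.2}  # inflow -> probability (made-up)
GAMMA = 0.9                         # discount factor (assumed)


def reward(storage, release):
    # Toy energy proxy: release times a head term that grows with storage.
    return release * (1.0 + 0.1 * storage)


def q_value(storage, release, values):
    # Expected immediate reward plus discounted value of the next state.
    total = 0.0
    for inflow, prob in INFLOWS.items():
        next_storage = min(S_MAX, storage - release + inflow)  # excess spills
        total += prob * (reward(storage, release) + GAMMA * values[next_storage])
    return total


def value_iteration(tol=1e-8):
    values = [0.0] * (S_MAX + 1)
    while True:
        # Bellman optimality update for every storage level; the release
        # cannot exceed the water currently stored.
        new_values = [max(q_value(s, r, values) for r in range(s + 1))
                      for s in range(S_MAX + 1)]
        if max(abs(a - b) for a, b in zip(new_values, values)) < tol:
            break
        values = new_values
    # Greedy policy derived from the converged value function.
    policy = [max(range(s + 1), key=lambda r: q_value(s, r, new_values))
              for s in range(S_MAX + 1)]
    return new_values, policy


values, policy = value_iteration()
print(policy)  # release decision for each storage level
```

Policy-iteration would instead alternate between evaluating a fixed release policy and improving it greedily, stopping when the policy no longer changes.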
For research question RQ2, “What is the planning forecast horizon: long-term schedule (LTS), short-term schedule (STS), or real-time schedule (RTS)?”, it is essential to state that papers deriving operation rules and operation policies were classified as LTS. In addition, works that provide annual and monthly flow forecasts or operation optimization were also classified as LTS. Works that provide weekly and daily flow forecasts or operation optimization were tagged as STS. Finally, works related to hourly flow forecasts or operation optimization were classified as RTS.
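The classification rule just described can be written down as a small lookup. This is only a sketch of that rule; the function name, the argument names, and the resolution labels are hypothetical choices made for the example.

```python
# Time resolution of the forecast or optimization -> planning horizon,
# following the classification stated in the text. The keys are
# hypothetical labels chosen for this sketch.
HORIZON_BY_RESOLUTION = {
    "annual": "LTS",
    "monthly": "LTS",
    "weekly": "STS",
    "daily": "STS",
    "hourly": "RTS",
}


def planning_horizon(derives_rules_or_policies, resolution=None):
    """Return 'LTS', 'STS', or 'RTS' for a reviewed work."""
    # Works deriving operation rules or policies are always long-term.
    if derives_rules_or_policies:
        return "LTS"
    return HORIZON_BY_RESOLUTION[resolution]


print(planning_horizon(False, "weekly"))  # STS
```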
After analyzing the selected papers, we conclude that the most explored planning horizon is the long term. As shown in Figure 4, 49 articles, or 67.1%, address long-term planning forecasts. Short-term and real-time schedules are analyzed in 19 and 13 papers, respectively. This result suggests that improving operation strategies long before dispatch is the most common motivation and is where ML techniques are concentrated. It is not easy, however, to point to a specific reason for the papers’ concentration on long-term scheduling.
Answering question RQ3, “What is the type of river system: run-of-river, single reservoir, multiple reservoirs, or operation in cascade?”, ML was applied to reservoir operation optimization for single reservoirs, for multiple reservoirs, and for power plants operating in cascade.
Operation optimization is more flexible in reservoirs with long regulation capacity periods, that is, with the ability to store water resources. Run-of-river hydropower plants have small reservoirs with low regulation capacity; therefore, we treat this type of power plant as having no reservoir.
The articles regarding hydropower plants with reservoirs represent 94.5% of the analyzed works. This can be explained by the restricted operation possibilities in run-of-river hydropower systems, which must respect one rule during operation: water inflow equals outflow. Therefore, model-based optimization can deal appropriately with these systems, with very few exceptions. In contrast, applications involving hydropower plants with single, multiple, and cascaded reservoirs face challenging issues such as multi-objective optimization, complex or coupled models, an explosive number of possibilities, and uncertain forecast parameters, among others. ML is therefore in demand to achieve meaningful results. Among the 69 articles that involve reservoirs, 21 deal with multiple reservoirs (30.4%), 35 with single reservoirs (50.7%), and 13 with cascade reservoirs (18.8%). Only three articles are about run-of-river systems [36,88,89]. These results are detailed in Figure 5. It is important to clarify that the article on turbine efficiency curve adjustment [48] does not have a specific river system application. Thus, we did not count this paper under any river system type.
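The shares reported above can be reproduced from the stated counts. The short check below uses only numbers given in the text (73 works in total, of which 3 are run-of-river and 1 has no river system type); the variable names are ours.

```python
TOTAL_WORKS = 73
reservoir_counts = {"single": 35, "multiple": 21, "cascade": 13}
run_of_river = 3
no_river_system = 1  # the turbine efficiency curve paper

with_reservoir = sum(reservoir_counts.values())
# Share of each reservoir type among the works that involve reservoirs.
shares = {kind: round(100 * n / with_reservoir, 1)
          for kind, n in reservoir_counts.items()}
# Share of reservoir works among all reviewed works.
reservoir_share = round(100 * with_reservoir / TOTAL_WORKS, 1)

print(with_reservoir, shares, reservoir_share)
```

The three groups add up to the 73 reviewed works (69 + 3 + 1), which confirms that the percentages for reservoir types are taken over 69 rather than 73.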
Regarding question RQ4, “What is the primary expected outcome of the ML technique application?”, the results showed mostly applications to river flow and inflow forecasting. Nearly 31.5% (Figure 6) of the proposals were developed to estimate water flow. This result is expected, since the papers are motivated by the need for more accurate water flow data, either to manage hydropower and reservoir operation or to feed optimization models with the parameters essential to accurate results.
Figure 7 represents the combination of ML groups with the article’s primary purpose. Supervised learning is the most applied technique for river flow forecasting, with 17 cases. Among the supervised learning algorithms, ANNs are applied in seven papers and extreme learning machines in three. For example, the paper [58] uses ANNs to predict reservoir inflow seven days ahead to optimize reservoir operation. Weather forecasts and antecedent hydrological variables were used as ANN inputs. As a result, additional energy production can be achieved with more accurate inflow predictions without flood risk. This is an important conclusion: an accurate inflow forecast enables optimal and safe reservoir operation.
As another example, paper [17] adopted a hybrid model for annual runoff forecasting. The hybrid model uses the Variational Mode Decomposition algorithm to decompose the yearly time series into subcomponents. An extreme learning machine is then applied to model the process hidden in each subcomponent, and the aggregated output is the forecast. The final results show that the proposed model improves the prediction accuracy of the annual runoff compared to several traditional methods. Therefore, the hybrid model can be helpful in the mid- to long-term operation of water resources and the power system.
Four papers explore unsupervised learning techniques to improve streamflow prediction [45,54,74,89]. The strategy in these papers is to work on the available data to find hidden correlations and patterns, or to select data subsets properly, in order to improve prediction accuracy. For example, the paper [54] investigates the potential of selecting the best subsets of historical climatological data to maximize Ensemble Stream Flow prediction performance, with a genetic algorithm determining the best set of scenarios. In conclusion, exploring data analysis to choose the proper data subsets (size, scenarios, correlation, among others) to feed forecast models and algorithms significantly impacts prediction quality and accuracy. Moreover, it is an important area that remains little explored.
Reservoir operation optimization is another relevant area in the papers reviewed, with 18 applications. Within these papers, supervised learning is the leading ML technique with eight cases, followed by reinforcement learning with six applications. Reservoir operation rules are typically derived by applying a fitting strategy such as linear regression, ANNs, or other methods.
A different approach investigates how the operation rules are affected by uncertainty in the inflow prediction and in the optimization model parameters [19]. A supervised learning (classification) method, Bayesian deep learning, was proposed to incorporate the uncertainty of the inflow predictions and model parameters when deriving real-time reservoir rules. The results showed that the Bayesian deep learning method derived four operation rules. It is reliable and robust, and it performed better than the linear regression method (without uncertainty consideration) regarding hydropower generation. Furthermore, inflow uncertainty affects the operation rule output significantly more than model parameter uncertainty, and its sensitivity rises in the dry season.
In conclusion, hydropower operation is subject to uncertainty in both the models and the inflow predictions, which generally impairs operation optimization. Nevertheless, the proper application of ML can account for these uncertainties in the optimization methods and significantly improve the operation results.
The reservoirs’ operation rules for long-term scheduling are generally established by a well-known fitting strategy such as linear regression, ANNs (widely present in this review), and other nonlinear methods. However, fitting optimal parameters for specific functions might not account for the uncertainty and the nonlinear dependence structure of hydrological variables [37]. Thus, the work in [37] proposes a combination of copulas with Implicit Stochastic Optimization to derive reservoir operation rules. The methodology has three stages: simulation of synthetic streamflow scenarios based on a periodic Vine Copula-Entropy model; estimation of the optimal reservoir dispatch by implicit stochastic optimization; and estimation of the optimal reservoir operation policy by a probabilistic approach with copulas. Furthermore, this methodology represents a typical reinforcement learning technique (more specifically, policy-iteration-based), which is widely applied to reservoir operation optimization [28,30,52,63].
Combining the results obtained from RQ2, RQ3, and RQ4, it was possible to identify the applications and their respective planning horizons for each group of works.
Figure 8 analyzes the combination of ML groups and type of river system. We conclude that the combination of supervised learning with a single reservoir system is the most used, with 21 cases, or 28.8% overall.
Figure 9 shows the result of combining the planning forecast horizon and the primary purpose. The most applied combination is LTS with River Flow Forecast, 16 cases or 21.9% overall.
Figure 10 combines the ML group and the forecast horizon. The results show 31 applications of supervised learning for LTS, 42% overall. Moreover, supervised learning is also the most applied for STS and RTS, with 14 and 7 cases, respectively.
These combined results are expected and are consistent with the previous individual analyses of the article’s main purpose, type of river system, planning horizon, and ML technique.
An important exception is the ML application in [48], where the turbine efficiency curves are adjusted. It is also essential to point out that one article deals with both water flow and sediment forecasting [45].
5. Conclusions
This paper aimed to review the academic literature on ML techniques applied to hydropower optimization. After searching technical databases, we classified 73 works for this study. Analyses and discussions were then made considering three main points: forecast schedule, ML technique groups, and river system.
Regarding the ML technique groups considered in the analysis, supervised learning is broadly applied using regression and classification techniques. Furthermore, we noted the extensive use of artificial neural networks, owing to their capacity to fit most of the applications of ML in hydropower operation optimization: derivation of parameters for river flow forecasts; optimization models for reservoir operation; multi-objective optimization models for the operation of multipurpose reservoirs; and derivation of operation rules, among others.
For unsupervised learning, the second most used group, we found clustering and density estimation techniques. This group’s main application is river flow forecasting, mostly for single reservoirs.
Although reinforcement learning is the least frequent group, we found both policy-iteration and value-iteration, mainly applied to single and multiple reservoirs.
Regarding the type of river system, most of the applications were in hydropower plants with reservoirs. This finding is an important contribution of this study. Furthermore, given the complex issues present in the operation of single, multiple, and cascaded reservoirs, ML is an alternative in the search for improvements and has been widely applied to deal with these complex problems successfully and accurately.
Regarding the planning forecast horizon, the study finds that most of the applications concern LTS. This represents another contribution of this work. A weak point revealed by the planning forecast horizon analysis is the small number of real-time applications (only 13).
A significant gap observed in this study is the small number of works on run-of-river hydropower plants found in the review. Hence, there is an open space for future works focusing on this type of hydropower plant. Another opportunity concerns articles mainly focused on maintenance activities, which can also be the theme of future studies.
Finally, the evolution of connectivity, instrumentation, and computer science towards emerging concepts like Internet-of-Things and Industry 4.0 will lead hydropower plants to rely even more on ML and big data tools and applications. These techniques are very well suited to deal with the complexity of the challenges presented in the sector.
Our work envisions future opportunities for ML applications in several areas of hydropower operation. Areas such as inflow forecasts, scheduling, and operation policies already use ML applications, but will still present challenges suitable for them. Additionally, we believe that ML applications can significantly benefit areas like optimal dispatch, maintenance, and general operations.