Hydropower Operation Optimization Using Machine Learning: A Systematic Review

Abstract: The optimal dispatch of hydropower plants is the challenge of making the best use of both the available head and river flows. Although the objective is to deliver maximum power to the grid, some of the variables involved are uncertain, dynamic, non-linear, and non-parametric. Advances in computer science have nevertheless produced models that can help hydropower operators maximize plant power production. Over the years, several studies have explored Machine Learning (ML) techniques to optimize hydropower plant dispatch, applying them in the pre-operation, real-time, and post-operation phases. Hence, this work is a systematic review that analyzes how ML models are being used to optimize energy production from hydropower plants. The analysis focused on criteria that interfere with energy generation forecasts, operating policies, and performance evaluation. Our discussion addressed ML techniques, schedule forecasts, river systems, and ML applications for hydropower optimization. The results showed that ML techniques have been applied mostly to river flow forecasting and reservoir operation optimization. The long-term scheduling horizon is the most common application in the analyzed studies, and supervised learning was the most applied ML technique segment. Despite being a widely explored theme, new areas present opportunities for disruptive research, such as real-time schedule forecasting, run-of-river system optimization, and low-head hydropower plant operation.


Introduction
Hydropower generation has a 75% share of renewable sources in the world electrical mix [1]. Therefore, optimizing hydropower generation is of utmost importance from both an economic and an environmental point of view, two issues of major significance today. An optimized hydropower operation has many benefits, such as better use of water resources, increased renewable energy production, mitigation of the growing energy demand, reduced equipment losses, and extended equipment useful life. However, hydropower optimization is not an easy task. A good result requires comprehensive monitoring and knowledge of all energy transformation processes in a hydropower plant.
The hydropower generation forecast consists of estimating the power available to the grid. It relies on hydrological and climatic data, so accurate inflow prediction is fundamental. The estimate starts from the availability of the primary source and the gross head, and then deducts hydraulic losses in the water intake, efficiency losses of the turbine-generator set, internal consumption of auxiliary services, and electrical losses to the grid.
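The chain of deductions described above can be sketched as a short calculation. The sketch assumes the standard hydraulic power relation P = rho * g * Q * H * eta; the function name, the parameter names, and every numeric value are hypothetical and chosen only for illustration, not taken from any reviewed plant.

```python
RHO = 1000.0   # water density in kg/m^3 (assumed fresh water)
G = 9.81       # gravitational acceleration in m/s^2


def net_power_mw(flow_m3s, gross_head_m, hydraulic_loss_m,
                 unit_efficiency, aux_consumption_mw, electrical_loss_frac):
    """Estimate the power delivered to the grid, in MW (illustrative only)."""
    # Net head: gross head minus hydraulic losses in the water intake.
    net_head_m = gross_head_m - hydraulic_loss_m
    # Power at the generator terminals after turbine-generator efficiency.
    generated_mw = RHO * G * flow_m3s * net_head_m * unit_efficiency / 1e6
    # Deduct the internal consumption of auxiliary services.
    after_aux_mw = generated_mw - aux_consumption_mw
    # Deduct electrical losses between the plant and the grid.
    return after_aux_mw * (1.0 - electrical_loss_frac)


if __name__ == "__main__":
    # Hypothetical operating point.
    power = net_power_mw(flow_m3s=100.0, gross_head_m=50.0,
                         hydraulic_loss_m=1.5, unit_efficiency=0.92,
                         aux_consumption_mw=0.3, electrical_loss_frac=0.02)
    print(f"Estimated power to grid: {power:.2f} MW")
```

In practice the efficiency and the hydraulic losses themselves depend on flow and head, which is one reason the forecast is sensitive to inflow prediction accuracy.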
A more accurate prediction allows a higher-quality optimization process to determine the best configuration and parameters for using water resources efficiently. The great challenge of energy-efficient power generation is to take optimal advantage of the power generation process variables. In this scenario, the optimized dispatch of generating units, which aims to keep them operating at their best efficiency, becomes an important tool, and it necessarily requires adopting a performance criterion.
A hydropower plant's operation can be broadly analyzed in a watercourse either separately or in a cascade, with the system operation classified as run-of-river [2] or storage reservoir [3]. The energy generation process is divided into three stages: pre-operation, real-time, and post-operation. The pre-operation steps can be separated into long-term, medium-term, short-term, and real-time scheduling [4].
The literature presents different approaches to optimizing hydropower plants. For example, in terms of real-time operation, [5] presented a system for performance evaluation and energy optimization of a hydropower plant's real-time operation using data collected from sensors and meters together with calculated unit variables, such as turbine outflows, heads, losses, and efficiencies.
In another example, an optimization algorithm was proposed to determine energy production and maximize the economic value of a power plant investment [6]. A method proposed by [7] focuses on the most efficient operation of the turbine, aiming to maximize pressure at the end of the penstock and consequently reducing the input flow and increasing the overall hydropower plant efficiency.
Operation and maintenance in hydropower plants can be optimized, with cost reductions, by using advanced performance monitoring analysis [4]. The significant challenge, however, is to collect and analyze data from all equipment and processes within a plant efficiently enough to make full use of the information they contain.
The application of Machine Learning (ML) techniques has proved to be a suitable methodology to tackle these challenges. For example, operation optimization usually involves complex models and multi-objective functions, especially for hydropower plants with multiple-purpose reservoirs. ML is well suited to these complex problems, as different techniques have been developed to find the global optimum faster and with higher likelihood. Moreover, the uncertainty of the prediction variables, a common problem that makes operation planning harder, can also be reduced with ML, which usually produces more reliable estimates.
Therefore, the literature presents many ML applications in different areas of hydropower operation. The operation of reservoirs is one of the most common. For example, the authors of [8] applied ML to reduce the shortage index in two reservoirs in Taiwan. Run-of-river hydropower plants also present opportunities for these techniques: [9] applied them to improve the weekly forecast accuracy of Sava River low flow in Slovenia.
There are also applications regarding the operation of the hydropower plants themselves. For example, in [10], three different regression techniques are used in a predictive maintenance monitoring system. Moreover, ref. [11] presents a hydropower plant management system using neural networks over an extensive dataset from plant monitoring systems.
The last decade brought significant developments regarding those data challenges. New technologies and techniques have surfaced, improving data collection and analysis. In addition, the amount of data collected in hydropower operations increased significantly, presenting even more opportunities for applying ML techniques in the sector.
Hence, this work's primary goal is to perform a systematic review to study and analyze the state-of-the-art ML techniques used to optimize energy production from hydropower plants. The analysis used criteria that interfere with energy generation forecasts, plant (or unit) operating policies, and plant (or unit) performance evaluation.
The remaining sections of this paper are organized as follows. In Section 2, we present the methodology applied to perform the review: our research questions; the inclusion, exclusion, and quality criteria for works; and the search string and databases used. Next, in Sections 3 and 4, we discuss the review process and our findings. Finally, in Section 5, we elaborate on our findings and discuss the opportunities we envision for ML application in hydropower operation and optimization.

Research Methodology
Kitchenham and Charters [12] introduced the sequence for planning and preparing a systematic review. A systematic review synthesizes, in a fair manner, the works on a defined research topic.
The methodology aims to identify, assess, and discuss the relevant work that addresses the research questions. The authors also state that a literature review should be thorough and fair; otherwise, it has little scientific value. Systematic reviews have some advantages, such as less biased results through a well-defined methodology [13]. According to [12], a systematic review is developed in three steps: planning, conducting, and reporting the results of the analysis. We develop the planning and conducting steps in the following two subsections and the results and analysis in Section 3. Moreover, this systematic review was built according to the PRISMA guidelines [14].

Review Planning
The first step of a systematic review is its planning. It is crucial to identify the review's objectives in order to define what should be analyzed in the articles, and how and where to conduct the search.

Research Questions
Defining the research questions is the first step in planning the review. These questions must be answered by the conclusion, and they guide the whole process of analyzing the articles. For this work, we built research questions to find the largest possible number of published studies and data that could bring answers related to ML techniques applied to hydropower optimization. The research questions we found suitable for this paper are shown in Table 1.

Table 1. Research questions.
ID    Research Question
RQ1   Which ML techniques are mostly used for power generation optimization?
RQ2   What is the planning forecast horizon: long-term schedule (LTS), short-term schedule (STS), or real-time schedule (RTS)?
RQ3   What is the type of river system: run-of-river, storage reservoir, multiple reservoirs or operation in cascade?
RQ4   What is the main expected outcome of ML technique application?

Data Sources and Search Strategies
The second step of planning the review is selecting the databases used to search for articles. We used two databases: the Web of Science (WOS) platform, one of the oldest platforms, which provides access to multiple databases from different publishers [15]; and the IEEE Xplore database platform, widely used in the engineering area [13]. Finally, it is essential to mention that all articles considered were written in English, as it is the standard language of the scientific area.

Review Conduction
The review conduction aims to identify the research and select primary studies. Therefore, an extraction technique should be applied to the selected databases; a study quality assessment and data synthesis are also necessary.
It is essential to highlight that availability of access determined the databases selected to retrieve the articles. Through a national body, Brazilian universities have partnerships with some publishers, such as IEEE Access and Elsevier. However, some search engines, such as Scopus, WOS, and even Google Scholar, point to articles from publishers not affiliated with our institution.
We used the databases with which the university has an agreement to search for articles. We entered the search string presented below into the search tools available on each database's website, such as IEEE Xplore. The articles resulting from the search were retrieved and then read and analyzed according to the criteria established for the systematic review.

Search String
After choosing the databases, we defined a specific search string based on ML, deep learning, and hydropower operation. The search string used was as follows. The abstract must contain at least one of the following expressions: ((Machine-learning) OR (Decision Trees) OR (Neural Networks) OR (Gaussian Process regression) OR (Adaptive Neural Based Fuzzy Inference System) OR (extreme learning machine) OR (Naive Bayes) OR (Least Squares) OR (Logistic Regression) OR (Support Vector Machine) OR (Ensemble Methods) OR (Clustering) OR (Machine Learning) OR (Learning) OR (Learn)); and the full paper must contain ((Hydropower) OR (Hydroelectric)) AND ((Operation) OR (Optimized Operation) OR (Scheduling) OR (Real-time)).
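As an illustration, the two parts of this search string can be assembled programmatically, which helps keep the term lists consistent across databases. The helper names below are hypothetical, and the sketch deliberately leaves out the field-restriction syntax of each platform, which differs between databases and is not specified here.

```python
# Terms copied from the search string described in the text.
ABSTRACT_TERMS = [
    "Machine-learning", "Decision Trees", "Neural Networks",
    "Gaussian Process regression",
    "Adaptive Neural Based Fuzzy Inference System",
    "extreme learning machine", "Naive Bayes", "Least Squares",
    "Logistic Regression", "Support Vector Machine", "Ensemble Methods",
    "Clustering", "Machine Learning", "Learning", "Learn",
]
TOPIC_TERMS = ["Hydropower", "Hydroelectric"]
OPERATION_TERMS = ["Operation", "Optimized Operation", "Scheduling", "Real-time"]


def or_group(terms):
    """Join terms as ((a) OR (b) OR ...)."""
    return "(" + " OR ".join(f"({term})" for term in terms) + ")"


def build_query():
    """Return (abstract_part, full_text_part) of the search string."""
    abstract_part = or_group(ABSTRACT_TERMS)
    full_text_part = f"{or_group(TOPIC_TERMS)} AND {or_group(OPERATION_TERMS)}"
    return abstract_part, full_text_part


if __name__ == "__main__":
    abstract_part, full_text_part = build_query()
    print("Abstract must match:", abstract_part)
    print("Full text must match:", full_text_part)
```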

Study Selection
Simply using the search string in the selected databases leads to a significant number of results, and therefore study selection criteria are necessary. These criteria define whether each study should be included in, or excluded from, the systematic review.
For this work, we created inclusion (IC), exclusion (EC), and quality (QC) criteria, presented in Tables 2-4, respectively. Because of the vast range of applications, we applied the exclusion criteria (13 in total) to eliminate works that, despite studying hydropower plants, are not directly related to hydropower plant operation optimization.
Additionally, to further refine the selection of articles, we defined quality criteria. Regarding the non-numeric criteria: for QC1, we considered whether the ML technique was applied soundly, with no glaring misconception, and whether the article as a whole was sufficiently sound. For QC5, we tried to include only articles that could be considered complete in themselves, not an isolated part of another work.
Finally, we restricted the analysis to complete papers published in journals with a relevant impact factor and cited by other authors at least once, except for those published in 2021. Furthermore, only recent articles were accepted (2010-2021). A combination of these criteria was used to determine whether a specific work would be included in or excluded from the systematic review [16].

Document Selection
We performed the search on the mentioned platforms on 1 October 2021, gathering 386 works: 58 on IEEE Xplore and 328 on WOS. Of the 328 works found on the WOS platform, only 271 were available on the CAPES portal (CAPES, the Coordination for the Improvement of Higher Education Personnel, is a Brazilian institution); the other 57 were excluded because we did not have access to the complete works.
The total number of articles from both databases used to start the systematic review was therefore 329. There were 34 articles repeated on both platforms, leaving 295 distinct publications. Finally, after reading the abstract and other necessary sections, we applied the inclusion, exclusion, and quality criteria, which excluded 222 works and resulted in 73 documents for a complete analysis. Figure 1 illustrates the number of articles separated during the described process. For the complete PRISMA flow diagram, refer to the supplementary material.
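The counts reported for the selection process can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated in the text; the function name and dictionary keys are hypothetical.

```python
def selection_funnel(ieee_found, wos_found, wos_inaccessible,
                     duplicates, included):
    """Recompute each stage of the document selection from the raw counts."""
    total_found = ieee_found + wos_found
    accessible = total_found - wos_inaccessible
    unique = accessible - duplicates
    excluded_by_criteria = unique - included
    return {
        "total_found": total_found,
        "accessible": accessible,
        "unique": unique,
        "excluded_by_criteria": excluded_by_criteria,
        "included": included,
    }


if __name__ == "__main__":
    # Counts as reported in the text.
    stages = selection_funnel(ieee_found=58, wos_found=328,
                              wos_inaccessible=57, duplicates=34, included=73)
    for stage, count in stages.items():
        print(f"{stage}: {count}")
```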

Data Synthesis
After applying all criteria, the remaining articles were classified according to characteristics such as year of publication, operation area, river system, and expected forecast. The results were tabulated in Table 5. After finishing the review planning and conduction, we analyzed the resulting articles thoroughly. The results and our discussion are presented in the next section of this paper.

Result Analysis
This section presents the outcomes of all analyzed works and answers the four research questions in order to better understand what is being applied to hydropower optimization. It is essential to point out that the articles considered and discussed in this section meet all inclusion, exclusion, and quality criteria.
First of all, considering the quality criterion for the period of publication (QC4), from 2011 to 2021, Figure 2 shows the distribution of articles by year. Among the 73 chosen papers, 68 works (93%) were published in the last six years. The high percentage of recent articles shows that ML applied to hydropower operation is a current topic pursued by many researchers, which supports the relevance of this systematic review. In Table 6, articles are stratified by year of publication and by the database where the article was indexed.
Regarding RQ1 (Which ML techniques are mostly used for power generation optimization?), the answers revealed a wide variety of ML techniques, many of them with a single application. Nevertheless, some techniques were applied more often, such as artificial neural networks (ANN), extreme learning machines, support vector machines, particle swarm optimization, variational mode decomposition, Bayesian techniques, Gaussian regression, and genetic algorithms.
ANNs were present in 19 papers, making them the most applied ML technique. However, their application varies widely, illustrating the range of possible strategies for hydropower optimization.
For example, papers [27,31,69,81] apply ANN to multi-reservoir operation optimization. Ref. [27] uses ML to overcome the difficulty of deriving complex models, as occurs in multipurpose multi-reservoir systems. ANN is applied to derive the optimized reservoir release, solving a multi-objective function: minimize water demand deficits and reservoir spills as convex functions while maximizing hydropower energy production as a nonconvex function. On the other hand, ref. [31] investigates the impacts of average annual inflow volume (AAIV) variations on the long-term operation of a multi-hydropower-reservoir system. ANN is used to derive the adaptive operation rule with nonlinear relationships between decision variables (inflow volume at the current period and water storage volume at the beginning of the current period) and decision-making factors (water storage volume at the end of the current period).
In [69], the authors state that Bellman stochastic dynamic programming is the best-known approach to multi-reservoir operation optimization. However, in these applications, the computational effort increases exponentially with the number of reservoirs, and in some cases the approach becomes intractable. For this scenario, the authors propose an implicit stochastic optimization in which ANNs derive the release rules for the Nile River basin. Thus, an open-loop method approximates the release rules to the optimal policy.
Many papers focus mainly on providing accurate predictions of river flow or inflow, given its importance to hydropower plant and reservoir operation, and ANN is the most applied tool to achieve this goal [50,58-60,86,88].
Among the papers cited above, ref. [88] is the only one that develops river flow prediction for run-of-river power plants. For these plants, the impossibility of storing water for an extended period (annual, seasonal, or monthly) makes hourly river flow prediction vital for planning the operation. Therefore, the paper uses ANNs for hourly inflow forecasts of a run-of-river hydropower plant. The authors used three-layer feed-forward ANNs and the Levenberg-Marquardt training algorithm with backpropagation. In addition, they tested different types of ANN input, such as temperature, precipitation, and historical water inflow.
Paper [50] developed a hybrid model for monthly streamflow forecasting (LTS) for flood risk mitigation. The hybrid models are designed by combining artificial intelligence models (feedforward backpropagation and radial basis function networks) with decomposition methods. Ref. [59] applies ANN to forecast the reservoir inflow with seven days of lead time to improve the reservoir STS. In the work of Jose, the authors performed reservoir inflow predictions applying different static and dynamic ANN models (static feed-forward neural networks, nonlinear autoregressive, and nonlinear autoregressive with exogenous inputs). The models are trained using inflow discharge and precipitation data with different time delays.
Furthermore, to assess the effect of periodicity, a time index (the month number, from 1 to 12) is added to the input data. In the work presented in [60], the authors forecast the reservoir inflow with an ANN to feed the multi-objective numerical optimization of hydropower production, which is solved by applying a novel combined Pareto multi-objective differential evolution. In paper [86], the monthly flow of a river is predicted by two recurrent neural network techniques: Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). Monthly flow prediction is important to the long-term generation schedule of the Brazilian electrical power system, for example, to decide whether thermoelectric power plants should begin operation.
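Several of the forecasting studies above feed their models with inflow and precipitation at different time delays plus a month index. A minimal sketch of how such input samples might be assembled from monthly series is given below; the function name, the lag count, and the sample data are hypothetical, and the reviewed papers may construct their inputs differently.

```python
def build_samples(inflow, precipitation, n_lags, start_month=1):
    """Build (features, targets) for one-step-ahead inflow forecasting.

    Each feature row holds the last n_lags inflow values, the last n_lags
    precipitation values, and the month index (1 to 12) of the target.
    """
    if len(inflow) != len(precipitation):
        raise ValueError("inflow and precipitation must have equal length")
    features, targets = [], []
    for t in range(n_lags, len(inflow)):
        # Month of time step t, given that step 0 falls in start_month.
        month = (start_month - 1 + t) % 12 + 1
        row = (list(inflow[t - n_lags:t])
               + list(precipitation[t - n_lags:t])
               + [month])
        features.append(row)
        targets.append(inflow[t])
    return features, targets


if __name__ == "__main__":
    # Hypothetical monthly series starting in November.
    inflow = [10, 12, 15, 11, 9]
    precipitation = [1, 2, 3, 4, 5]
    X, y = build_samples(inflow, precipitation, n_lags=2, start_month=11)
    for row, target in zip(X, y):
        print(row, "->", target)
```

The resulting rows can be passed to any regression model, including the static and dynamic ANN variants mentioned in the text.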
Paper [53] adopted an alternative strategy, in which the authors use ANNs to predict the downstream water level in real time rather than the river flow, which is the prediction typically found in this review. The objective was to develop an accurate forecast of downstream water levels because this parameter strongly affects the economic operation of re-regulating hydropower stations. The ANNs were trained using historical measured input parameters such as power generation, upstream level, river flow, and downstream water level. The results show real-time downstream water level predictions with stable results and greater accuracy.
There are also applications of ANNs in which the authors propose the optimization of cascade reservoir operation. In such a system, the operation of one reservoir affects the operating parameters of the others. In [70], this correlation is represented by coupling models used for the joint operation optimization. A backpropagation neural network calculates the downstream reservoir's inflow and the upstream reservoir's tailwater. Its accuracy in capturing water flow hysteresis and the aftereffect of tailwater level variation significantly improves the coupling model's accuracy.
Papers [85,87] also address cascade reservoir operation optimization, but with a financial focus. Paper [85] presents a multi-objective optimization whose primary objective is profit maximization, with the additional sub-objectives of reducing generator startups and shutdowns. The goal of paper [87] is the maximization of time-average revenue. Both papers use stochastic optimization algorithms to solve the problems and use ANNs to predict energy prices and water inflow.
Extreme learning machines were used in several works [17,29,40,42,46].
Because of the high diversity of ML techniques found in the review, we decided to organize the methods according to Figure 3. ML techniques are classified into three big groups: Supervised Learning, Unsupervised Learning, and Reinforcement Learning. Supervised learning techniques are the most common in our review, with 53 articles, or 72.6% of applications overall. Within supervised learning, we found 43 applications of regression techniques and ten applications of classification techniques. The regression techniques are applied for several purposes, such as prediction (inflow, pricing) to derive the optimal operation rule for a single reservoir, multiple reservoirs, and cascade reservoirs. The principal regression algorithm applied is the ANN. Classification techniques are concentrated in deriving better operation rules for hydropower release and/or decision-making for the operation of reservoirs and hydropower plants [19,21,47], that is, in hydropower/reservoir operation optimization. In addition, decision tree modeling is often found among the classification algorithms in these papers, such as in [47,49,65].
Unsupervised learning is the second most applied group, with 20 applications, or 27.4% overall. Among these, there are ten applications of clustering techniques, nine of density estimation techniques, and one of density reduction. It is essential to point out that many papers applied more than one ML algorithm. In most cases, the authors combined different unsupervised learning algorithms to exploit their features and reach better results. On the other hand, refs. [45,57] applied unsupervised learning algorithms to better tune a regression algorithm.
Reinforcement learning techniques were the least applied in the reviewed papers, with eight applications, or 10.9% overall. Four applications each of policy iteration and value iteration were found. Both algorithms share the same working principle but take different approaches to finding the optimal policy: in policy iteration, policy evaluation and policy improvement are repeated iteratively until the policy converges, while in value iteration, the algorithm iterates until it finds an optimal value function, from which the optimal policy is then derived. Reinforcement learning techniques are used primarily for optimal hydropower/reservoir operation [18,28,30,37,52].
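The difference between the two schemes can be illustrated on a toy single-reservoir problem. The sketch below is not taken from any reviewed paper: the storage levels, inflow probabilities, reward proxy, and discount factor are all hypothetical, and the transition model is assumed to be known, so both algorithms run as plain dynamic programming.

```python
MAX_STORAGE = 3                      # hypothetical storage levels 0..3
STATES = range(MAX_STORAGE + 1)
INFLOWS = [(0, 0.5), (1, 0.5)]       # hypothetical (inflow, probability) pairs
GAMMA = 0.9                          # discount factor (assumed)


def step(storage, release, inflow):
    """Return (next_storage, reward) for one period."""
    next_storage = min(storage - release + inflow, MAX_STORAGE)  # excess spills
    reward = release * (1 + storage)  # crude proxy: more head, more energy
    return next_storage, reward


def q_value(storage, release, values):
    """Expected discounted return of a release decision under `values`."""
    return sum(prob * (reward + GAMMA * values[nxt])
               for inflow, prob in INFLOWS
               for nxt, reward in [step(storage, release, inflow)])


def value_iteration(tol=1e-9):
    """Iterate on the value function, then derive the greedy policy."""
    values = [0.0] * len(STATES)
    while True:
        new_values = [max(q_value(s, a, values) for a in range(s + 1))
                      for s in STATES]
        converged = max(abs(n - v) for n, v in zip(new_values, values)) < tol
        values = new_values
        if converged:
            policy = [max(range(s + 1), key=lambda a: q_value(s, a, values))
                      for s in STATES]
            return values, policy


def evaluate_policy(policy, tol=1e-9):
    """Policy evaluation: value of always following `policy`."""
    values = [0.0] * len(STATES)
    while True:
        new_values = [q_value(s, policy[s], values) for s in STATES]
        if max(abs(n - v) for n, v in zip(new_values, values)) < tol:
            return new_values
        values = new_values


def policy_iteration():
    """Alternate policy evaluation and improvement until the policy is stable."""
    policy = [0] * len(STATES)       # start by never releasing water
    while True:
        values = evaluate_policy(policy)
        improved = []
        for s in STATES:
            best, best_q = policy[s], q_value(s, policy[s], values)
            for a in range(s + 1):   # release cannot exceed current storage
                q = q_value(s, a, values)
                if q > best_q + 1e-7:  # switch only on a strict improvement
                    best, best_q = a, q
            improved.append(best)
        if improved == policy:
            return values, policy
        policy = improved


if __name__ == "__main__":
    print("value iteration: ", value_iteration())
    print("policy iteration:", policy_iteration())
```

On this toy problem both routines reach the same value function up to numerical tolerance, which mirrors the statement that the two algorithms share a working principle and differ only in how they search for the optimal policy.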
For research question RQ2 (What is the planning forecast horizon: long-term schedule (LTS), short-term schedule (STS), or real-time schedule (RTS)?), it is essential to state that papers deriving operation rules and operation policies were classified as LTS. In addition, works that provide annual and monthly flow forecasts or operation optimization were also classified as LTS. On the other hand, works that provide weekly and daily flow forecasts or operation optimization were tagged as STS. Finally, works related to hourly flow forecasts or operation optimization were classified as RTS.
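The classification rule just described can be written down as a small lookup. This is only a restatement of the rule in code form; the function and argument names are hypothetical.

```python
# Mapping from the time resolution of a forecast or optimization
# to the planning horizon, following the rule stated in the text.
HORIZON_BY_RESOLUTION = {
    "annual": "LTS",
    "monthly": "LTS",
    "weekly": "STS",
    "daily": "STS",
    "hourly": "RTS",
}


def classify_horizon(resolution, derives_operation_rules=False):
    """Return "LTS", "STS", or "RTS" for a reviewed work."""
    # Papers deriving operation rules or policies count as long-term.
    if derives_operation_rules:
        return "LTS"
    try:
        return HORIZON_BY_RESOLUTION[resolution.lower()]
    except KeyError:
        raise ValueError(f"unknown resolution: {resolution!r}") from None


if __name__ == "__main__":
    print(classify_horizon("monthly"))
    print(classify_horizon("Daily"))
    print(classify_horizon("hourly"))
    print(classify_horizon("daily", derives_operation_rules=True))
```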
After analyzing the selected papers, we conclude that the most explored planning horizon is the long term. As shown in Figure 4, 49 articles, or 67.1%, address long-term planning forecasts. Short-term and real-time schedules are analyzed in 19 and 13 papers, respectively. This result indicates that improving operation strategies long before the operation dispatch is the most common motivation and is where ML techniques are concentrated. It is not easy to identify a reason for the papers' concentration on the long-term schedule.
Answering question RQ3 (What is the type of river system: run-of-river, single reservoir, multiple reservoirs, or operation in cascade?), ML was applied to reservoir operation optimization, both for single or multiple units and for power plant operation in cascade.
Operation optimization is more flexible for reservoirs with long regulation capacity periods, that is, with the ability to store water resources. Run-of-river hydropower plants have small reservoirs with low regulation capacity; therefore, we treat this type of power plant as having no reservoir.
The articles regarding hydropower plants with reservoirs represent 94.5% of analyzed works. We can explain this by observing the restricted operation possibilities in run-of-river hydropower systems. Such a plant must respect one rule during operation: water inflow equals outflow. Therefore, model-based optimization can deal appropriately with these issues, with very few exceptions. On the other hand, applications to hydropower plants with a single reservoir, multiple reservoirs, and cascaded reservoirs face challenging issues such as multi-objective optimization, complex or coupled models, an explosive number of possibilities, and uncertain forecast parameters, among others. Therefore, ML application is needed to achieve meaningful results.
Among the 69 articles that present reservoirs, 21 deal with multiple reservoirs (30.4%), 35 with single reservoirs (50.7%), and 13 with cascade reservoirs (18.8%). Only three articles are about run-of-river systems [36,88,89]. These results are detailed in Figure 5. It is important to clarify that the article regarding the adjustment of turbine efficiency curves [48] does not have a specific river system application. Thus, we did not count this paper under any river system type.
Regarding question RQ4 (What is the primary expected outcome of the ML technique application?), the results showed mostly applications to river flow/inflow forecasting. Nearly 31.5% (Figure 6) of the proposals were developed to estimate water flow. This result is expected, since the papers state as their motivation the generation of more accurate water flow data to manage hydropower and reservoir operation, or to feed optimization models with the proper parameters essential to accurate results. Figure 7 represents the combination of ML groups with the article's primary purpose. Supervised learning is the most applied technique for river flow forecasting, with 17 cases. Among the supervised learning algorithms, ANNs are applied in seven papers and extreme learning machines in three.
For example, paper [58] uses ANNs to predict reservoir inflow seven days ahead to optimize reservoir operation. Weather forecasts and antecedent hydrological variables were used as ANN inputs. As a result, additional energy production can be achieved with more accurate inflow predictions without flood risk. This is an important conclusion: accurate inflow forecasts enable optimal and safe reservoir operation.
In another example, paper [17] adopted a hybrid model for annual runoff forecasting. The hybrid model uses the Variational Mode Decomposition algorithm to decompose the yearly time series into subcomponents. Then, an extreme learning machine is applied to model the process hidden in each subcomponent, and the aggregated output is the forecast. The final results show that the proposed model has improved prediction accuracy for annual runoff compared to several traditional methods. Therefore, the hybrid model can be helpful in the mid- to long-term operation of water resources and the power system.
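The decompose, model, and aggregate pattern behind such hybrid forecasts can be sketched with deliberately simple stand-ins. The code below is not the method of [17]: a trailing moving average replaces Variational Mode Decomposition, and a least-squares AR(1) fit replaces the extreme learning machine. All names and data are hypothetical; only the three-step structure is the point.

```python
def moving_average(series, window):
    """Trailing moving average; early points use the values available."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def decompose(series, window=3):
    """Split a series into [trend, residual]; the parts sum to the original.

    Stand-in for Variational Mode Decomposition (assumption, not VMD).
    """
    trend = moving_average(series, window)
    residual = [x - t for x, t in zip(series, trend)]
    return [trend, residual]


def fit_ar1(component):
    """Least-squares fit of x[t] = a * x[t-1] + b (stand-in for an ELM)."""
    x, y = component[:-1], component[1:]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    if var_x == 0.0:
        return 0.0, mean_y           # constant component: predict its mean
    a = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / var_x
    return a, mean_y - a * mean_x


def hybrid_forecast(series, window=3):
    """One-step-ahead forecast: model each component, then sum the outputs."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    forecast = 0.0
    for component in decompose(series, window):
        a, b = fit_ar1(component)
        forecast += a * component[-1] + b
    return forecast


if __name__ == "__main__":
    # Hypothetical annual runoff values.
    runoff = [120.0, 135.0, 110.0, 150.0, 142.0, 128.0, 160.0, 155.0]
    print(f"Next-year forecast: {hybrid_forecast(runoff):.1f}")
```

Modeling each subcomponent separately lets a simple learner capture structure that is hard to fit on the raw series, which is the motivation the text gives for the hybrid design.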
Four papers explore unsupervised learning techniques to improve streamflow prediction [45,54,74,89]. The strategy in these papers is to work on the available data to find hidden correlations and patterns, or to select data subsets properly, in order to improve prediction accuracy. For example, paper [54] investigates the potential of selecting the best subsets of historical climatological data to maximize Ensemble Stream Flow prediction performance, with a Genetic Algorithm determining the best set of scenarios. In conclusion, exploring data analysis to choose the proper data subsets (size, scenarios, correlation, among others) to feed forecast models and algorithms significantly affects prediction quality and accuracy. Moreover, it is an important area that remains little explored.
Reservoir operation optimization is another relevant area in the papers reviewed, with 18 applications. Within these papers, supervised learning is the leading ML technique with eight cases, followed by reinforcement learning with six applications. Reservoir operation rules are typically derived by applying a fitting strategy such as linear regression, ANNs, or other methods.
A different approach investigates the impacts on the operation rules caused by uncertainty in the inflow prediction and in the optimization model parameters [19]. A supervised learning method (classification), Bayesian deep learning, was proposed to include the uncertainty of the inflow predictions and model parameters when deriving real-time reservoir rules. The results showed that the Bayesian deep learning method, which derived four operation rules, is reliable and robust and performed better than the linear regression method (which does not consider uncertainty) regarding hydropower generation. Furthermore, inflow uncertainty affects the operation rule output more than model parameter uncertainty does, and this sensitivity rises in the dry season.
In conclusion, hydropower operation is subject to model and inflow prediction uncertainty that generally impairs operation optimization. Nevertheless, the proper application of ML can account for these uncertainties in the optimization methods and significantly improve the operation results.
The reservoir operation rules for long-term scheduling are generally established by a well-known fitting strategy such as linear regression, ANNs (widely present in this review), and other nonlinear methods. However, this methodology of fitting optimal parameters for specific functions might not consider the uncertainty and nonlinear dependence structure of hydrological variables [37]. Thus, that work proposes a combination of copulas with implicit stochastic optimization to derive reservoir operation rules. The methodology has three stages: simulation of synthetic streamflow scenarios based on a periodic Vine Copula-Entropy model; estimation of the optimal reservoir dispatch by implicit stochastic optimization; and estimation of the optimal reservoir operation policy by a probabilistic approach with copulas. Furthermore, this methodology represents a typical reinforcement learning technique (more specifically, a policy-iteration-based one), which is widely applied to reservoir operation optimization [28,30,52,63].
Combining the results obtained from RQ2, RQ3, and RQ4 made it possible to identify the applications and the respective planning horizons for each group of works. Figure 8 analyzes the combination of ML groups and type of river system. The combination of supervised learning with a single reservoir system is the most used, with 21 cases or 28.7% overall. Figure 9 shows the result of combining the planning forecast horizon and the primary purpose. The most applied combination is LTS with River Flow Forecast, with 16 cases or 21.9% overall. Figure 10 combines the ML group and the forecast horizon. The results show 31 applications of supervised learning for LTS, 42% overall. Moreover, supervised learning is also the most applied group for STS and RTS, with 14 and 7 cases, respectively. These combined results are expected and reproduce the previous individual analyses of each article's main purpose, type of river system, planning horizon, and ML technique.
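The shares quoted above can be checked with a few lines of arithmetic. The sketch assumes that every percentage is taken over the 73 works selected for the review; the dictionary keys are only descriptive labels.

```python
TOTAL_WORKS = 73  # assumption: shares are computed over all reviewed works

counts = {
    "supervised learning + single reservoir": 21,
    "LTS + river flow forecast": 16,
    "supervised learning + LTS": 31,
}

# Share of each combination, in percent of all reviewed works.
shares = {name: 100.0 * n / TOTAL_WORKS for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

Under this assumption the computed shares agree with the reported ones to within rounding or truncation of the last digit.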
An important exception is the ML application in [48], where the turbine efficiency curves are adjusted. It is also worth pointing out that one article deals with water flow and sediment forecasting [45].

Discussions
This systematic review aims to identify the state of the art of how ML has been applied in the technical literature to improve the operation optimization of hydropower plants. The paper reached this main objective. The other significant findings were as follows: most articles use ML to improve the LTS; the majority of applications concern hydropower plants with reservoirs (single reservoirs); and supervised learning techniques are the most used ML techniques.
These findings may reveal a weakness in the current research on the matter. The prevalent use of supervised learning techniques implies that the models are being built using previous knowledge about the systems, which might reinforce bias or mask unknown correlations between variables in the model.
Moreover, the systematic review points to approaches that remain little explored for improving operation optimization and that may be the subject of future research in this area. As an example, we consider that applying ML techniques that account for the uncertainty in the inflow prediction will help derive more accurate models for hydropower plant operation optimization.
We did not find many articles dealing with dispatch optimization of run-of-river power plants. Therefore, it represents a gap in the literature covered by this systematic review. A possible explanation for this gap is the restricted dispatch possibilities of run-of-river power plants, which cannot store water in reservoirs. Additionally, optimizing the operation of these plants with model-based techniques may already be satisfactory. However, there are plants with many generator units that would benefit from ML application. For example, the work [90] deals with the dispatch of the Santo Antonio plant, a hydropower plant in the Amazon basin in Brazil with 50 generator units. The explosive number of possible generating unit combinations for dispatching the hydropower plant opens the way for ML application.
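To give a sense of scale for this combinatorial growth, the sketch below counts the on/off commitment states of a 50-unit plant. It assumes that each unit is either on or off, that units are distinguishable, and that no operational restrictions remove any combination; these assumptions are illustrative and are not taken from [90].

```python
import math

N_UNITS = 50  # generator units reported for the Santo Antonio plant

# Assumption: every unit is simply on or off, with no further restrictions.
total_states = 2 ** N_UNITS - 1  # all on/off patterns except "all units off"

def states_with_k_units_on(k, n_units=N_UNITS):
    """Number of ways to choose which k of the n_units are running."""
    return math.comb(n_units, k)

print(f"non-empty on/off combinations: {total_states:.3e}")
print(f"combinations with 25 units on: {states_with_k_units_on(25):.3e}")
```

Under these assumptions the search space is on the order of 10^15 states before the load allocation among the running units is even considered, which makes exhaustive enumeration impractical. If many units are identical, the number of distinct states is smaller.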
In general, the methodology applied in the systematic review is solid, with inclusion, exclusion, and quality criteria for article selection. The exclusion and quality criteria deserve emphasis. The exclusion criteria were essential to focus the review on hydropower operation optimization by ML application. For example, many papers deal with hybrid generation systems, in which the hydropower operation is optimized subject to the specific contributions and restrictions of other generation types (photovoltaic and wind, among others). The quality criteria were applied to reduce the number of articles analyzed in depth, giving preference to higher-quality publications. An important issue to note is the addition of maintenance articles to the exclusion criteria. Papers that applied ML to improve activities related to equipment maintenance were excluded. This decision was made to focus only on articles directly linked to optimizing the operation of hydroelectric plants. However, improving maintenance activities for generators, turbines, transformers, and pumps, among other equipment, indirectly impacts the operation. For example, it increases the operational reliability and the availability of generating units. Therefore, maintenance activities can be added as an inclusion criterion in future systematic reviews related to the operation optimization of hydropower plants.

Conclusions
The paper aimed to review the academic literature on ML techniques applied to hydropower optimization. After searching technical databases, we selected 73 works for this study. Analyses and discussions were then made considering three main points: forecast schedule, ML technique groups, and river system.
Regarding the ML technique groups considered in the analysis, supervised learning is broadly applied using regression and classification techniques. Furthermore, we noted the extensive use of Artificial Neural Networks, owing to their capacity to fit most of the applications of ML in hydropower operation optimization: derivation of parameters for river flow forecasts; optimization models for reservoir operation; multi-objective optimization models for the operation of multi-purpose reservoirs; and derivation of operation rules, among others.
For unsupervised learning, we found clustering and density estimation techniques, representing the second most used group. Additionally, this group's main application is river flow forecasting, mostly for single reservoirs.
Although the reinforcement learning group is the least frequent, we found both policy-iteration and value-iteration techniques, mainly applied to single and multiple reservoirs.
Regarding the type of river system, most of the applications were in hydropower plants with reservoirs, which is an important finding of this study. Furthermore, given the complex issues present in the operation of single, multiple, and cascaded reservoirs, ML is an alternative in the search for improvements and has been widely applied to deal with these complex problems successfully and accurately.
Regarding the planning forecast horizon, the study identifies that most of the applications concern LTS, which is another contribution of this work. The weak point revealed by the planning forecast horizon analysis is the small number of real-time applications (only 13).
A significant gap observed in this study is the small number of works on run-of-river hydropower plants found in the review. Hence, there is room for future works focusing on this type of hydropower plant. Another opportunity concerns articles mainly focused on maintenance activities, which can also be the theme of future studies.
Finally, the evolution of connectivity, instrumentation, and computer science towards emerging concepts like the Internet of Things and Industry 4.0 will lead hydropower plants to rely even more on ML and big data tools and applications. These techniques are very well suited to deal with the complexity of the challenges presented in the sector.
Our work envisions future opportunities for ML applications in several areas of hydropower operation. Areas such as inflow forecasts, scheduling, and operation policies already use ML applications but will still present challenges suitable for them. Additionally, we believe that ML applications can significantly benefit areas like optimal dispatch, maintenance, and general operations.

Figure 1. Articles filtering during the conducting stage.

Figure 5. River system in each work.

Figure 6. Main purpose of each work.

Figure 7. Machine learning vs main purpose.

Table 6. Articles per year and database.