Deep Reinforcement Learning for Intraday Multireservoir Hydropower Management
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Comments:
This paper applies continuous state deep reinforcement learning (DRL) to the intraday economic optimization problem of hydropower reservoirs. The detailed comments are listed as follows.
1. The novelty of the proposed method is not significant, as it primarily applies existing deep reinforcement learning (DRL) algorithms (SAC, PPO, A2C) to reservoir optimization without introducing substantial methodological advancements. The study does not offer unique modifications or innovations to address domain-specific challenges, such as uncertainty handling.
2. The experimental results are insufficient: the performance of the three algorithms across the three action spaces is not compared in detail. In addition, the experiments are conducted only on a single, short dataset that fails to cover diverse real-world application scenarios, which limits the generalizability of the results.
3. The detailed mathematical modeling of the problem is missing. While the paper discusses the intraday economic optimization problem of hydropower reservoirs, it does not provide a clear mathematical framework to define the optimization objectives and constraints.
4. The literature review is brief, and the summary of existing research is not sufficiently thorough. The limitations of current studies are not discussed in detail, and the unique contribution of this research, as well as its distinction from previous work, is not clearly highlighted.
5. The paper does not have sufficient diagrams and flowcharts to clearly illustrate the proposed algorithms and structure.
Minor comments:
The reason for choosing these three algorithms (A2C, PPO and SAC) is not provided.
The conclusion should be rewritten with clear highlights of this paper.
The abbreviations in this paper are not completely introduced in the "Abbreviations" section on page 9; for example, HRIEO is missing.
Author Response
Dear Reviewer,
Thank you for your thorough and thoughtful review of our manuscript. We have made significant revisions to address your comments and concerns. A detailed summary of the changes can be found in the attached document.
We appreciate your time and effort in helping us improve our work.
Kind regards,
The Authors
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The manuscript develops an RL framework for the Hydropower Reservoirs Intraday Economic Optimization problem, which handles continuous state-action spaces and incorporates complex operational constraints including dam-to-turbine delays, gate movement restrictions, and power group dynamics. It may be a worthwhile study; however, it needs major revision. Some specific comments that might help the authors further enhance the manuscript's quality are listed below.
The first paragraph should introduce the important significance of hydropower station scheduling. Please add it.
Many studies have applied reinforcement learning to the scheduling of hydropower stations. What is the innovation of this paper?
The introduction cites few references; more should be added.
P81-83: the gate opening can be kept constant, but the upstream water head will change, so the outlet flow through the gate will also change. This is inconsistent with the statement in the paper that the outlet flow is constant. Please explain.
No specific engineering information about the case-study plant is provided; please supplement it.
The goal of the model is to maximize total income, but the paper does not describe the electricity pricing scheme in the region or whether time-of-use electricity prices apply.
Why are the training durations for the 2-dam and 6-dam cases inconsistent?
The reservoir simulation method is not described in the Materials and Methods section and needs to be supplemented.
The simulation model is the foundation, so it is necessary to validate the simulation results against measured data, and then to compare the optimized schedule with the actual operation plan. This paper only conducts experimental comparisons among methods and does not compare against the reservoir's actual operation. The relevant content needs to be supplemented.
Author Response
Dear Reviewer,
Thank you for your thorough and thoughtful review of our manuscript. We have made significant revisions to address your comments and concerns. A detailed summary of the changes can be found in the attached document.
We appreciate your time and effort in helping us improve our work.
Kind regards,
The Authors
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
This paper studies the application of Reinforcement Learning (RL) to optimize intraday operations of hydropower reservoirs. The goal is to overcome some limitations of previous approaches that relied on discrete state spaces. It is certainly a nicely written paper on a problem of current interest. The authors discuss the so-called "Hydropower Reservoirs Intraday Economic Optimization (HRIEO) problem and providing two solution approaches", a problem that has been proposed previously by some of the authors. Indeed, the discussion, analysis, methodology, and several results of the paper seem very close to those of the paper "Castro-Freibott, R., García-Castellano Gerbolés, C., García-Sánchez, A., & Ortega-Mier, M. (2024). MILP and PSO approaches for solving a hydropower reservoirs intraday economic optimization problem. Central European Journal of Operations Research (CEJOR), 1-24".
HRIEO was posed as an extension of the Hydropower Reservoirs Operation Optimization (HROO) problem and, as such, adds features to overcome the limitations of the latter (introducing short-term phenomena), but shares much in common with it. This holds especially for the different solution alternatives: mathematical programming (MILP, for instance), dynamic programming, metaheuristic algorithms (PSO, for instance), and Reinforcement Learning (RL). In the CEJOR paper, the HRIEO problem was solved using MILP and PSO against a Greedy algorithm as a benchmark (consisting of the gates being completely open at every period). In the paper under review, the HRIEO problem is solved using MILP and RL against the same Greedy benchmark. The dataset for both papers is the same: https://github.com/baobabsoluciones/flowing-basin
PSO was selected in the CEJOR paper because it is a robust heuristic method that has been used successfully in similar problems, requiring low development effort and promising good outcomes, although on large datasets it often yields poor-quality results. The authors conclude that "when the hydropower station has two dams, the MILP's solutions are near-optimal and they outperform the PSO solutions. However, when the station has six dams, the PSO scales better and significantly outperforms the MILP. The relative performance of the methods does not depend on the constraint or operational costs considered, but the PSO does not always satisfy the relevant constraints."
RL was selected in the paper under review even though the problem of interest does not seem to be exactly Markovian. Finally, the authors conclude that "in the two-reservoir system, while the Mixed-Integer Linear Programming (MILP) approach achieved superior performance, our RL agents demonstrated the ability to balance solution quality with computational efficiency. When scaling to larger systems, the MILP's performance degraded below the Greedy baseline, while RL agents maintained slight improvements. This indicates that while both approaches face challenges as the problem size increases, RL is more scalable than exact optimization methods".
As one can see, the current paper reads as a "second edition" of the CEJOR paper with a better benchmark method; it is therefore important to expand the discussion of the key differences between the two benchmark studies before the new paper can be published.
Author Response
Dear Reviewer,
Thank you for your thoughtful review of our manuscript. We have made revisions to address your comments; a detailed summary of the changes can be found in the attached document.
We appreciate your time and effort in helping us improve our work.
Kind regards,
The Authors
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The authors have addressed all my concerns, and I suggest accepting the current version.