Stochastic dynamic programming (SDP) is a widely used method for reservoir operations optimization under uncertainty but suffers from the dual curses of dimensionality and modeling. Reinforcement learning (RL), a simulation-based stochastic optimization approach, eliminates the curse of modeling, which arises from the need to compute a very large transition probability matrix. RL also mitigates the curse of dimensionality, though it cannot remove it entirely: the approach remains computationally intensive for complex multi-reservoir systems. This paper presents a multi-agent RL approach combined with an aggregation/decomposition method (AD-RL) for reducing the curse of dimensionality in multi-reservoir operation optimization problems. In this model, each reservoir is managed by a dedicated operator (agent) that cooperates with the other agents to find a near-optimal operating policy for the whole system. Each agent makes a decision (release) based on its current state and the feedback it receives from the states of all upstream and downstream reservoirs. The method, together with an efficient artificial neural network-based procedure for robustly tuning the Q-learning parameters, is applied to a real-world five-reservoir problem, the Parambikulam–Aliyar Project (PAP) in India. We demonstrate that the proposed AD-RL approach derives operating policies that are better than, or comparable to, those obtained by other stochastic optimization methods, at a lower computational cost.
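To make the multi-agent scheme concrete, the following is a minimal, illustrative sketch of per-reservoir tabular Q-learning with coupled states. It assumes discretized storage levels and releases, an epsilon-greedy policy, and a placeholder mass balance and reward; the neighbor-summary function stands in for the paper's aggregation/decomposition step. All names, parameter values, and dynamics here are hypothetical and are not taken from the paper itself.

```python
import numpy as np

# Assumed discretization and Q-learning parameters (not from the paper)
N_RES = 5          # five reservoirs, as in the PAP system
N_STORAGE = 8      # storage levels per reservoir (assumed)
N_RELEASE = 4      # discrete release decisions per agent (assumed)
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

rng = np.random.default_rng(0)

class ReservoirAgent:
    """One agent per reservoir. Its state couples its own storage level
    with a coarse summary of the other reservoirs' storages, mimicking
    the upstream/downstream feedback described in the abstract."""
    def __init__(self):
        # Q[own_storage, neighbor_summary, release]
        self.Q = np.zeros((N_STORAGE, N_STORAGE, N_RELEASE))

    def act(self, own, neighbors):
        if rng.random() < EPS:                      # epsilon-greedy exploration
            return int(rng.integers(N_RELEASE))
        return int(np.argmax(self.Q[own, neighbors]))

    def update(self, own, neighbors, a, r, own2, neighbors2):
        # Standard one-step Q-learning update
        best_next = self.Q[own2, neighbors2].max()
        self.Q[own, neighbors, a] += ALPHA * (
            r + GAMMA * best_next - self.Q[own, neighbors, a])

def neighbor_summary(storages, i):
    """Coarse aggregate of the other reservoirs' storages: an assumed
    stand-in for the paper's aggregation/decomposition step."""
    others = np.delete(storages, i)
    return int(round(others.mean()))

# Toy simulation loop with placeholder hydrology
agents = [ReservoirAgent() for _ in range(N_RES)]
storages = rng.integers(0, N_STORAGE, size=N_RES)

for step in range(10_000):
    summaries = [neighbor_summary(storages, i) for i in range(N_RES)]
    actions = [ag.act(s, m) for ag, s, m in zip(agents, storages, summaries)]
    # Placeholder mass balance: inflow minus release, clipped to valid levels
    inflow = rng.integers(0, 3, size=N_RES)
    new_storages = np.clip(storages + inflow - np.array(actions),
                           0, N_STORAGE - 1)
    # Placeholder reward: penalize storages far from a target level
    rewards = -np.abs(new_storages - N_STORAGE // 2)
    new_summaries = [neighbor_summary(new_storages, i) for i in range(N_RES)]
    for i, ag in enumerate(agents):
        ag.update(storages[i], summaries[i], actions[i], rewards[i],
                  new_storages[i], new_summaries[i])
    storages = new_storages
```

The design point the sketch illustrates is the dimensionality reduction: each agent's table grows with its own state and a low-dimensional neighbor summary rather than with the full joint state of all five reservoirs, which is what a monolithic SDP or single-agent Q-learning formulation would require.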