# Aggregation–Decomposition-Based Multi-Agent Reinforcement Learning for Multi-Reservoir Operations Optimization


## Abstract


## 1. Introduction

## 2. Aggregation–Decomposition Methods and Reinforcement Learning

### 2.1. Reservoir Operation Optimization Model

#### 2.1.1. Objective Function

#### 2.1.2. Constraints

### 2.2. Stochastic Dynamic Programming (SDP)

### 2.3. Aggregation–Decomposition Dynamic Programming (AD-DP)

### 2.4. Multilevel Approximation-Dynamic Programming (MAM-DP)

### 2.5. Reinforcement Learning (RL)

#### 2.5.1. Action-Taking Policy

#### 2.5.2. Admissible Actions

#### 2.5.3. Q-Learning

### 2.6. Aggregation–Decomposition Reinforcement Learning (AD-RL)

- The beginning storage of the focus (actual) reservoir (${s}_{i}^{t}$)
- The sum of the beginning storages of all non-upstream reservoirs ($Sn{u}_{i}^{t}$)
- The sum of the beginning storages of all upstream reservoirs ($S{u}_{i}^{t}$)
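As a minimal sketch of this state aggregation (assuming a hypothetical five-reservoir topology given as an upstream adjacency map, not the actual PAP network), the three components can be computed from the individual reservoir storages:

```python
# Hypothetical topology: upstream[j] lists the reservoirs releasing directly into j.
upstream = {1: [], 2: [1], 3: [1], 4: [2], 5: [2]}

def ad_rl_state(i, storages, upstream):
    """Return the aggregated AD-RL state for focus reservoir i:
    (own storage s_i, total upstream storage Su_i, total non-upstream storage Snu_i)."""
    # Collect all reservoirs upstream of i, transitively.
    ups, stack = set(), list(upstream[i])
    while stack:
        j = stack.pop()
        if j not in ups:
            ups.add(j)
            stack.extend(upstream[j])
    s_i = storages[i]
    su = sum(storages[j] for j in ups)
    snu = sum(storages[j] for j in storages if j != i and j not in ups)
    return s_i, su, snu
```

Aggregating all other reservoirs into just two totals keeps each agent's state space three-dimensional regardless of the number of reservoirs, which is the point of the aggregation–decomposition scheme.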

- Calculate the beginning storages of the upstream ($S{u}_{i}^{t-1}$) and non-upstream ($Sn{u}_{i}^{t-1}$) reservoirs.
- Take an action (release) using one of the action-taking policies, such as Softmax, ε-greedy, greedy, or random. For instance, under the ε-greedy policy, the release ${R}_{i}^{t}$ from reservoir $i$ in period $t$ and state ${s}_{i}^{t}$ is the greedy (highest-valued) admissible release with probability $1-\epsilon$, and a uniformly random admissible release with probability $\epsilon$.
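The ε-greedy selection in the step above can be sketched as follows (a simplified illustration; `q_values` mapping each discretized admissible release to its current Q-value is an assumed representation, not the paper's implementation):

```python
import random

def epsilon_greedy(q_values, admissible, eps, rng=random):
    """With probability eps explore (pick a uniformly random admissible
    release); otherwise exploit (pick the admissible release with the
    highest current Q-value)."""
    if rng.random() < eps:
        return rng.choice(admissible)
    return max(admissible, key=lambda a: q_values[a])
```

During training, ε is typically annealed from an initial exploration factor toward a final one, as with the $\epsilon_{1}$ and $\epsilon_{2}$ values listed in the parameter table below.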

## 3. Problem Settings and Results

### 3.1. Case Study: Parambikulam–Aliyar Project (PAP)

### 3.2. MAM-DP Method Applied to the Parambikulam–Aliyar Project (PAP)

### 3.3. AD-DP Method Applied to PAP

### 3.4. Fletcher–Ponnambalam (FP) Method Applied to PAP

### 3.5. AD-RL Method

## 4. Discussion

## 5. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## References

- Hiew, K.L.; Labadie, J.W.; Scott, J.F. Optimal operational analysis of the Colorado-Big Thompson project. In Computerized Decision Support Systems for Water Managers; Labadie, J., Ed.; ASCE: Reston, VA, USA, 1989; pp. 632–646.
- Willis, R.; Finney, B.A.; Chu, W.S. Monte Carlo Optimization for Reservoir Operation. Water Resour. Res. 1984, 20, 1177–1182.
- Hiew, K.L. Optimization Algorithms for Large-Scale Multireservoir Hydropower Systems; Colorado State University: Fort Collins, CO, USA, 1987.
- Tejada-Guibert, J.A.; Stedinger, J.R.; Staschus, K. Optimization of Value of CVP’s Hydropower Production. J. Water Resour. Plan. Manag. 1990, 116, 52–70.
- Arnold, E.; Tatjewski, P.; Wołochowicz, P. Two Methods for Large-Scale Nonlinear Optimization and Their Comparison on a Case Study of Hydropower Optimization. J. Optim. Theory Appl. 1994, 81, 221–248.
- Lee, J.H.; Labadie, J.W. Stochastic Optimization of Multireservoir Systems via Reinforcement Learning. Water Resour. Res. 2007.
- Aiken, L.S.; West, S.G.; Pitts, S.C. Multiple Linear Regression; Handbook of Psychology; John Wiley & Sons: Hoboken, NJ, USA, 2003.
- Hornik, K.; Stinchcombe, M.; White, H. Multilayer Feedforward Networks are Universal Approximators. Neural Netw. 1989, 2, 359–366.
- Wang, L.-X.; Mendel, J.M. Generating Fuzzy Rules by Learning from Examples. IEEE Trans. Syst. Man Cybern. 1992, 22, 1414–1427.
- Mousavi, S.; Ponnambalam, K.; Karray, F. Reservoir Operation Using a Dynamic Programming Fuzzy Rule-Based Approach. Water Resour. Manag. 2005, 19, 655–672.
- Mousavi, S.J.; Ponnambalam, K.; Karray, F. Inferring Operating Rules for Reservoir Operations Using Fuzzy Regression and ANFIS. Fuzzy Sets Syst. 2007, 158, 1064–1082.
- Loucks, D.P.; Dorfman, P.J. An Evaluation of Some Linear Decision Rules in Chance-Constrained Models for Reservoir Planning and Operation. Water Resour. Res. 1975, 11, 777–782.
- Alizadeh, H.; Mousavi, S.J.; Ponnambalam, K. Copula-Based Chance-Constrained Hydro-Economic Optimization Model for Optimal Design of Reservoir-Irrigation District Systems under Multiple Interdependent Sources of Uncertainty. Water Resour. Res. 2018, 54, 5763–5784.
- Simonovic, S.P.; Marino, M.A. Reliability Programing in Reservoir Management: 1. Single Multipurpose Reservoir. Water Resour. Res. 1980, 16, 844–848.
- Simonovic, S.P.; Marino, M.A. Reliability Programing in Reservoir Management: 2. Risk-Loss Functions. Water Resour. Res. 1981, 17, 822–826.
- Simonovic, S.P.; Marino, M.A. Reliability Programing in Reservoir Management: 3. System of Multipurpose Reservoirs. Water Resour. Res. 1982, 18, 735–743.
- Fletcher, S.; Ponnambalam, K. Constrained State Formulation for the Stochastic Control of Multireservoir Systems. Water Resour. Res. 1998, 34, 257–270.
- Fletcher, S.; Ponnambalam, K. Stochastic Control of Reservoir Systems Using Indicator Functions: New Enhancements. Water Resour. Res. 2008, 44.
- Mahootchi, M. Storage System Management Using Reinforcement Learning Techniques and Nonlinear Models. Ph.D. Thesis, University of Waterloo, Waterloo, ON, Canada, January 2009.
- Thomas, H.; Watermeyer, P. Mathematical Models: A Stochastic Sequential Approach; Harvard University Press: Cambridge, MA, USA, 1962.
- Archibald, T.; McKinnon, K.; Thomas, L. An Aggregate Stochastic Dynamic Programming Model of Multireservoir Systems. Water Resour. Res. 1997, 33, 333–340.
- Cervellera, C.; Chen, V.C.; Wen, A. Optimization of a Large-Scale Water Reservoir Network by Stochastic Dynamic Programming with Efficient State Space Discretization. Eur. J. Oper. Res. 2006, 171, 1139–1151.
- Foufoula-Georgiou, E. Convex Interpolation for Gradient Dynamic Programming. Water Resour. Res. 1991, 27, 31–36.
- Johnson, S.A.; Stedinger, J.R.; Shoemaker, C.A.; Li, Y.; Tejada-Guibert, J.A. Numerical Solution of Continuous-State Dynamic Programs Using Linear and Spline Interpolation. Oper. Res. 1993, 41, 484–500.
- Karamouz, M.; Mousavi, S.J. Uncertainty Based Operation of Large Scale Reservoir Systems: Dez and Karoon Experience. J. Am. Water Resour. Assoc. 2003, 39, 961–975.
- Mousavi, S.J.; Karamouz, M. Computational Improvement for Dynamic Programming Models by Diagnosing Infeasible Storage Combinations. Adv. Water Resour. 2003, 26, 851–859.
- Philbrick, C.R., Jr.; Kitanidis, P.K. Improved Dynamic Programming Methods for Optimal Control of Lumped-Parameter Stochastic Systems. Oper. Res. 2001, 49, 398–412.
- Ponnambalam, K.; Adams, B.J. Stochastic Optimization of Multi Reservoir Systems Using a Heuristic Algorithm: Case Study from India. Water Resour. Res. 1996, 32, 733–741.
- Saad, M.; Turgeon, A. Application of Principal Component Analysis to Long-Term Reservoir Management. Water Resour. Res. 1988, 24, 907–912.
- Saad, M.; Turgeon, A.; Bigras, P.; Duquette, R. Learning Disaggregation Technique for the Operation of Long-Term Hydroelectric Power Systems. Water Resour. Res. 1994, 30, 3195–3202.
- Saad, M.; Turgeon, A.; Stedinger, J.R. Censored-Data Correlation and Principal Component Dynamic Programming. Water Resour. Res. 1992, 28, 2135–2140.
- Stedinger, J.R.; Faber, B.A.; Lamontagne, J.R. Developments in Stochastic Dynamic Programming for Reservoir Operation Optimization. In Proceedings of the World Environmental and Water Resources Congress, Cincinnati, OH, USA, 19–23 May 2013.
- Tejada-Guibert, J.A.; Johnson, S.A.; Stedinger, J.R. The Value of Hydrologic Information in Stochastic Dynamic Programming Models of a Multireservoir System. Water Resour. Res. 1995, 31, 2571–2579.
- Turgeon, A. Optimal Operation of Multireservoir Power Systems with Stochastic Inflows. Water Resour. Res. 1980, 16, 275–283.
- Turgeon, A. A Decomposition Method for the Long-Term Scheduling of Reservoirs in Series. Water Resour. Res. 1981, 17, 1565–1570.
- Pereira, M.V.F. Optimal Stochastic Operations Scheduling of Large Hydroelectric Systems. Int. J. Electr. Power Energy Syst. 1989, 11, 161–169.
- Pereira, M.V.F.; Pinto, L.M.V.G. Multi-Stage Stochastic Optimization Applied to Energy Planning. Math. Program. 1991, 52, 359–375.
- Poorsepahy-Samian, H.; Espanmanesh, V.; Zahraie, B. Improved Inflow Modeling in Stochastic Dual Dynamic Programming. J. Water Resour. Plan. Manag. 2016, 142, 04016065.
- Rougé, C.; Tilmant, A. Using Stochastic Dual Dynamic Programming in Problems with Multiple Near-Optimal Solutions. Water Resour. Res. 2016, 52, 4151–4163.
- Zhang, J.L.; Ponnambalam, K. Stochastic Control for Risk Under Deregulated Electricity Market—A Case Study Using a New Formulation. Can. J. Civ. Eng. 2005, 32, 719–725.
- Kelman, J.; Stedinger, J.R.; Cooper, L.A.; Hsu, E.; Yuan, S.Q. Sampling Stochastic Dynamic Programming Applied to Reservoir Operation. Water Resour. Res. 1990, 26, 447–454.
- Mahootchi, M.; Tizhoosh, H.; Ponnambalam, K. Opposition-based reinforcement learning in the management of water resources. In Proceedings of the IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, Honolulu, HI, USA, 1–5 April 2007; pp. 217–224.
- Castelletti, A.; Pianosi, F.; Restelli, M. A Multiobjective Reinforcement Learning Approach to Water Resources Systems Operation: Pareto Frontier Approximation in a Single Run. Water Resour. Res. 2013, 49, 3476–3486.
- Bhattacharya, B.; Lobbrecht, A.; Solomatine, D. Neural Networks and Reinforcement Learning in Control of Water Systems. J. Water Resour. Plan. Manag. 2003, 129, 458–465.
- Castelletti, A.; Galelli, S.; Restelli, M.; Soncini-Sessa, R. Tree-Based Reinforcement Learning for Optimal Water Reservoir Operation. Water Resour. Res. 2010, 46, W09507.
- Pianosi, F.; Castelletti, A.; Restelli, M. Tree-Based Fitted Q-Iteration for Multi-Objective Markov Decision Processes in Water Resource Management. J. Hydroinform. 2013, 15, 258–270.
- Bertoni, F.; Giuliani, M.; Castelletti, A. Integrated Design of Dam Size and Operations via Reinforcement Learning. J. Water Resour. Plan. Manag. 2020, 146, 04020010.
- Bellman, R. Dynamic Programming and Lagrange Multipliers. Proc. Natl. Acad. Sci. USA 1956, 42, 767.
- Ponnambalam, K.; Adams, B.J. Experiences with integrated irrigation system optimization analysis. In Irrigation and Water Allocation (Proceedings of the Vancouver Symposium); IAHS: Wallingford, UK, 1987; pp. 229–245.
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 1998; Volume 1.
- Watkins, C.J.C.H.; Dayan, P. Q-Learning. Mach. Learn. 1992, 8, 279–292.
- Robbins, H.; Monro, S. A Stochastic Approximation Method. Ann. Math. Stat. 1951, 22, 400–407.
- Mahootchi, M.; Ponnambalam, K.; Tizhoosh, H. Operations Optimization of Multireservoir Systems Using Storage Moments Equations. Adv. Water Resour. 2010, 33, 1150–1163.
- Gosavi, A. Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning; Springer: Berlin/Heidelberg, Germany, 2014; Volume 55.

**Figure 5.** Examples of the probability density functions of inflows to reservoirs [19].

**Figure 6.** The sub-problems of PAP using the multilevel approximation-dynamic programming (MAM-DP) method [28].

**Figure 7.** The sub-problems of PAP using the aggregation–decomposition dynamic programming (AD-DP) method [28].

| Reservoir \ Period | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.6 | 1 | 1 | 0.05 | 0.2 | 0.2 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0.8 | 0.9 | 1 | 0.4 | 0.4 | 0.7 | 0.8 | 0.8 | 0.8 | 0 | 0 | 0 |
| 3 | 0.1 | 0.25 | 0.3 | 0.22 | 0.25 | 0.4 | 0.5 | 0.4 | 0.4 | 0.3 | 0.2 | 0.2 |
| 4 | 0.55 | 0.9 | 1 | 0.22 | 0.28 | 0.42 | 0.58 | 0.62 | 0.44 | 0 | 0 | 0 |
| 5 | 0.25 | 0.35 | 0.35 | 0.25 | 0.35 | 0.3 | 0.3 | 0.3 | 0 | 0 | 0.2 | 0.25 |

| | Res. 1 | Res. 2 | Res. 3 | Res. 4 | Res. 5 |
|---|---|---|---|---|---|
| Res. 1 | 173.62 | 123.9 | 49.72 | - | - |
| Res. 2 | - | 115.4 | - | 57.7 | 57.7 |
| Res. 3 | - | - | 66.67 | - | - |
| Res. 4 | - | - | - | 105.12 | - |
| Res. 5 | - | - | - | - | 49.23 |

| | Res. 1 | Res. 2 | Res. 3 | Res. 4 | Res. 5 |
|---|---|---|---|---|---|
| Res. 1 | 1.0 | 0.8 | 0.8 | −0.1 | 0.9 |
| Res. 2 | 0.8 | 1.0 | 0.8 | 0.1 | 0.8 |
| Res. 3 | 0.8 | 0.8 | 1.0 | 0.3 | 0.7 |
| Res. 4 | −0.1 | 0.1 | 0.3 | 1.0 | 0.0 |
| Res. 5 | 0.9 | 0.8 | 0.7 | 0.0 | 1.0 |

| Parameter | Values | | | | |
|---|---|---|---|---|---|
| B | 0.1 | 0.3 | 0.5 | 0.7 | 0.9 |
| Initial exploration factor $\left({\epsilon}_{1}\right)$ | 1 | 1 | 1 | 1 | 1 |
| Final exploration factor $\left({\epsilon}_{2}\right)$ | 0 | 0.1 | 0.3 | 0.5 | 0.7 |

**Table 5.** The number of parameter sets common to the top-$N$ sets ranked by artificial neural network (ANN)-estimated performance and by actual (Q-learning) performance.

| $N$ | 3 | 4 | 5 | 10 | 15 |
|---|---|---|---|---|---|
| Number of common sets in the top $N$ | 1 | 3 | 3 | 9 | 14 |

**Table 6.** Top three parameter sets obtained by the ANN-based parameter-tuning approach for the AD-RL algorithm applied to PAP.

| Rank | Performance Criterion | ${\epsilon}_{2}$ | B |
|---|---|---|---|
| 1 | 1347.447 | 0.3 | 0.9 |
| 2 | 1346.325 | 0.4 | 0.9 |
| 3 | 1345.93 | 0.2 | 0.9 |

| Method | Ave. | Std. |
|---|---|---|
| MAM-DP * | 1432.2 | 249.6 |
| AD-RL * | 1353.9 | 224.1 |
| FP2 | 1289.5 | 230.4 |
| FP1 | 1262.5 | 211.0 |
| AD-DP * | 1268.0 | 231.6 |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Hooshyar, M.; Mousavi, S.J.; Mahootchi, M.; Ponnambalam, K.
Aggregation–Decomposition-Based Multi-Agent Reinforcement Learning for Multi-Reservoir Operations Optimization. *Water* **2020**, *12*, 2688.
https://doi.org/10.3390/w12102688
