Comparison of Two Convergence Criteria in the Optimization Process Using a Recursive Method in a Multi-Reservoir System

Abstract: Stochastic dynamic programming (SDP) is an optimization technique that has been used in reservoir operation for many years. However, since it is an iterative method requiring considerable computational time, it is important to establish an adequate convergence criterion for its most effective use. Building on two previous studies on the optimization of operations in one of the most important multi-reservoir systems in Mexico, this work uses SDP, focusing on the convergence criterion used in the optimization process. In the first trial, following the recommendations in the literature consulted, the absolute value of the difference between two consecutive iterations was taken and compared against a set tolerance value and a discount factor. In the second trial, the squared difference of the two consecutive iterations was taken. In each trial, the computational time taken to obtain the optimal operating policy was quantified, along with whether the policy was obtained by meeting the convergence criterion or by reaching the maximum number of iterations. With each optimization policy, the operation of the system under study was simulated and four variables were taken as evaluators of system behaviour. The results showed few differences between the two operation policies but notable differences in the computation time used in the optimization process, as well as in the fulfilment of the convergence criterion.


Introduction
Resources must be used in such a way that the balance between their use and sustainability is maintained. As water is a necessary component for the continuation of all forms of life on Earth, any effort to achieve this balance is important. In the management of reservoirs, this means increasing the benefits for the population, such as access to drinking water and electricity services and the development of economic and recreational activities. Nonetheless, external factors such as climate change increase the occurrence of hurricanes and tropical storms, bringing unusual rainfall or lengthy droughts that affect the functioning of reservoirs, leading to spills and deficit situations, flooding downstream, damage to reservoir systems, loss of human and animal life, crop damage, disease, water pollution and scarcity of water.
In general, optimization techniques and faster computing equipment have been beneficial in reservoir management. The combination of both tools helps in integrating the physical constraints of the system, penalizing spillage and water scarcity, maximizing resources, building more accurate models and forecasts, and obtaining solutions quickly.
One such technique is dynamic programming (DP), developed by [1] and applied in a variety of fields, including the optimization of single and multi-reservoir systems. Examples of studies on this subject in other parts of the world include: [2] (China); [3] (Iran); [4] (Taiwan); [5] (India); [6] (Sri Lanka); [7] (Australia); [8] (USA); [9] (Germany); [10] (Korea); [11] (Spain); [12] (Africa: Zambezi river basin); [13] (Iraq); [14] (Norway). However, since DP is an iterative method, there is a risk of not finding a unique solution and thus becoming trapped in endless cycles that terminate only when the maximum number of iterations is reached; because it is also an algorithm that demands great computational resources, it is critical to properly establish the stopping mechanism. With this in mind, the present study examines two stopping trials in optimizing operational rules for a reservoir cascade system, using the DP algorithm in its stochastic variant (SDP).
We identified relevant studies in a literature review, including [2,15-23]. These works helped us to establish the two trials described in this study. The literature review and several recent works carried out at the UNAM Engineering Institute revealed the importance of the convergence criterion used when seeking optimum solutions for multi-reservoir systems with a large storage capacity. In this study, two trials are analysed in order to achieve convergence that guarantees stability of the solution and reduces computational time.
The study is organized as follows: Section 2 presents the general features of DP and the two stopping strategies defined to obtain the operation rule. Section 3 describes the study site and explains the conditions for optimizing and simulating the reservoir system. Section 4 presents the results obtained, and Section 5 gives the conclusions.

Background and Theoretical Framework
The starting point is the works of [24,25], which studied a multi-reservoir system by determining optimal policies, using as a convergence criterion that the absolute value of the difference between two consecutive iterations be less than a given tolerance. Both studies use the dynamic programming methodology, a technique developed by [26] that can be applied if and only if the solutions of the problem to be solved constitute a series of decisions. The calculations are carried out in a recursive manner [27,28]. The details of the theory for both the deterministic and stochastic cases are widely discussed in the literature [26,29-32].
To obtain the operation policies, the following considerations were taken:
(a) Value function. The goal was to maximize the expected annual energy generation of the system while avoiding spills and events where the demands are not met (called deficit events):

max E[ Σ_{t=1}^{T} Σ_{i=1}^{nr} (EG_{i,t} − C · OE_{i,t}) ]  (1)

where E denotes expectation, T the number of periods within the annual cycle (stages), EG energy generation (GWh), C the penalization coefficient, OE an occurred event (spill or deficit), and nr the number of reservoirs.
(b) Constraints.
b.1 Storage

S_min ≤ S ≤ S_max  (2)

where S_min is the minimum storage of the reservoir, S_max the maximum storage of the reservoir, and S the reservoir storage.

b.2 Release

R_min ≤ R ≤ R_max  (3)

where R_min is the minimum release from the reservoir, R_max the maximum release from the reservoir, and R the release from the reservoir.
(c) State transformation equations. The equations are obtained according to the principle of continuity [32]:

S_e,i = S_b,i + I_i − r_i − O_i  (4)

where S_e is the storage at the end of the stage, S_b the storage at the beginning of the stage, I the inflow to reservoir i, r the release from reservoir i, and O the losses (principally evaporation). It should be taken into account that, when the system operates in cascade, the downstream reservoir receives as additional inflow the volumes spilled by the reservoir that precedes it.
(d) Recursive equations for a multi-reservoir system [25,32]:

F_n(k, p) = max_l { B_{k,l,p,j} + Σ_q JP_{p,q,j} F_{n−1}(l, q) }  (5)

where k is the storage state space; l the decision space; p the inflow state space during period j; q the inflow state space during period j + 1; F_n(k, p) the accumulated expected energy generation by optimal operation of the system over the last n stages; B the benefit for energy generation when the system changes from state k_i to state l_i when the inflow class is p_i in period j; JP the joint transition probabilities of the inflows.
The disadvantage of DP is the high computational cost it demands. To avoid this, the technique used at II UNAM splits the solution procedure into two parts: in the first part, only the expected benefit is calculated for each stage (first term of Equation (5)), which is repeated from one year to the next; in the second part, the accumulated benefit up to the considered stage and its optimum value are determined, for which a very large number of years is proposed (which may correspond to the useful life of the system) (second term of Equation (5)). In the initial stage, values of zero are assigned to the optimum benefits, the stages are traversed in the direction opposite to time, and the equations are solved by iterating until the differences between two consecutive years meet an established tolerance that guarantees convergence and stability of the solution. Once this is achieved, the optimal release policy for each dam is stored, along with its respective benefit [24]. This process is illustrated in Figure 1.
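The backward recursion of Equation (5) can be sketched numerically. The sizes of the state, decision and inflow spaces below, and the random benefits B and transition probabilities JP, are placeholder assumptions for illustration only, not the Grijalva system data:

```python
import numpy as np

# Toy sketch of the recursion in Equation (5). K storage states (the decision
# space is the same grid), P inflow classes, T stages per year; B and JP are
# random placeholders, not the actual system data.
rng = np.random.default_rng(0)
K, P, T = 5, 3, 6

B = rng.random((K, K, P, T))          # B[k, l, p, j]: benefit of moving storage k -> l
JP = rng.random((P, P, T))            # JP[p, q, j]: inflow class transition probabilities
JP /= JP.sum(axis=1, keepdims=True)   # each row is a probability distribution

F = np.zeros((K, P))                  # F_0 = 0: expected benefit over the last 0 stages
for n in range(1, 10 * T + 1):        # iterate backwards in time, stage by stage
    j = (T - n) % T                   # stage index, cycling through the year
    future = F @ JP[:, :, j].T        # future[l, p] = sum_q JP[p, q, j] * F_{n-1}(l, q)
    cand = B[:, :, :, j].transpose(0, 2, 1) + future.T[None, :, :]  # cand[k, p, l]
    F = cand.max(axis=2)              # optimal decision l for every (k, p)
# F grows with n because no discounting is applied in this undiscounted sketch
print(F.shape)                        # value table indexed by (storage state, inflow class)
```

In the paper's two-part scheme, the per-stage benefits (first term) are computed once, while this accumulation loop is what iterates until the convergence criterion is met.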

Convergence Criterion
According to the literature, an incremental solution strategy based on an iterative method will be effective if, and only if, the convergence criterion selected to terminate the iteration process is adequate. It should be kept in mind that failing to meet the convergence criterion may lead to inaccurate results, while an overly strict tolerance increases computation time when the solution may not require such precision. In this study, two convergence criteria were formulated to identify the optimal operation policies for a multi-reservoir system consisting of four reservoirs. The recursion equations of the stochastic dynamic programming method are applied until the criterion is met or until the number of iterations corresponding to the useful life of the system is reached.
From [24,25,28], the convergence criterion used was

|V(X_t) − V(X_{t−1})| < µ  (6)

where t is time (one full year, in this case), V(X_t) denotes the value function and µ is the tolerance. The value of µ was fixed at 10⁻³. However, according to the literature, adding a discount factor, a real number between 0 and 1, is recommended. A typical convergence criterion is then [18,33]

|V(X_t) − V(X_{t−1})| < µ (1 − β)/(2β)  (7)

where β is the discount factor. Using Equation (7), we defined two trials to analyse how the optimized policy values, the computation time spent and the behaviour of the chosen system during simulation with the historic record of inflows differ. The trials are defined as follows:
1. Trial 1: The first stopping strategy involves accumulating the absolute values of the differences between two subsequent values, for all possible states, then taking the average and comparing it to the convergence criterion at the end of two cycles:

D = Σ |V_n(X) − V_{n−1}(X)|,  D/(T · tns) < tolerance  (8)

where D is a vector in which the differences of two consecutive cycles are accumulated, T is the total number of groups of months into which the year was divided, and tns is the total number of states, considering the two reservoirs.
2. Trial 2: Instead of taking the absolute value, the second strategy squares the differences:

D = Σ (V_n(X) − V_{n−1}(X))²,  D/(T · tns) < tolerance  (9)

The iterative process is carried out until one of the defined conditions is met: reaching the maximum number of iterations, or meeting the convergence criterion. The results of the two trials are shown in Section 4.
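The behaviour of the two stopping criteria can be illustrated on a simple contraction mapping. The mapping, its dimension and the tolerance below are illustrative assumptions, not the reservoir model itself:

```python
import numpy as np

# Toy comparison of the two stopping trials on the fixed-point iteration
# x_{n+1} = A x_n + b, with A a contraction so the iteration converges.
def iterate(metric: str, tol: float = 1e-4, max_iter: int = 100) -> int:
    """Return the iteration at which the chosen criterion first triggers."""
    rng = np.random.default_rng(1)
    A = 0.6 * np.eye(4)                  # contraction factor 0.6
    b = rng.random(4)
    x = np.zeros(4)
    for n in range(1, max_iter + 1):
        x_new = A @ x + b
        d = x_new - x
        # Trial 1: mean absolute difference; Trial 2: mean squared difference
        crit = np.mean(np.abs(d)) if metric == "abs" else np.mean(d ** 2)
        x = x_new
        if crit < tol:
            return n
    return max_iter

# Once the differences drop below 1, squaring makes them shrink twice as fast
# (in the exponent), so the squared criterion typically triggers earlier.
print(iterate("abs"), iterate("squared"))
```

This mirrors the paper's observation: for the same tolerance, accumulating squared differences reaches the stopping condition in fewer iterations than accumulating absolute differences.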

Multi-Reservoir System
The reservoir system examined is on the Grijalva River in southeast Mexico (Figure 2). The river begins in Guatemala and flows through the state of Chiapas in Mexico, then a further 480 km to its mouth in the Gulf of Mexico [34]. The administrative bounds of the basin include the states of Tabasco, Chiapas and small portions of Campeche, covering a total area of 52,348 km², part of the "Grijalva-Usumacinta" hydrological region 30. Between 1959 and 1987, four dams were built on the river: (a) La Angostura, whose uses include electricity generation, agricultural irrigation and domestic water supply, fishing, national and international tourism and water sports; (b) Chicoasén; (c) Malpaso, built between 1959 and 1964, whose main uses are electricity generation and flood control; and (d) Peñitas, located in the municipality of Ostuacán, Chiapas, and completed in 1987, with the production of electricity as its primary function. Figure 3 shows the layout of the dams that make up the system and their storage capacities.
With normal storage capacities of 13,169 and 9600 hm³, respectively, La Angostura and Malpaso dams are the largest of the complex [25]. In terms of electricity generation, the installed capacity of the system (3907 MW) represents 40.3% of the national hydroelectric capacity and 52% of the energy generated by Mexican hydroelectric plants [36].

Optimization and Simulation Processes
The optimization process uses a series of assumptions to obtain the operation policies of the dams. To simplify the management of the four-dam system of the Grijalva River, only the two main dams are optimized. There is a great difference between the storage capacities of La Angostura and Malpaso and those of Chicoasén and Peñitas; taking this into account, the smaller dams can be considered as diversion dams only, and their operation policy consists of releasing the volumes that pass through them, subject to physical restrictions. The volumes of water entering Chicoasén are added to those of Malpaso (which is downstream), and the energy generated by Chicoasén and Peñitas is accounted for by adding their hydraulic heads to La Angostura and Malpaso, respectively. This simplification significantly reduces the complexity of the system but maintains the actual operating conditions of the four reservoirs.
Once the policies are obtained, the operation of the multi-reservoir system is simulated using them, the physical characteristics of the system and the record of historic inflows.
The conditions for the optimization and simulation processes are described in [24,25,28]; the most important are listed in Summaries 2 and 3.

Summary 2. Optimization process
1. Six stages were defined to group the 12 months of the year (Table 1).
2. Stage 1 has 2 months (Nov and Dec), so the minimum release for the stage is obtained from the minimum monthly turbine discharge. The same procedure is applied to obtain the maximum release, but now taking the maximum monthly turbine discharge, i.e., 2825 hm³/month = 4.708 ∆V/month. Then, for 2 months, the maximum release will be R_max = 4.708 ∆V/month × 2 months = 9.416 ∆V ≈ 10 ∆V.
3. In the case of Malpaso, the minimum and maximum releases are obtained in the same way for each stage (for example, a stage that has a single month). Following this process, the values shown in Table 1 are obtained.
4. To take into account the randomness of the inflows, they are defined by a probability density function. In Figures 4 and 5 the probabilities (JP in Equation (5)) associated with the historic inflows are given for each stage and each dam.
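The unit conversion used to build Table 1 can be reproduced directly. The rounding-up step is inferred from the text's example (9.416 ∆V is taken as 10 ∆V):

```python
import math

# Reproduces the unit conversion in Summary 2: turbine discharges in hm³/month
# are expressed in ∆V units (∆V = 600 hm³) and scaled by the number of months
# in the stage; rounding up to a whole ∆V follows the text's example.
DV = 600.0  # discretization interval, hm³

def release_bound_in_dv(discharge_hm3_per_month: float, months: int) -> int:
    """Stage release bound, rounded up to a whole number of ∆V units."""
    return math.ceil(discharge_hm3_per_month / DV * months)

# Stage 1 (Nov-Dec, 2 months), maximum turbine discharge 2825 hm³/month:
# 2825 / 600 = 4.708 ∆V/month -> 9.416 ∆V over 2 months -> 10 ∆V
print(release_bound_in_dv(2825, 2))
```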
5. For both trials, the discount factor was fixed at 0.5.
6. The maximum number of iterations is 100.
7. To obtain the solution vectors, i.e., the optimum releases for each stage and each dam, the computation time was quantified.

Summary 3. Simulation process
1. Year 1 (1959) has the dams full, with 10,000 hm³ at La Angostura and 7500 hm³ at Malpaso.
2. The simulation process begins in January and ends in December.
3. Using the initial storage, the corresponding time, and the file with the optimal release for each time and for each storage level of the reservoirs, the release is obtained.
4. The operation of the system is based on the principle of continuity, i.e., the new filling level is calculated by adding the inflows and subtracting the releases and other outflows (mainly evaporation) from the initial storage for each time interval.
5. The storage constraints are evaluated and it is determined whether the system is in a no-spill, spill, or deficit condition with the new storage value.
6. The process continues until the number of years analysed is reached.
7. The initial storage, spillage, deficit volumes and average energy generated were analysed. These variables were the evaluation parameters of the optimal policy.
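One step of the simulation described in items 4 and 5 above can be sketched as follows; the storage bounds here are placeholder values, not the actual Grijalva limits:

```python
# Minimal sketch of one simulation step: apply the continuity principle,
# then classify the result as no-spill, spill, or deficit.
S_MIN, S_MAX = 2000.0, 13000.0   # hm³, illustrative bounds only

def step(storage: float, inflow: float, release: float, losses: float):
    """Return (new_storage, condition) for one time interval."""
    s = storage + inflow - release - losses   # continuity principle
    if s > S_MAX:
        return S_MAX, "spill"        # excess volume is spilled downstream
    if s < S_MIN:
        return S_MIN, "deficit"      # the planned release could not be supplied
    return s, "no-spill"

print(step(12500.0, 900.0, 300.0, 50.0))   # ends above S_MAX, so it spills
```

Looping this step month by month over the historical record, with the release read from the optimal policy file, reproduces the simulation procedure of Summary 3.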

Evaluating the Convergence Criterion
The two trials were performed on a computer with an Intel(R) 2.20 GHz processor, 16 GB of RAM, and a 64-bit operating system.

Trial 1
(a) The optimal policy was obtained when the number of iterations reached its limit (100 in this case); the computation time was 34 s without achieving the convergence criterion (fixed at 0.5 × 10⁻⁴).
(b) The differences between iterations decrease rapidly, but from the fifth iteration onwards the values oscillate between 10⁻² and 10⁻³; thereafter the convergence process is trapped in a loop that repeats values until the maximum number of iterations is reached (Figure 6).

Trial 2
(a) The optimal policy was obtained in 5 s, meeting the convergence criterion in the sixth iteration. Figure 7 shows the details of the iteration cycle.
(b) The difference between the first two iterations is significant at the beginning, but it decreases rapidly, and each iteration maintains a negative gradient that leads to finding the optimal policy and concluding the iterative process by meeting the convergence criterion.
Analysing the differences in the solution vectors that give rise to Figures 6 and 7, the following can be observed:
(a) In both trials the differences are gradually reduced towards a solution. However, in trial 1 this trend is seen only until iteration 9; from there, the method falls into a cycle that repeats values systematically, i.e., it reaches a point where it is not possible to improve the solution and the differences between two consecutive iterations do not satisfy the tolerance until the maximum number of iterations is reached.
(b) In trial 2, taking the squared differences amplifies the larger differences between the values of one iteration and the next. However, as these are always decreasing, they behave better than those of trial 1, becoming increasingly smaller and thus reaching one of the stopping conditions of the algorithm by complying with the established tolerance value.
(c) The minimum values that trial 1 reaches remain in the range of thousandths, while the tolerance was set in the range of ten-thousandths. It is therefore worth noting that if the tolerance were relaxed slightly, trial 1 would manage to meet it, although trial 2 would still be preferable, as its computational time is lower.
A quick trial was designed with new conditions in the optimization process to see what the times would be for the two trials: seven stages were defined, a ∆V of 200 hm³ was considered (going from 22 states in La Angostura to 65, and from 16 to 46 in Malpaso), minimum releases ranged from 3∆V to 12∆V in La Angostura and from 4∆V to 18∆V in Malpaso, and maximum releases were 64∆V for La Angostura and 84∆V for Malpaso. New data files were prepared and it was found that trial 1 ran for 1 h 12 min 26 s without managing to meet the tolerance, while trial 2 arrived at a solution in 7 min 40 s, meeting the tolerance at iteration 11.
This trial shows that, if any changes to the system conditions are required to adjust the operating rule, the way the calculation of the differences in the solution vector is set up is extremely important. This quick trial was not carried out as a simulation in this work; it is given only as a reference on how the demand for computational resources and the execution time grow as the discretization of the state variables used by the dynamic programming algorithm becomes finer, and on how the criterion for calculating the differences in the solution vector between two consecutive iterations affects this.
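The growth of the state space with a finer ∆V can be approximated from the storage capacities quoted earlier. Since the paper does not give the exact minimum-storage offsets, the usable ranges below are assumptions (the normal capacities of 13,169 and 9600 hm³), so the counts only approximate the reported 22→65 and 16→46 states:

```python
import math

# Rough illustration of why a finer ∆V inflates the DP state space.
def n_states(usable_range_hm3: float, dv: float) -> int:
    """Number of storage states for a given discretization interval."""
    return math.floor(usable_range_hm3 / dv)

for name, usable in [("La Angostura", 13169.0), ("Malpaso", 9600.0)]:
    coarse, fine = n_states(usable, 600.0), n_states(usable, 200.0)
    # each SDP sweep visits every (state, decision, inflow-class) combination,
    # and the joint state space multiplies across reservoirs, so runtime grows
    # much faster than the per-reservoir state count
    print(f"{name}: {coarse} states at dV=600 -> {fine} states at dV=200")
```

Roughly tripling the per-reservoir state count multiplies the joint two-reservoir state space by about nine, which is consistent with the jump from 34 s to over an hour observed in trial 1.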

Evaluating the System Operation with the Optimal Policy
The optimal policy obtained for each trial was analysed and some differences were found between them, shown in Figure 8. The results are presented as a state-release matrix for each stage. Each matrix was compared and, if the release was different, then the cell was given a logical value of false (F). Figure 8 shows that there are only different release values in stages 4 (states 10 in La Angostura and 7 for Malpaso), 5 (states 10 and 6; 11 and 4; 12 and 6; 13 and 5), and 6 (state 20 in La Angostura and from 1 to 16 in Malpaso).
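The cell-by-cell comparison behind Figure 8 can be sketched as follows; the small matrices are invented examples, not the real policies:

```python
import numpy as np

# Compare two state-release matrices cell by cell, marking T where the
# releases agree and F where they differ, as in Figure 8.
policy_trial1 = np.array([[3, 3, 4],
                          [4, 5, 5]])
policy_trial2 = np.array([[3, 3, 4],
                          [4, 6, 5]])   # one release differs

agreement = np.where(policy_trial1 == policy_trial2, "T", "F")
print(agreement)
print("identical cells:", int((policy_trial1 == policy_trial2).sum()),
      "of", policy_trial1.size)
```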
To see whether the operating policies of the two trials produce significant changes in the behaviour of the system, the historical record of inflow volumes from 1959 to 2020 was simulated under both policies. In general, the simulation process showed similarities between trial 1 and trial 2.
Four main variables were evaluated: the minimum value of the initial storage, the spillage and deficit volumes, and the average energy generated. The results are given in Table 2.

Discussion of Results
The dynamic programming algorithm has been proven to be a robust methodology for application in the study of operating rules for large-capacity storage systems. The case analysed here is of great complexity, as the four reservoirs of the Grijalva river hydroelectric system operate in cascade. In addition, its operation is of great importance nationally, as it is one of the biggest contributors to the generation of electricity. However, the storage levels must be kept within ranges that keep operation as safe as possible and ensure that extreme weather events do not give rise to overflows that cause flooding in downstream populations. It is therefore vital to have tools that allow the testing of different operating conditions, where results are obtained as quickly as possible, to facilitate better management.
From previous works it was seen that the optimization process, in order to obtain the operation rule, often entered a cyclic series, repeating values without achieving satisfaction of the tolerance criterion and always arriving at the maximum number of iterations to stop the process. This led us to focus on how to calculate the differences of the solution vector between two consecutive iterations. A trial was proposed to make this calculation, taking the squared differences (instead of the absolute value), thus magnifying the larger differences, but also flattening the process more rapidly when the differences were small.
With the two trials defined, it was seen that the optimal policies differ little and that the operation of the system with either is practically the same in the simulation of system behaviour over the 62 years of historical records used. However, the condition chosen to stabilize the process and achieve convergence has a great impact on the computation time, mainly in the optimization process.
If smaller values are used to discretize the state variables, the computation time required to find the solution increases, and the choice of the stopping condition therefore becomes more important, as seen in the results of the quick trial carried out with a finer discretization interval (a ∆V of 200 hm³ instead of 600 hm³), defining one more stage and recalculating the variables involved in the optimization process accordingly.
It seems that accumulating the squared differences during a complete cycle (one year) is much better than taking only their absolute value. For the case analyzed, the convergence criterion was met in only six iterations by taking the squared differences, thus avoiding the loop of the first trial. However, it is necessary to test cases that imply a greater demand for computational resources to see if the iterative process is completed by meeting the convergence criterion proposed in this study, and thus avoid the eternal loops inherent in recursive methods.

Conclusions
In this paper, two ways to obtain the differences in the solution vector were analysed. They were compared against a defined criterion to achieve convergence and stop the iterative process of the Bellman equations for stochastic dynamic programming. It was seen that taking the squared differences, instead of the absolute value, significantly reduces computation times, and the convergence condition is reliably met.
In future work, we hope to explore the effect of relaxing the tolerance value together with the discount factor, to see whether this shortens solution times in more computationally demanding cases, and whether it remains possible to stop the process by satisfying the tolerance criterion when the differences in the solution vector of two consecutive iterations are calculated as their squares.
An advantage of the application reported in this study is that the software does not require large, expensive computer hardware, making it accessible to the federal agencies in charge of managing water storage systems in Mexico.
With the results presented, numerical modelling under similar conditions can be carried out more efficiently, allowing resources to be invested in more comprehensive analyses.