A Mixed-Integer and Asynchronous Level Decomposition with Application to the Stochastic Hydrothermal Unit-Commitment Problem

Abstract: Independent System Operators (ISOs) worldwide face the ever-increasing challenge of coping with uncertainties, which requires sophisticated algorithms for solving unit-commitment (UC) problems of increasing complexity in less and less time. Hence, decomposition methods are appealing options to produce easier-to-handle problems that can hopefully return good solutions in reasonable times. When applied to two-stage stochastic models, decomposition often yields subproblems that are embarrassingly parallel. Synchronous parallel-computing techniques are applied to the decomposable subproblem and frequently result in considerable time savings. However, due to the inherent run-time differences among the subproblem's optimization models, unequal equipment, and communication overheads, synchronous approaches may underuse the computing resources. Consequently, asynchronous computing constitutes a natural enhancement to existing methods. In this work, we propose a novel extension of the asynchronous level decomposition to solve stochastic hydrothermal UC problems with mixed-integer variables in the first stage. In addition, we combine this novel method with an efficient task allocation to yield an innovative algorithm that far outperforms the current state of the art. We provide a convergence analysis of our proposal and assess its computational performance on a testbed consisting of 54 problems from a 46-bus system. Results show that our asynchronous algorithm outperforms its synchronous counterpart in terms of wall-clock computing time in 40% of the problems, providing time savings averaging about 45%, while also reducing the standard deviation of running times over the testbed on the order of 25%. We also show that the asynchronous method is further enhanced by implementing a dynamic-task-allocation strategy.

Author Contributions: Conceptualization, B.C., E.C.F., and W.d.O.; methodology, B.C., E.C.F., and W.d.O.; software, B.C.; validation, B.C.; formal analysis, E.C.F. and W.d.O.; investigation, B.C., E.C.F., and W.d.O.; resources, B.C. and E.C.F.; data curation, B.C.; writing—original draft preparation, B.C., and


Introduction
The unit-commitment (UC) problem aims at determining the optimal scheduling of generating units to minimize costs or maximize revenues while satisfying local and system-wide constraints [1]. In its deterministic form, the UC still poses a challenge to operators and researchers due to the large sizes of the systems and the increasing modeling detail necessary to represent system operation. For instance, in the Brazilian case, the current practice is to set a limit of 2 h for the solution of the deterministic UC [2], while the Midcontinent Independent System Operator (MISO) sets a time limit of 20 min for its UC [3]. (Note that the Brazilian system and the MISO are different from a physical as well as from a market-based viewpoint, but the problems being solved in these two cases share the same classical concept of the UC.) Nonetheless, the growing presence of intermittent generation has made it necessary to represent uncertainty explicitly in the UC's constraints. The uncertainties in wind and inflows are represented by a finite set of scenarios, S, and the decisions are made in two stages. At the first stage, the ISO decides on the commitment of units, whereas, at the second stage, the operator determines the dispatch according to the random-variable realization. Full details on the considered stochastic hydrothermal unit-commitment (SHTUC) problem are given shortly. For presenting our approach, which is not limited to (stochastic) UC problems, we adopt the following generic formulation:
min_{x, y_s}  c^T x + Σ_{s∈S} q_s^T y_s
s.t.  x ∈ X,  Tx + W y_s ≤ h_s,  y_s ∈ Y_s,  s ∈ S.   (1)

In this formulation, the n-dimensional vector x represents the first-stage variables with associated cost vector c. The second-stage variables y_s and their associated costs q_s depend on the scenario s ∈ S. The cost vector q_s is assumed to incorporate the positive probability of scenario s. The first- and second-stage variables are coupled by the constraints Tx + W y_s ≤ h_s: T is the technology matrix, and W and h_s are, respectively, the recourse matrix and a vector of appropriate dimensions. While X ≠ ∅ is compact and possibly nonconvex, the scenario-dependent set Y_s is a convex polyhedron.
As previously mentioned, depending on the UC problem and the number of scenarios, the mixed-integer linear programming (MILP) Problem (1) cannot be solved directly by an off-the-shelf solver. The problem is thus decomposed by making use of the recourse functions

Q_s(x) := min { q_s^T y_s : W y_s ≤ h_s − Tx, y_s ∈ Y_s }.
It is well known that x ↦ Q_s(x) is a nonsmooth convex function of x. If the above subproblem has a solution, then a subgradient of Q_s at x can be computed by making use of a Lagrange multiplier, π_s, associated with the constraint W y_s ≤ h_s − Tx: we have −T^T π_s ∈ ∂Q_s(x). On the other hand, if the recourse subproblem is infeasible, then the point x can be cut off by adding a feasibility cut [5].
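To make the subgradient formula concrete, the sketch below uses a hypothetical one-dimensional recourse of the form Q_s(x) = min{ q_s·y : y ≥ h_s − t_s·x, y ≥ 0 } = q_s·max(0, h_s − t_s·x), so that the optimal dual multiplier and the resulting subgradient −t_s·π_s are available in closed form; in the actual method, Q_s is an LP and π_s comes from the solver. The names and data are illustrative only.

```python
# Toy one-dimensional recourse function (hypothetical closed form):
#   Q_s(x) = min{ q_s * y : y >= h_s - t_s * x, y >= 0 } = q_s * max(0, h_s - t_s * x).
# In the real method, Q_s is an LP and pi_s is its optimal dual multiplier.

def recourse_value_and_subgradient(x, q_s, t_s, h_s):
    """Return Q_s(x) and a subgradient g in the subdifferential of Q_s at x."""
    slack = h_s - t_s * x
    if slack > 0:
        # Coupling constraint active: dual multiplier pi_s = q_s, so g = -t_s * pi_s.
        return q_s * slack, -t_s * q_s
    return 0.0, 0.0

# The subgradient inequality Q_s(z) >= Q_s(x) + g * (z - x) holds for all z:
val, g = recourse_value_and_subgradient(1.0, q_s=2.0, t_s=3.0, h_s=10.0)
for z in [-2.0, 0.0, 1.0, 4.0, 10.0]:
    vz, _ = recourse_value_and_subgradient(z, 2.0, 3.0, 10.0)
    assert vz >= val + g * (z - 1.0) - 1e-9
```

The check in the last loop is exactly the convexity property that makes the cutting-plane models below valid lower approximations.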
Let P be a partition of S into w subsets: P = {P_1, . . . , P_w}, with P_j ≠ ∅ for all j ∈ {1, . . . , w} and P_j ∩ P_i = ∅ for i ≠ j. By defining f_j(x) := Σ_{s∈P_j} Q_s(x), Problem (1) can be rewritten as

min_{x∈X}  c^T x + Σ_{j=1}^{w} f_j(x).   (2)

In our notation, w stands for the number of workers evaluating the recourse functions. The workers j ∈ {1, . . . , w} are processes running on a single machine or on multiple machines. Likewise, we define a master process (hereafter referred to simply as the master) to solve the master program, which is defined shortly.
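A partition with the properties above (non-empty, pairwise-disjoint subsets covering S) can be built in a few lines; the sketch below uses a simple round-robin split and checks the partition axioms with the paper's own sizes (|S| = 256, w = 16, so |P_j| = 16):

```python
def partition_scenarios(scenarios, w):
    """Split the scenario list into w non-empty, pairwise-disjoint subsets P_1..P_w
    whose union is the whole scenario set (a round-robin partition)."""
    assert 0 < w <= len(scenarios)
    return [scenarios[j::w] for j in range(w)]

P = partition_scenarios(list(range(256)), 16)
assert len(P) == 16 and all(len(Pj) == 16 for Pj in P)          # w subsets, |P_j| = 16
assert sorted(s for Pj in P for s in Pj) == list(range(256))    # disjoint cover of S
```

Any other balanced split works equally well; worker j is then responsible for f_j(x) = Σ_{s∈P_j} Q_s(x).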

The Mixed-Integer and Asynchronous Level Decomposition
For every point x^k, where k represents an iteration counter, worker j receives x^k and provides us with first-order information on the component function f_j: the value f_j(x^k) and a subgradient g_j^k ∈ ∂f_j(x^k) [23]. By convexity, the linearization f_j(x^k) + (g_j^k)^T (x − x^k) approximates f_j(x) from below for all x. By gathering into sets J_j ⊂ {1, 2, . . . , k} the iteration indices at which f_j was evaluated, we can construct individual cutting-plane models for the functions f_j, with j ∈ {1, . . . , w}:

f̌_j^k(x) := max_{i∈J_j} { f_j(x^i) + (g_j^i)^T (x − x^i) } ≤ f_j(x).

These models define, together with a stability center x̂^k, a level parameter f_lev^k ∈ ℝ, and a given norm ‖·‖_2, the following master program (MP):

min_x ‖x − x̂^k‖_2^2   s.t.   c^T x + Σ_{j=1}^{w} f̌_j^k(x) ≤ f_lev^k,   x ∈ X.

Algorithms 2020, 13, 235
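The cutting-plane model f̌_j is just the pointwise maximum of the accumulated linearizations. A minimal one-dimensional sketch (cuts stored as (x_i, f_j(x_i), g_i) triples; the test function f_j(x) = x² is illustrative only) shows that the model never overestimates f_j:

```python
# Minimal cutting-plane model for one component function f_j.  Each cut is a
# linearization f_j(x_i) + g_i * (x - x_i); the model value is their maximum,
# which under-approximates f_j everywhere by convexity.

class CuttingPlaneModel:
    def __init__(self):
        self.cuts = []  # list of (x_i, f_i, g_i)

    def add_cut(self, x_i, f_i, g_i):
        self.cuts.append((x_i, f_i, g_i))

    def value(self, x):
        # max_i  f_i + g_i * (x - x_i)
        return max(f + g * (x - xi) for xi, f, g in self.cuts)

# Build cuts for the illustrative function f_j(x) = x**2 (so f_j'(x) = 2x):
m = CuttingPlaneModel()
for xi in [-2.0, 0.0, 1.5]:
    m.add_cut(xi, xi**2, 2 * xi)

# The model is a lower approximation of f_j at every test point:
for x in [-3.0, -1.0, 0.5, 2.0]:
    assert m.value(x) <= x**2 + 1e-9
```

In the MP, one such model per worker enters the level constraint c^T x + Σ_j f̌_j(x) ≤ f_lev; each worker response simply appends a new cut to its model.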

At iteration k, an MP solution is denoted by x^{k+1}. If any Q_s is infeasible at x^{k+1}, then a feasibility cut is added to the MP. We skip further details on this matter, since it is a well-known subject in the literature on two-stage programming [5]. On the other hand, if x^{k+1} (sent to a worker j) is feasible for all recourse functions Q_s, the model f̌_j in the MP is updated. The improvement in the model f̌_j is possibly based on an outdated iterate x^{a(j)}, where a(j) < k is the iteration index of the previous information provided by worker j. We care to mention that the MP can itself be infeasible, depending on the level parameter f_lev^k. Due to the convexity of the involved functions, if the MP is infeasible, then f_lev^k is a valid lower bound, f_low^k, on f* [30]. Without coordination, there is no reason for all workers to be called on the same iterate. This fact precludes the computation of an upper bound, f_up^k, on f*. Algorithm 2 in Reference [31] deals with this situation without resorting to coordination techniques, but it requires more assumptions on the functions f_j: upper bounds on their Lipschitz constants should be known. Since we do not make this assumption, we need scarce coordination akin to Algorithm 3 of Reference [31] for computing upper bounds on f*. As in Reference [31], the coordination iterates are denoted by x̄^k. Assuming that all workers eventually respond (after an unknown time), the coordination allows them to compute the full value f(x̄^k) and a subgradient g ∈ ∂f(x̄^k) at the coordination iterate. The function value is used to update the upper bound f_up^k, as usual for level methods; the subgradient is used to update the bound L on the Lipschitz constant of f.
In our algorithm below, the coordination is implemented by two vectors of Booleans: to_coordinate and coordinating. The role of to_coordinate[j] is to indicate to the master that worker j will evaluate f_j on the new coordination point x̄^k (at that moment, to_coordinate[j] is set to false, and coordinating[j] is set to true). Similarly, coordinating[j] indicates to the master that worker j is responding to a coordination step, which is used to update the upper bound. When a worker j responds, it is included in the set A of available workers. If all workers are busy, then A = ∅. Our algorithm mirrors as much as possible Algorithm 3 of Reference [31], but contains some important specificities to handle (i) mixed-integer feasible sets and (ii) extended real-valued objective functions (we do not assume that f(x) is finite for all x ∈ X). To handle (ii), we furnish our algorithm with a feasibility check (and the addition of cuts), and for (i) we not only use a specialized solver for the MP but also change the rule for scarce coordination. The reason is that the rule of Reference [31] is only valid in the convex setting. Under nonconvexity, the coordination test ‖x̄^k − x̄^{k−1}‖ < (α/L) ∆_{k−1} (with α ∈ (0, 1) and L ≥ ‖g^i‖, i = 1, . . . , k) implies that inequality (5), which is important for the convergence analysis, may fail to hold. In the algorithm below, coordination is triggered when (5) is not satisfied and all workers have already responded on the last coordination iterate (i.e., rr = 0, where rr stands for "remaining to respond").
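The interaction between to_coordinate, coordinating, and the counter rr can be simulated in a few lines. The sketch below is a toy model of this bookkeeping only (the workers' actual computations are omitted): after a coordination is triggered, each worker first picks up the coordination point and then responds on it, at which moment rr is decremented; once rr reaches 0, the master may coordinate again.

```python
# Toy simulation of the coordination bookkeeping (workers are simulated).

w = 3
to_coordinate = [False] * w
coordinating = [False] * w
rr = 0                      # "remaining to respond" on the last coordination point

def trigger_coordination():
    """Master starts a coordination step: all workers owe info on the new point."""
    global rr
    rr = w
    for j in range(w):
        to_coordinate[j] = True

def worker_responds(j):
    """Called whenever worker j returns first-order information to the master."""
    global rr
    if coordinating[j]:
        coordinating[j] = False
        rr -= 1             # one fewer worker owes info on the coordination point
    if to_coordinate[j]:
        to_coordinate[j] = False
        coordinating[j] = True  # j's next response will be on the coordination point

trigger_coordination()
for j in range(w):          # each worker picks up the coordination point...
    worker_responds(j)
for j in range(w):          # ...and then responds on it
    worker_responds(j)
assert rr == 0 and not any(coordinating) and not any(to_coordinate)
```

When rr == 0, every worker has reported its f_j value on the coordination iterate, so the master can assemble the full value f(x̄^k) and update f_up^k.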
The assumption that the algorithm starts with a feasible point is made only for the sake of simplicity. Indeed, the initial point can be infeasible, but, in this case, Step 3 must be changed to ensure that the first computed feasible point is a coordination iterate. For the problem of interest, the feasibility check performed at line 45 amounts to verifying whether f(x^{k+1}) < ∞. In our SHTUC, the feasibility check comprises an auxiliary problem for verifying whether ramp-rate constraints would be violated by x^{k+1} and an additional auxiliary problem for checking whether reservoir-volume bounds would be violated. Both problems are easily reduced to small linear-programming problems that can be solved to optimality in split seconds by off-the-shelf solvers. The coordination step of Algorithm 1 reads:

3.  if (5) does not hold and rr = 0 then
4.      x̄^k ← x^k, rr ← w, f ← c^T x^k, and g ← c
5.      for all j ∈ A do
6.          to_coordinate[j] ← true

Convergence Analysis
To analyze the convergence of the mixed-integer asynchronous computing (ASYN) level decomposition (LD) described above, we rely as much as possible on Reference [31]. However, to account for the mixed-integer nature of the feasible set, we need novel developments such as those in Theorem 1 below. Throughout this section, we assume tol_∆ = 0, as well as the following: Hypothesis 1 (H1). all the workers are responsive; Hypothesis 2 (H2). the algorithm generates only finitely many feasibility cuts;

Hypothesis 3 (H3). the workers provide bounded subgradients.
Like H1, assumption H2 is a mild one: H2 holds, for instance, when f is a polyhedral function or when X has only finitely many points. The problem of interest satisfies both of these properties, and therefore H2 is verified. Due to the convexity of f, assumption H3 holds, e.g., if X is contained in an open convex set that is itself a subset of Dom(f) (in this case, no feasibility cut is generated). H3 also holds in our setting if subgradients are computed via basic optimal dual solutions of the second-stage subproblems. Under H3, we can ensure that the parameter L in the algorithm is finite.
In our analysis, we use the facts that the sequences of the optimality gap ∆_k and of the upper bound f_up^k are non-increasing by definition and that the sequence of the lower bound f_low^k is non-decreasing. More specifically, we update the lower bound only when the MP is infeasible. We count the number ℓ of times the gap significantly decreases, meaning that the test of line 38 is triggered, and denote by k(ℓ) the corresponding iteration; by construction, we obtain the chain of inequalities in (6). As in Reference [31], k(ℓ) denotes a critical iteration, and x̂^{k(ℓ)} denotes a critical iterate. We introduce the set of iterates between two consecutive critical iterates: K_ℓ := {k(ℓ) + 1, . . . , k(ℓ+1) − 1}. The proof of convergence of the ASYN LD consists in showing that the algorithm performs infinitely many critical iterations when tol_∆ = 0. We start with the following lemma, which is a particular case of Reference [31], Lemma 3, and does not depend on the structure of X. Lemma 1. Fix an arbitrary ℓ and let K_ℓ be defined as above. Then, for all k ∈ K_ℓ, (a) the MP is feasible, and (b) the stability center is fixed: x̂^k = x̂^{k(ℓ)}. Item (a) above ensures that the MP is well-defined and that f_low^k is fixed for all k ∈ K_ℓ. Note that the lower bound is updated only when the MP is found infeasible, and this fact immediately triggers the test at line 38 of the algorithm. Similarly, Algorithm 1 guarantees that the stability center remains fixed for all k ∈ K_ℓ, since an update of the stability center would imply a new critical iteration.
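The monotone bound sequences and the critical-iteration test can be made concrete with a small bookkeeping sketch. It assumes, hypothetically, that the test at line 38 compares the current gap against a fraction lam ∈ (0, 1) of the gap at the last critical iteration; the numeric values are illustrative only.

```python
# Bookkeeping for the bounds and the optimality gap.  Assumption (hypothetical):
# a "critical iteration" is declared when the gap falls to at most lam times
# the gap recorded at the previous critical iteration (the test at line 38).

import math

f_up, f_low = math.inf, -math.inf
last_critical_gap = math.inf
lam = 0.5
critical_iterations = []

def update(k, new_up=None, new_low=None):
    global f_up, f_low, last_critical_gap
    if new_up is not None:
        f_up = min(f_up, new_up)     # upper bounds are non-increasing
    if new_low is not None:
        f_low = max(f_low, new_low)  # lower bounds are non-decreasing
    gap = f_up - f_low
    if gap <= lam * last_critical_gap:
        last_critical_gap = gap
        critical_iterations.append(k)  # k(l) for the next l

update(1, new_up=100.0, new_low=0.0)   # gap 100 -> first critical iteration
update(2, new_up=90.0)                 # gap 90: not a significant decrease
update(3, new_low=60.0)                # gap 30 <= 0.5 * 100 -> critical
assert critical_iterations == [1, 3]
assert f_up == 90.0 and f_low == 60.0
```

The iterations strictly between two entries of critical_iterations form the set K_ℓ used throughout the analysis.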

Theorem 1.
Assume that X is a compact set and that H1–H3 hold, and let tol_∆ = 0 in the algorithm. Then lim_k ∆_k = 0.
Proof of Theorem 1. By (6), we only need to show that the counter ℓ increases indefinitely (i.e., that there are infinitely many critical iterations). We obtain this by showing that, for any ℓ, the set K_ℓ is finite; to this end, suppose that ∆_k > ∆ > 0 for all k ∈ K_ℓ. We proceed in two steps, showing the following: (i) the number of asynchronous iterations between two consecutive coordination steps is finite, and (ii) the number of coordination steps in K_ℓ is finite as well. If case (i) were not true, then (5) and Lemma 3 of Reference [31] would give

‖x^{k+1} − x̄‖ ≥ ‖x^k − x̄‖ + (α/L)∆,

for all k ∈ K_ℓ greater than the iteration k̄ of the last coordination iterate x̄. Applying this inequality recursively down to k̄, we obtain ‖x^{k+1} − x̄‖ ≥ (k + 1 − k̄)(α/L)∆. However, this inequality, together with H1 and L < ∞ (due to H3), contradicts the fact that X is bounded. Therefore, item (i) holds. We now turn our attention to item (ii). Let s, s′ ∈ K_ℓ with s < s′ be the iteration indices of any two coordination steps. At the moment at which x^{s′} is computed, the information (f_j(x^s), g_j^s) is available at the MP for all j = 1, . . . , w. As a result of the MP definition, the following constraints are satisfied by x^{s′}:

c^T x^{s′} + Σ_{j=1}^{w} [ f_j(x^s) + (g_j^s)^T (x^{s′} − x^s) ] ≤ f_lev^{s′}.

Combining these inequalities and rearranging terms shows that ‖x^{s′} − x^s‖ is bounded away from zero. If there were an infinite number of coordination steps inside K_ℓ, the compactness of X would allow us to extract a converging subsequence, which would contradict the above bound. The number of coordination steps inside K_ℓ is thus finite. From (i) and (ii), the index set K_ℓ is finite, and the chain (6) concludes the proof.

Dynamic Asynchronous Level Decomposition
In the asynchronous approach described in Algorithm 1, the component functions f_j are statically assigned to workers: worker j always evaluates the same component function f_j. Likewise, the usual implementation of the synchronous LD strategy is to task workers with solving fixed sets of Q_s. We call these strategies static asynchronous LD and static synchronous LD. However, as previously mentioned, such task-allocation policies might result in significant idle times, even for the asynchronous method, because we need the first-order information on all f_j to compute valid bounds. To lessen the idle times, we implement dynamic-task-allocation strategies, in which component functions are dynamically assigned to workers as soon as they become available. Our dynamic allocation differs from Reference [15] because we do not use a list of iterates. To ease the understanding of the LD methods applied in this work, and to highlight their differences, we introduce a new entity: a coordinator process. The coordinator is responsible for tasking workers with functions to be evaluated. Note, however, that this additional entity is only strictly necessary in the dynamic asynchronous LD; in the other three methods, this responsibility can be assumed by the master. Nonetheless, in all methods, the master has three roles: solving the MP, getting iterates, and requesting functions to be evaluated at the newly obtained iterates. By construction, in the synchronous methods, the master requests the coordinator to evaluate all functions f_j at the same iterate, and it waits until the information on all functions has been received to continue the process. On the other hand, in the asynchronous variants, the master computes a new iterate, requests the coordinator to evaluate possibly only some of the f_j at it, and receives information on outdated iterates from the coordinator.
Given that the master has requested an iterate x to be evaluated on some f_j, the main difference between the static and the dynamic asynchronous methods is that, in the static form, the coordinator always sends x to the same worker that was previously tasked with solving f_j, while, in the dynamic one, the coordinator sends x to any available worker.
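The benefit of dynamic allocation on heterogeneous machines can be illustrated with a small scheduling simulation. The sketch below is illustrative only (task durations and worker speeds are made up): static allocation pins task j to worker j mod n, while dynamic allocation hands the next task to whichever worker frees up first, which shortens the makespan when one worker is much slower.

```python
import heapq

def makespan(task_times, worker_speeds, dynamic):
    """Simulated wall-clock time to evaluate all component functions once.

    task_times[j]   -- nominal work in f_j; worker_speeds[i] -- speed of worker i.
    Static: task j is pinned to worker j % n (as in the static methods).
    Dynamic: the next task goes to the worker that becomes available first.
    """
    n = len(worker_speeds)
    if dynamic:
        heap = [(0.0, i) for i in range(n)]      # (time worker frees up, worker id)
        heapq.heapify(heap)
        for t in task_times:
            at, i = heapq.heappop(heap)          # first available worker
            heapq.heappush(heap, (at + t / worker_speeds[i], i))
        return max(at for at, _ in heap)
    free_at = [0.0] * n
    for j, t in enumerate(task_times):
        i = j % n                                # fixed task-to-worker mapping
        free_at[i] += t / worker_speeds[i]
    return max(free_at)

tasks = [4.0] * 8
speeds = [1.0, 1.0, 0.25]                        # one much slower machine
assert makespan(tasks, speeds, dynamic=True) < makespan(tasks, speeds, dynamic=False)
```

With these illustrative numbers, the slow worker dominates the static schedule, whereas the dynamic schedule routes most tasks to the fast workers; this mirrors the behavior observed with Machine Combination 2 in the case studies.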

Modeling Details and Case Studies
The general formulation of our SHTUC is presented in (8)–(19).
I_gt · P̲_g ≤ tg_gts ≤ I_gt · P̄_g   (12)
tg_gts − tg_{g,t−1,s} ≤ I_{g,t−1} · R_g + (1 − I_{g,t−1}) · SU_g   (13)
−TL_l ≤ f^l_lts(tg, hg, δ+, δ−) ≤ TL_l,  ∀ l ∈ L   (19)

In our model, the indices and respective sets containing them are g ∈ G for thermal generators, h ∈ H for hydro plants, b ∈ B for buses, l ∈ L for transmission lines, and t, o ∈ T for periods. In (8), thermal generators' start-up costs are CS, and we assume that the shutdown cost is null. The thermal-generation costs are C; CL is the per-unit cost of load shedding (δ+) and generation surplus (δ−). The expected future-operation cost for scenario s is represented by the piecewise-affine function f^ω_s(v_s) : R^|H| → R, where v_s ∈ R^|H| are the reservoir volumes in the last period of scenario s. The first-stage decisions are the thermal generators' commitment, start-up, and shutdown, respectively, I, a, and b, and their hydro counterparts (w, z, and u). Set X in (1) contains the feasible commitments of thermal and hydro generators in our SHTUC, and it is defined by Constraints (9)–(11). In this work, we model the statuses of hydro plants with associated binary variables only in the first 48 h, to reduce the computational burden. For the remaining periods, the hydro plants are modeled only with continuous variables. The minimum up-time Constraint (9) ensures that, once turned on, thermal generator g remains on for at least TU_g periods. Likewise, the minimum downtime in (9) requires that, once g has been turned off, it must remain off for at least TD_g periods. Constraints (10) guarantee the satisfaction of the logical relations of status, start-up, and shutdown for thermal and hydro plants. The sets Y_s are defined by (12)–(19). Constraints (12) are the usual limits on the thermal generation tg; (13) and (14) are the up and down ramp-rate limits and the start-up and shutdown requirements of generator g. Equation (15) is the mass balance of hydro plant h's reservoir.
A_hts is the inflow to reservoir h in period t of scenario s. Moreover, the affine function f^v_hts(q, s) : R^{2·|H|·|T|·|S|} → R maps the inflow to h's reservoir in period t of scenario s given the vectors of turbine discharge q and spillage s. The constraints in (16) are the limits on the reservoir volume v, turbine discharge q, and spillage s.
In (17), the piecewise-affine function f^hg_hts(q, s) : R^{2·|H|·|T|·|S|} → R bounds the hydropower generation hg_hts of plant h. We use the classical DC network model: Equation (18) is the bus power balance, where the linear function f^p_bts(tg, hg, δ+, δ−) : R^{|T|·|S|·(|G|+|H|+2·|B|)} → R maps the controlled generation at each bus into the power injection at bus b, WG_bts is the wind generation at bus b, and L_bt is the corresponding load at b. Lastly, (19) are the limits on the flow of transmission line l in period t and scenario s, defined by the affine function f^l_lts(tg, hg, δ+, δ−) : R^{|T|·|S|·(|G|+|H|+2·|B|)} → R. We assess our algorithm on a 46-bus system with 11 thermal plants, 16 hydro plants, 3 wind farms, and 95 transmission lines. The system's installed capacity is 18,600 MW, of which 18.9% is due to thermal plants, hydro plants represent 68.1%, and wind farms have a share of 13%. We consider a one-week-long planning horizon with hourly discretization. Thus, a one-scenario instance of our SHTUC would have 7848 binary variables and 5315 constraints at the first stage, and 36,457 continuous variables and 100,949 constraints for each scenario in the second stage. Furthermore, the weekly peak load in the baseline case is 11,204 MW, nearly 60.2% of the installed capacity. The hydro plants are distributed over two basins and include both run-of-river plants and plants with reservoirs capable of regularization. Further information about the system can be found in the attached multimedia files.
The uncertainty comes from wind generation and the inflows. In all tests, we use a scenario set with 256 scenarios. To assess how our algorithm performs on distinct scenario sets, three sets (A, B, and C) are considered. Moreover, we use three initial useful-reservoir-volume levels: 40%, 50%, and 70%. The impact of different load levels on the performance of our algorithms is analyzed through three load levels: low (L), moderate (M), and high (H). Level H is our baseline case regarding load. Levels M and L have the same load profile as H's, but with all loads multiplied by factors of 0.9 and 0.8, respectively. Lastly, to investigate how our algorithm's convergence rate is affected by different choices of initial stability centers, we implement two strategies for obtaining the initial stability center. In both strategies, we solve an expected-value problem, as defined in Reference [5]. In the first one, we use the classical Benders decomposition (BD) with a coarse relative-optimality-gap tolerance of 10% to get a possibly low-quality stability center (LQSC). To obtain a stability center of hopefully high quality, which we refer to as a high-quality stability center (HQSC), we solve the expected-value problem directly with Gurobi 8.1.1 [33] with a relative-optimality-gap tolerance of 1%. The time limit for obtaining the initial stability centers LQSC and HQSC is set to 5 min. Additionally, the computing setting consists of seven machines of two types: 4 of them have 128 GB of RAM and two Xeon E5-2660 v3 processors with 10 cores clocking at 2.6 GHz; the other 3 machines have 32 GB of RAM and two Xeon X5690 processors with 6 cores each, clocking at 3.47 GHz. All machines are in a LAN with 1-Gbps network interfaces. We test two machine combinations. In the first one, Combination 1, there are four 20-core machines and one with 12 cores. In Combination 2, we replace one 20-core machine with two 12-core machines.
Regardless of the combination, one 12-core machine is defined as the head node, where only the master is launched. Except for the master-for which Gurobi can take up to 10 cores-for all other processes, i.e., the workers, Gurobi is limited to computing on a single core.
Our computing setting is composed of machines with different configurations. Naturally, solving the same component function on two distinct machines may result in different outputs and different runtimes. Consequently, the path taken by the MP across iterations might change significantly between experiments on the same data. More specifically to asynchronous methods, the varying order of information arrival to the MP may also yield different convergence rates. Hence, to reduce the effect of these seemingly random behaviors, we conducted 5 experiments for each problem instance. Therefore, our testbed E is defined as E = {40, 50, 70} × {A, B, C} × {L, M, H} × {LQSC, HQSC} × {Trial 1, . . . , Trial 5} × {Combination 1, Combination 2}: we have 54 problems and 540 experiments. In all instances in E, we divide S into 16 subsets. Thus, following our previous definitions, w = 16 and each subset P_j is such that |P_j| = 16. Additionally, we set a relative-optimality-gap tolerance of 1% and a time limit of 30 min for all instances in E. Gurobi 8.1.1 is used to solve the MILP MP and the component functions (linear-programming problems) that form the subproblem. The inter-process communication is implemented with mpi4py and Microsoft MPI v10.0.
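The counts of 54 problems and 540 experiments follow directly from the Cartesian-product definition of the testbed E; a quick enumeration confirms them:

```python
from itertools import product

# The factors of the testbed E, as defined in the text.
volumes      = [40, 50, 70]
scenario_set = ["A", "B", "C"]
load_levels  = ["L", "M", "H"]
stab_centers = ["LQSC", "HQSC"]
trials       = [f"Trial {i}" for i in range(1, 6)]
combinations = ["Combination 1", "Combination 2"]

# Problems are defined by the first four factors; experiments add trials and
# machine combinations.
problems    = list(product(volumes, scenario_set, load_levels, stab_centers))
experiments = list(product(volumes, scenario_set, load_levels, stab_centers,
                           trials, combinations))
assert len(problems) == 54      # 3 * 3 * 3 * 2
assert len(experiments) == 540  # 54 * 5 * 2
```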

Results
In this section, the methods are analyzed based on their computing-time performances. We focus on this metric because our results have not shown significant differences among the methods for other metrics, e.g., optimality gaps and upper bounds. In addition to analyzing averages of the metric, we use the well-known performance profile [34]. Multimedia files containing the main results for the set E are attached to this work. Figure 1 presents the performance profiles of the methods considering the experiments E. In Figure 1, ρ(τ) is the probability that the performance ratio of a given method is within a factor τ of the best ratio, as in Reference [34]. In Figure 1, we see that the dynamic asynchronous LD outperforms all other methods for most instances of E. Its performance ratio is within a factor of 2 of the best ratio for about 500 instances (about 92% of the total). Moreover, the static asynchronous LD has a reasonable overall performance: it is within a factor of 2 of the best ratio for more than 400 instances.
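The performance profile ρ(τ) used throughout this section can be computed as follows; this is a minimal sketch of the standard construction of Reference [34], with made-up runtimes for illustration:

```python
def performance_profile(times, taus):
    """times[m][p]: runtime of method m on problem p.
    Returns, for each method m, the profile rho(tau) = fraction of problems on
    which m's ratio to the per-problem best runtime is at most tau."""
    n_methods, n_probs = len(times), len(times[0])
    best = [min(times[m][p] for m in range(n_methods)) for p in range(n_probs)]
    ratios = [[times[m][p] / best[p] for p in range(n_probs)]
              for m in range(n_methods)]
    return [[sum(r <= tau for r in ratios[m]) / n_probs for tau in taus]
            for m in range(n_methods)]

# Illustrative data: method 0 is fastest on two problems, within 2x on the third.
times = [[10.0, 30.0, 20.0],
         [20.0, 15.0, 40.0]]
profiles = performance_profile(times, taus=[1.0, 2.0])
assert profiles[0] == [2/3, 1.0]   # best on 2 of 3 problems; within factor 2 on all
```

Reading a profile: higher curves are better, ρ(1) is the fraction of problems on which a method is the outright fastest, and the value at large τ reflects robustness.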
Moreover, we see that the dynamic-allocation strategy provides significant improvements for both the asynchronous and synchronous LD approaches. The dynamic synchronous LD converges faster than its static counterpart for most of the experiments. Figures 2 and 3 show the performance profiles considering only instances in E with machine Combinations 1 and 2, respectively.
Figure 2 illustrates that, for a distributed setting in which workers are deployed on machines with identical characteristics, the performances of the methods with dynamic allocation and those with static allocation are similar. Nonetheless, we see that the asynchronous methods still outperform the synchronous LD for most experiments.
In contrast to Figure 2, Figure 3 shows that the dynamic-allocation strategy provides significant time savings for the instances in E with Machine Combination 2. This is due to the great imbalance between the different machines in Combination 2: machines with Xeon E5-2660 v3 processors are much faster than those with Xeon X5690 processors. Table 1 gives the average wall-clock computing times over subsets of E. From this table, we see that the relative average speed-ups of the dynamic and static asynchronous LD over the entire set E w.r.t. the static synchronous LD are 54% and 29%, respectively; considering the dynamic synchronous LD, the speed-ups are 45% and 16%, respectively. Moreover, we see that the time savings are more significant for harder-to-solve instances, e.g., instances with high load and/or low-quality initial stability centers. Additionally, Table 1 shows that the dynamic asynchronous LD provides considerable reductions in the standard deviations of the elapsed computing times in comparison with the other methods. For example, for the problems with high load level (H), the dynamic asynchronous LD has a standard deviation about 16%, 13%, and 27% smaller than those of the static asynchronous LD, the dynamic synchronous LD, and the static synchronous LD, respectively.
Based on the data from Table 1, we can compute the speed-ups provided by our proposed dynamic ASYN LD w.r.t. the other three variants. To better appreciate such speed-ups, we show them in Table 2, where we see that the proposed ASYN LD provides consistent speed-ups over the entire range of operating conditions considered here.
The advantages of the asynchronous methods are made clearer in Figure 4, where we see that the asynchronous methods not only provide (on average) better running times but also present significantly less variation among the problems in E. The latter is relevant in the day-to-day operations of ISOs since, if there are stochastic hydrothermal unit-commitment (SHTUC) cases that take significantly more time to be solved than expected, subsequent operation steps that depend on the results of the SHTUC might be affected. Take, for instance, the case from the Midcontinent Independent System Operator reported in Reference [3], where the (deterministic) UC is reported to have solution times varying from just 50 to over 3600 s. Such variation can be problematic in the day-to-day operation of power systems, since it may disrupt tightly scheduled operations. Naturally, methods that can reduce such variance and still produce high-quality solutions in reasonable times are appealing.

Conclusions
In this work, we present an extension of the asynchronous level decomposition of Reference [31] in a Benders-decomposition framework. We provide a convergence analysis of our algorithm, proving that it converges to an optimal solution, if one exists, in finitely many iterations. Our experiments are conducted on an extensive testbed from a real-life-size system. The results show that the proposed asynchronous algorithm outperforms its synchronous counterpart in most of the problems and provides significant time savings. Moreover, we show that the improvements provided by the asynchronous methods over the synchronous ones are even more evident in a distributed-computing setting with machines of different computational powers. Additionally, we show that the asynchronous method is further enhanced by implementing a dynamic-task-allocation strategy.