1. Introduction
The unit-commitment (UC) problem aims at determining the optimal scheduling of generating units to minimize costs or maximize revenues while satisfying local and system-wide constraints [
1]. In its deterministic form, UC still poses a challenge to operators and researchers due to the large sizes of the systems and the increasing modeling details necessary to represent the system operation. For instance, in the Brazilian case, the current practice is to set a limit of 2 h for the solution of the deterministic UC [
2], while the Midcontinent Independent System Operator (MISO) sets a time limit of 20 min for its UC [
3]. (Note that the Brazilian system and the MISO are different from a physical, as well as from a market-based, viewpoint, but the problems being solved in these two cases share the same classical concept of the UC.) Nonetheless, the growing presence of intermittent generation has added yet more difficulty to the problem, giving rise to what is called uncertain UC [
4]. The latter is considerably harder to solve than its deterministic counterpart, and one of the reasons for its lack of adoption in the industry is precisely its computational burden: Large-scale uncertain UC takes a prohibitively long time to be solved. In this context, efficient solution methods for the uncertain UC that can take full advantage of the computational resources at hand are both desirable and necessary to help system operators cope with uncertain resources.
In particular, to model the uncertainty arising from renewable sources, one of two approaches is generally employed: robust optimization or stochastic programming [
4]. The latter is by far the most employed, both in its chance-constrained and recourse variants. In stochastic programs with recourse, uncertainty is, in general, represented by finitely many scenarios, and the problem is formulated in either a two-stage or a multistage setting. In two-stage stochastic problems, the first-stage variables must be decided before uncertainty is revealed. Once the uncertain information becomes known, recourse actions are taken to best accommodate the first-stage decisions [
5]. In stochastic hydrothermal unit-commitment (SHTUC) problems, the sources of uncertainties are related to renewable resources, spot prices, load, and equipment availability [
1,
4].
The commitment decisions are usually modeled as first-stage variables, while dispatch decisions are the recourse actions (second-stage variables). Given the mixed-integer nature of commitment decisions, SHTUC problems in a two-stage formulation give rise to large-scale mixed-integer optimization models whose numerical solution by off-the-shelf solvers is often prohibitive due to time requirements or limited computing resources. Consequently, decomposition techniques must come into play [
1,
4,
6,
7]. Benders decomposition (BD) and Lagrangian relaxation (LR) are the most used techniques to handle SHTUC problems. While the BD deals with the primal problem [
8], LR is a dual procedure employed to compute the best lower bound for the SHTUC problem [
7,
9]. Primal-recovery heuristics are employed to compute primal-feasible points, which are not, in general, optimal solutions. This is the main shortcoming of LR-based techniques.
Decomposition techniques yield models that are amenable to parallelization [
5]. A common strategy for solving problems simultaneously is to use a master/worker framework with pre-specified synchronization points [
10], which we call synchronous computing (SYN). In this framework, the master chooses new iterates and sends them to workers, who, in turn, are responsible for solving one or more subproblems. Examples of SYN implementations for UC are given in [
11,
12,
13,
14]. An aspect of SYN is that, at predetermined points of the algorithm, the master must wait for all workers to respond to resume the iterative process: the synchronization points. However, the times for workers to finish their respective tasks might vary significantly. This results in idle times, both for the master and for workers who respond quickly [
10]. One way to reduce idle times is to use asynchronous computing (ASYN).
In contrast to SYN, in ASYN, there are no synchronization points, so the master and workers do not need to wait until all workers respond to continue their operations. Thus, in an iterative process, e.g., in BD, the master would compute the next iterate based on information from possibly only a proper, but nonempty, subset of the workers. Based on this possibly incomplete information, the master sends a new iterate to the available workers, while the slower workers are still carrying out their tasks on an outdated iterate. Because, in ASYN, iterates might not be evaluated by all workers, the straightforward evaluation of the objective function (which yields bounds on the optimal value) is precluded. Hence, a fundamental step in ASYN is the (scarce) coordination of workers to produce valid bounds.
ASYN implementations have been proposed in the UC literature mainly to solve the dual problems (issued by LRs) via either subgradient algorithms or cutting-plane-based methods [
15,
16,
17]. In References [
15,
16], a queue of iterates is created, and its elements are gradually sent to the workers. Auxiliary lists keep track of the evaluation status of each worker with respect to the elements in the queue. Once an element has been evaluated by all workers, a valid bound on the original problem is available. The authors of Reference [
15] demonstrate that their algorithm converges to a dual global solution regardless of the iterate-selection policy used to choose the iterates from the queue—first-in-first-out or last-in-first-out. In Reference [
17], the authors keep a list of all the iterates to compute valid bounds. In addition to solving the dual problem asynchronously, Reference [
17] also conducts the primal recovery asynchronously. While References [
15,
16] employ a convex trust-region bundle method, Reference [
17] implements an incremental subgradient method. Asynchronous implementations of BD for convex problems can be found in References [
18,
19,
20]. In Reference [
18], the dual dynamic-programming algorithm is handled asynchronously in a hydrothermal scheduling problem. In Reference [
19], the stochastic dual dynamic-programming algorithm is used for addressing the long-term planning problem of a hydro-dominated system: The authors propose to compute Benders cuts in an asynchronous fashion. This is also the case in Reference [
20], where the authors consider an asynchronous Benders decomposition for convex multistage stochastic programming.
Despite being successfully applied in a variety of fields, e.g., References [
18,
19] and the references in References [
21], the classical BD is well-known to suffer from slow convergence due to the oscillatory nature of Kelley’s cutting-plane method [
22,
23]. Regularized BDs have been proven to outperform the classical one in several problems: See Reference [
24] for (convex) two-stage linear programming, Reference [
25] for (nonconvex) chance-constrained problems, and Reference [
26] for the robust design of stations in water distribution networks. Several types of regularization exist [
25,
27,
28]: proximal, trust-region, and level sets. Among the regularization methods, the level bundle method [
29], also known as level decomposition (LD) in two-stage programming [
24], stands out for its flexibility in dealing with convex or nonconvex feasible sets, stability functions and centers, and inexact oracles [
25,
26,
30]. Recently, asymptotically level bundle methods for convex optimization were proposed in Reference [
31]. The paper presents two algorithms. The first one does not employ coordination, but it makes use of upper bounds on the Lipschitz constants of the involved functions to compute upper bounds for the problem. The second algorithm does not make use of the latter assumption but requires scarce coordination. The authors of Reference [
31] focus on the convergence analysis of their proposals (suitable only for the convex setting) and present limited numerical experiments. In this work, we build on Reference [
31] and extend its asynchronous algorithm with scarce coordination (Algorithm 3 of Reference [
31]) to the mixed-integer setting. Moreover, we consider a more general setting in which tasks can be assigned to workers in a dynamic fashion, as described in
Section 3. We highlight that the convergence analysis given in Reference [
31] relies strongly on elements of convex analysis, such as Smulian's theorem and the Painlevé–Kuratowski set convergence. Such key theoretical results are no longer valid in the setting of nonconvex sets, and hence the convergence analysis developed in Reference [
31] does not apply to our mixed-integer setting. For this reason, the convergence analysis of our asynchronous LD must be done anew. We not only provide convergence analysis of our method but also assess its numerical performance on a test set consisting of 54 instances of two-stage UC problems with mixed-integer variables in the first stage.
We care to mention that other asynchronous bundle methods exist in the literature, but they are all designed for convex optimization problems [
15,
16,
32]. The latter reference proposes an asynchronous proximal bundle method, whereas References [
15,
16] consider a trust-region variant for polyhedral functions. Our approach, which follows the lines of the extended level bundle method of Reference [
30], does not require the involved functions to be polyhedral or the feasible set to be convex. As an additional advantage, our algorithm is easily implementable.
This work is organized as follows.
Section 2 presents a generic formulation of our two-stage SHTUC problem. The extended asynchronous LD and its convergence analysis are presented in
Section 2.1 and
Section 2.2, respectively.
Section 3 presents more details of the considered SHTUC problem and states our case studies. Numerical experiments assessing the benefits of our proposal are given in
Section 4. Finally, in
Section 5, we present our final remarks.
2. Materials and Methods
We address the problem of an Independent System Operator (ISO) in a hydro-dominated system with a loose-pool market framework. The ISO decides the day-ahead commitment considering operation costs, forecast errors in wind generation and inflows, and the usual generation and system-wide constraints. The uncertainties in wind and inflows are represented by a finite set of scenarios, and the decisions are made in two stages. At the first stage, the ISO decides on the commitment of units, whereas, at the second stage, the operator determines the dispatch according to the random-variable realization. Full details on the considered stochastic hydrothermal unit-commitment (SHTUC) problem are given shortly. For presenting our approach, which is not limited to (stochastic) unit-commitment (UC) problems, we adopt the following generic formulation.
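The display stating the generic formulation referred to as Problem (1) did not survive extraction. The following LaTeX sketch reconstructs it from the description in the next paragraph; the set symbols X, Y_s, and S, as well as the equality form of the coupling constraints, are notational assumptions of this reconstruction.

```latex
% Hedged reconstruction of the generic two-stage Problem (1).
% X, Y_s, and S are symbols chosen here; c, q_s, T, W, h_s follow the prose.
\begin{equation}
  f^{*} := \min_{x,\;y_s}\; c^{\top}x + \sum_{s\in S} q_s^{\top} y_s
  \quad \text{s.t.} \quad x \in X,\qquad
  T x + W y_s = h_s,\;\; y_s \in Y_s,\;\; s \in S.
  \tag{1}
\end{equation}
```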
In this formulation, the n-dimensional vector x represents the first-stage variables with associated cost vector c. The second-stage variables, ys, and their associated costs, qs, depend on the scenario s. The cost vector qs is assumed to incorporate the positive probability of scenario s. The first- and second-stage variables are coupled by linear constraints in which T is the technology matrix, and W and hs are, respectively, the recourse matrix and a vector of appropriate dimensions. While the first-stage feasible set is compact and possibly nonconvex, the scenario-dependent second-stage set is a convex polyhedron.
As previously mentioned, depending on the UC problem and number of scenarios, the mixed-integer linear programming (MILP) Problem (1) cannot be solved directly by an off-the-shelf solver. The problem is thus decomposed by making use of the recourse functions.
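The display defining the recourse functions is also missing from the extracted text; a standard reconstruction consistent with the prose (with Q_s and the multiplier symbol chosen here) is:

```latex
% Hedged reconstruction of the scenario-s recourse function and a subgradient.
% Q_s is a notational choice; T, W, h_s, q_s, pi_s follow the prose description.
\begin{equation}
  Q_s(x) := \min_{y_s \in Y_s} \bigl\{\, q_s^{\top} y_s \;:\; W y_s = h_s - T x \,\bigr\},
  \qquad
  -T^{\top}\pi_s \in \partial Q_s(x),
\end{equation}
where $\pi_s$ denotes an optimal Lagrange multiplier of the coupling constraint.
```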
It is well-known that each recourse function is a non-smooth convex function of x. If the corresponding subproblem has a solution, then a subgradient of the recourse function at x can be computed by making use of a Lagrange multiplier, πs, associated with the coupling constraint. On the other hand, if the recourse subproblem is infeasible, then the point x can be cut off by adding a feasibility cut [
5].
Let the scenario set be partitioned into w nonempty, pairwise-disjoint subsets, one per worker. By defining, for each j ∈ {1,…,w}, a component function that aggregates the recourse functions of the scenarios in the j-th subset, Problem (1) can be rewritten as the minimization, over the first-stage feasible set, of the first-stage cost plus the sum of the w component functions.
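A LaTeX sketch of this reformulation, with the subset and component-function symbols (S_j, f_j) chosen here for concreteness:

```latex
% Hedged reconstruction of the decomposition evaluated by the workers.
\begin{equation}
  f_j(x) := \sum_{s \in S_j} Q_s(x), \quad j = 1,\dots,w,
  \qquad
  f^{*} = \min_{x \in X}\; f(x), \quad f(x) := c^{\top}x + \sum_{j=1}^{w} f_j(x).
\end{equation}
```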
In our notation, w stands for the number of workers evaluating the recourse functions. The workers j ∈ {1,…,w} are processes running on a single machine or multiple machines. Likewise, we define a master process—hereafter referred to only as master—to solve the master program (which is defined shortly).
2.1. The Mixed-Integer and Asynchronous Level Decomposition
For every point x^k, where k represents an iteration counter, worker j receives x^k and provides us with first-order information on its component function: the value of the function and a subgradient [
23], which, in the two-stage setting, is assembled from the Lagrange multipliers of the corresponding scenario subproblems. Convexity of the component functions implies that the associated linearizations approximate them from below for all x. By gathering, for each j ∈ {1,…,w}, the indices of the iterations at which the j-th component function has been evaluated, we can construct individual cutting-plane models for these functions.
These models define—together with a stability center, a level parameter, and a given norm—the following master program (MP).
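The displays for the cutting-plane models and the MP are missing from the extracted text; the following is a standard level-decomposition reconstruction consistent with the surrounding description. The index sets J_j^k, the Euclidean stabilizing term, and the symbols for the stability center and the level parameter are assumptions of this sketch, not necessarily the authors' exact notation.

```latex
% Hedged reconstruction of the individual cutting-plane models and of the MP.
% J_j^k collects the iterations at which worker j returned (f_j(x^i), g_j^i);
% \hat{x}^k is the stability center and f_{lev}^k the level parameter.
\begin{align}
  \check{f}_j^{\,k}(x) &:= \max_{i \in J_j^k}\bigl\{ f_j(x^i) + \langle g_j^i,\, x - x^i\rangle \bigr\}
     \;\le\; f_j(x), \qquad j = 1,\dots,w,\\
  x^{k+1} &\in \arg\min_{x \in X}\; \tfrac12\,\lVert x - \hat{x}^{k}\rVert^{2}
     \quad \text{s.t.}\quad c^{\top}x + \sum_{j=1}^{w}\check{f}_j^{\,k}(x) \;\le\; f_{\mathrm{lev}}^{\,k}.
\end{align}
```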
At iteration k, an MP solution provides the next candidate iterate. If any recourse subproblem is infeasible at this candidate, then a feasibility cut is added to the MP. We skip further details on this matter, since it is a well-known subject in the literature of two-stage programming [
5]. On the other hand, if the candidate sent to a worker j is feasible for all of its recourse subproblems, the corresponding cutting-plane model in the MP is updated. This update is possibly based on an outdated iterate of index a(j), where a(j) < k is the iteration index of the latest information provided by worker j. We care to mention that the MP itself can be infeasible, depending on the level parameter. Due to the convexity of the involved functions, if the MP is infeasible, then the current level parameter is a valid lower bound on f* [
30].
Without coordination, there is no reason for all workers to be called upon the same iterate. This fact precludes the computation of an upper bound on f*. Algorithm 2 in Reference [
31] deals with this situation without resorting to coordination techniques, but it requires more assumptions on the component functions: upper bounds on their Lipschitz constants should be known. Since we do not make this assumption, we need scarce coordination, akin to Algorithm 3 of Reference [
31], for computing upper bounds on f*. As in Reference [
31], the iterates at which all workers are eventually called are named coordination iterates. Assuming that all workers eventually respond (after an unknown time), the coordination allows us to compute the full function value and a subgradient of f at the coordination iterate. The function value is used to update the upper bound, as usual for level methods; the subgradient is used to update the bound L on the Lipschitz constant of f.
In our algorithm below, the coordination is implemented by two vectors of Booleans: to_coordinate and coordinating. The role of to_coordinate[j] is to indicate to the master that worker j still has to evaluate its component function at the new coordination point; at the moment this evaluation is requested, to_coordinate[j] is set to false, and coordinating[j] is set to true. Similarly, coordinating[j] indicates to the master that worker j is responding to a coordination step, which is used to update the upper bound. When a worker j responds, it is included in the set of available workers; if all workers are busy, this set is empty. Our algorithm mirrors as much as possible Algorithm 3 of Reference [
31], but it contains some important specificities to handle (i) mixed-integer feasible sets and (ii) extended real-valued objective functions (we do not assume that the recourse functions are finite at every first-stage point). To handle (ii), we furnish our algorithm with a feasibility check (and the addition of feasibility cuts); to handle (i), we not only use a specialized solver for the MP but also change the rule for scarce coordination. The reason is that the rule of Reference [
31] is only valid in the convex setting: Under nonconvexity, that coordination test would jeopardize the following inequality, which is important for the convergence analysis:
In the algorithm below, coordination is triggered when (5) is not satisfied and all workers have already responded on the last coordination iterate (i.e., rr = 0, where rr stands for "remaining to respond").
Algorithm 1: Asynchronous Level Decomposition. |
1. Choose a gap tolerance tol∆, upper bound > f* + tol∆, lower bound < f*, , L > 0, and a feasible point. Set , ← , ← ∞, rr ← 0, , k ← 0, for . |
2. for k ← 1, 2, … do |
3. if (5) does not hold and rr = 0 then |
4. ← , rr ← w, ← and ← c |
5. for all j do |
6. to_coordinate[j] ← false and |
7. coordinating[j] ← true |
8. end for |
9. for all j do |
10. to_coordinate[j] ← true and |
11. coordinating[j] ← false |
12. end for |
13. end if |
14. Send to all available workers j and set |
15. Update the set of idle workers and receive from workers j |
16. Update ← for all j and set |
17. for all j ∈ do |
18. if coordinating[j] = true then |
19. coordinating[j] ← false and rr ← rr − 1 |
20. ← + and ← + |
21. if rr = 0 then |
22. Set |
23. if < then |
24. ← and ← |
25. end if |
26. end if |
27. else |
28. if to_coordinate[j] = true then |
29. Send to worker j and set |
30. Set to_coordinate[j] ← false and |
31. coordinating[j] ← true |
32. end if |
33. end if |
34. end for |
35. Set |
36. Set ← |
37. if ≤ tol∆ then stop: return and end if |
38. if ≤ then Set ← and ← end if |
39. ← |
40. if (4) is feasible then |
41. Get a new iterate from the solution of (4) |
42. else |
43. Set ← and go to Step 36 |
44. end if |
45. if leads to infeasible subproblems then |
46. Add a feasibility cut to the MP (2) and go to Step 40 |
47. end if |
48. Set |
49. end for |
The assumption that the algorithm starts with a feasible point is made only for the sake of simplicity. Indeed, the initial point can be infeasible, but, in this case, Step 3 must be changed to ensure that the first computed feasible point is a coordination iterate. For the problem of interest, the feasibility check performed at line 45 amounts to verifying whether the candidate iterate yields feasible second-stage subproblems. In our SHTUC, this check comprises an auxiliary problem for verifying whether ramp-rate constraints would be violated by the candidate commitment and an additional auxiliary problem for checking whether reservoir-volume bounds would be violated. Both problems are easily reduced to small linear-programming problems that can be solved to optimality in a fraction of a second by off-the-shelf solvers.
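To make the communication pattern behind Algorithm 1 more concrete, the following is a minimal Python sketch of an asynchronous master/worker loop using mpi4py, which the paper reports employing. It is not the authors' implementation: the helper functions, tags, and iteration limit are placeholders, and the coordination steps, feasibility cuts, and bound updates of Algorithm 1 are omitted. The point illustrated is that the master resumes as soon as any one worker responds, instead of waiting for all of them.

```python
# Minimal sketch (not the paper's implementation): asynchronous master/worker
# pattern with mpi4py. The master never waits for all workers; it resumes as
# soon as one result arrives. Helper functions below are trivial placeholders.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
n_workers = comm.Get_size() - 1            # rank 0 is the master
TAG_TASK, TAG_RESULT, TAG_STOP = 1, 2, 3
MAX_ITERS = 50                             # illustrative stopping rule only

def initial_iterate():           return [0.0]            # placeholder
def solve_master():              return [0.0]            # placeholder for the level MP
def update_cutting_plane_model(result): pass             # placeholder cut update
def evaluate_component(j, x):    return (j, 0.0, [0.0])  # placeholder (j, value, subgradient)

if rank == 0:                              # ---- master ----
    idle = list(range(1, n_workers + 1))   # all workers start idle
    x = initial_iterate()
    for _ in range(MAX_ITERS):
        for j in idle:                     # send the current iterate to idle workers
            comm.send(x, dest=j, tag=TAG_TASK)
        idle = []
        status = MPI.Status()              # wait for ONE result, not for all workers
        result = comm.recv(source=MPI.ANY_SOURCE, tag=TAG_RESULT, status=status)
        idle.append(status.Get_source())
        update_cutting_plane_model(result)
        x = solve_master()                 # next iterate from the (level) MP
    while len(idle) < n_workers:           # drain results still in flight
        status = MPI.Status()
        comm.recv(source=MPI.ANY_SOURCE, tag=TAG_RESULT, status=status)
        idle.append(status.Get_source())
    for j in range(1, n_workers + 1):
        comm.send(None, dest=j, tag=TAG_STOP)
else:                                      # ---- worker ----
    while True:
        status = MPI.Status()
        msg = comm.recv(source=0, tag=MPI.ANY_TAG, status=status)
        if status.Get_tag() == TAG_STOP:
            break
        comm.send(evaluate_component(rank, msg), dest=0, tag=TAG_RESULT)
```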
2.2. Convergence Analysis
To analyze the convergence of the mixed-integer asynchronous computing (ASYN) level decomposition (LD) described above, we rely as much as possible on Reference [
31]. However, to account for the mixed-integer nature of the feasible set, we need novel developments like the ones in Theorem 1 below. Throughout this section, we assume tol∆ = 0, as well as the following:
Hypothesis 1 (H1). All the workers are responsive.
Hypothesis 2 (H2). The algorithm generates only finitely many feasibility cuts.
Hypothesis 3 (H3). The workers provide bounded subgradients.
As for H1, the assumption H2 is a mild one: H2 holds, for instance, when the recourse function is polyhedral, or when the first-stage feasible set has only finitely many points. The problem of interest satisfies both these properties, and, therefore, H2 is verified. Due to the convexity of the recourse functions, assumption H3 holds, e.g., if the first-stage feasible set is contained in an open convex set on which the recourse functions are finite (in this case, no feasibility cut will be generated). H3 also holds in our setting if subgradients are computed via basic optimal dual solutions of the second-stage subproblems. Under H3, we can ensure that the parameter L in the algorithm is finite.
In our analysis, we use the fact that the sequences of the optimality gap and of the upper bound are non-increasing by definition, and that the sequence of lower bounds is non-decreasing. More specifically, we update the lower bound only when the MP is infeasible. We count the number of times the gap significantly decreases, meaning that the test of line 38 is triggered, and denote by k_ℓ the corresponding iteration, for ℓ = 1, 2, …. We have the following by construction:
As in Reference [
31], k_ℓ denotes a critical iteration, and the corresponding MP solution is a critical iterate. We introduce the set K_ℓ of iterations between two consecutive critical iterations, i.e., those k with k_ℓ ≤ k < k_{ℓ+1}.
The proof of convergence of the ASYN LD consists in showing that the algorithm performs infinitely many critical iterations when tol∆ = 0. We start with the following lemma, which is a particular case of Lemma 3 in Reference [
31] and does not depend on the structure of the feasible set.
Lemma 1. Fix an arbitrary ℓ and let K_ℓ be defined as above. Then, for all k ∈ K_ℓ, (a) the MP is feasible, and (b) the stability center is fixed.
Item (a) above ensures that the MP is well-defined, and the lower bound is fixed, for all k ∈ K_ℓ. Note that the lower bound is updated only when the MP is found infeasible, and this fact immediately triggers the test at line 38 of the algorithm. Similarly, Algorithm 1 guarantees that the stability center remains fixed for all k ∈ K_ℓ, since an update of the stability center would imply a new critical iteration.
Theorem 1. Assume that the first-stage feasible set is compact and that H1–H3 hold. Let tol∆ = 0 in the algorithm; then the optimality gap converges to zero.
Proof of Theorem 1. By (6), we only need to show that the counter ℓ increases indefinitely (i.e., that there are infinitely many critical iterations). We obtain this by showing that, for any ℓ, the set K_ℓ is finite; to this end, suppose that k ∈ K_ℓ for all k ≥ k_ℓ. We proceed in two steps, showing the following: (i) the number of asynchronous iterations between two consecutive coordination steps is finite, and (ii) the number of coordination steps in K_ℓ
is finite as well. If case (i) were not true, then (5) and Lemma 1(b) would give
for all
greater than the iteration
of the last coordination iterate. Applying this inequality recursively up to
, we obtain
However, this inequality, together with H1 and
(due to H3) contradicts the fact that
is bounded. Therefore, item (i) holds. We now turn our attention to item (ii): Let
k′ and k″, with k′ < k″, be the iteration indices of any two coordination steps. At the moment at which the coordination iterate of iteration k″ is computed, the information gathered at the coordination iterate of iteration k′ is available at the MP for all workers. As a result of the MP definition, the following constraints are satisfied by the coordination iterate of iteration k″:
By combining these inequalities and rearranging terms, we obtain an inequality whose constant is ensured by H3; together with the definition of the level parameter, it yields a positive lower bound on the distance between the two coordination iterates. If there were an infinite number of coordination steps inside K_ℓ, the compactness of the feasible set would allow us to extract a converging subsequence, and this would contradict such a lower bound. The number of coordination steps inside K_ℓ is thus finite. As a conclusion of (i) and (ii), the index set K_ℓ is finite, and the chain (6) concludes the proof. □
2.3. Dynamic Asynchronous Level Decomposition
In the asynchronous approach described in Algorithm 1, the component functions are statically assigned to workers—worker j always evaluates the same component function j. Likewise, the usual implementation of the synchronous LD strategy is to task each worker with a fixed set of scenario subproblems. We call these strategies static asynchronous LD and static synchronous LD. However, as previously mentioned, such task-allocation policies might result in significant idle times—even for the asynchronous method, because we need first-order information on all component functions to compute valid bounds. To lessen the idle times, we implement dynamic-task-allocation strategies, in which component functions are dynamically assigned to workers as soon as the workers become available. Our dynamic allocation differs from Reference [
15] because we do not use a list of iterates. To ease the understanding of the LD methods applied in this work—and to highlight their differences—we introduce a new figure: a coordinator process. The coordinator is responsible for tasking workers with functions to be evaluated. Note, however, that this additional figure is only strictly necessary in the dynamic asynchronous LD; in the other three methods, this responsibility can be taken by the master. Nonetheless, in all methods, the master has three roles: solving the MP, getting iterates, and requesting functions to be evaluated at the newly obtained iterates. By construction, in the synchronous methods, the master requests the coordinator to evaluate all component functions at the same iterate, and it waits until the information from all functions has been received to continue the process. On the other hand, in the asynchronous variants, the master computes a new iterate, requests the coordinator to evaluate it on possibly not all component functions, and receives information on outdated iterates from the coordinator. Given that the master has requested an iterate x′ to be evaluated in some component function, the main difference between the static and the dynamic asynchronous methods is that, in the static form, the coordinator always sends x′ to the worker that has been previously tasked with that component function, while, in the dynamic one, the coordinator sends x′ to any available worker, as sketched below.
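The following small Python sketch illustrates the coordinator's dynamic task allocation just described; the data structures, worker ranks, and function names are illustrative only, and the actual implementation would dispatch the pairs via MPI messages rather than a dictionary.

```python
# Illustrative sketch only: dynamic task allocation by the coordinator.
# Pending evaluations (component j at iterate x') are handed to WHICHEVER
# worker is idle, instead of the worker statically bound to component j.
from collections import deque

def dispatch(pending, idle, assignments):
    """Pair queued (j, x') tasks with idle workers; in practice, send via MPI."""
    while pending and idle:
        j, x_prime = pending.popleft()
        worker = idle.pop()
        assignments[worker] = (j, x_prime)
    return assignments

# Example: the master requested components 1-3 at iterate x'; workers 7 and 12 are idle.
pending = deque([(1, "x'"), (2, "x'"), (3, "x'")])
print(dispatch(pending, idle={7, 12}, assignments={}))
```

In the static variants, the coordinator would instead route each (j, x′) pair only to the worker that owns component j, even if other workers are idle.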
3. Modeling Details and Case Studies
The general formulation of our SHTUC is presented in (8)–(19).
In our model, the indices and respective sets containing them are g for thermal generators, h for hydro plants, b for buses, l for transmission lines, and t and o for periods. In the objective (8), thermal generators' start-up costs are CS, and we assume that the shutdown cost is null. The thermal-generation costs are C; CL is the per-unit cost of load shedding and generation surplus. The expected future-operation cost for scenario s is represented by a piecewise-affine function of the reservoir volumes in the last period of scenario s. The first-stage decisions are the thermal generators' commitment, start-up, and shutdown, respectively, I, a, and b, and their hydro counterparts (w, z, and u). The first-stage set in (1) contains the feasible commitments of thermal and hydro generators in our SHTUC, and it is defined by Constraints (9)–(11). In this work, we model the statuses of hydro plants with associated binary variables only in the first 48 h, to reduce the computational burden; for the remaining periods, the hydro plants are modeled only with continuous variables. The minimum up-time constraint in (9) ensures that, once turned on, thermal generator g remains on for at least TUg periods. Likewise, the minimum downtime in (9) requires that, once g has been turned off, it must remain off for at least TDg periods. Constraints (10) guarantee the satisfaction of the logical relations among status, start-up, and shutdown for thermal and hydro plants. The scenario-dependent second-stage sets are defined by (12)–(19). Constraints (12) are the usual limits on thermal generation tg; (13) and (14) are the up and down ramp-rate limits and the start-up and shutdown requirements of generator g. Equation (15) is the mass balance of hydro plant h's reservoir, where Ahts is the inflow to reservoir h in period t of scenario s. Moreover, an affine function gives the inflow to h's reservoir in period t of scenario s as a function of the vectors of turbine discharge q and spillage s. The constraints in (16) are the limits on reservoir volume, v, turbine discharge, q, and spillage, s. In (17), a piecewise-affine function bounds the hydropower generation hghts of plant h. We use the classical DC network model: Equation (18) is the bus power balance, where a linear function maps the controlled generation at each bus into the power injection at bus b, WGbts is the wind generation at bus b, and Lbt is the corresponding load at b. Lastly, (19) are the limits on the flow of transmission line l in period t and scenario s, defined by an affine function of the bus injections.
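Since the displays (8)–(19) themselves did not survive extraction, the following LaTeX sketch illustrates one standard way to write the minimum up/down-time constraints (9) and the logical relations (10) for a thermal generator g, using the variables I, a, b and the parameters TUg, TDg named above; the exact form used by the authors may differ.

```latex
% Hedged sketch of typical minimum up/down-time and logical constraints,
% with I (commitment), a (start-up), b (shutdown), TU_g and TD_g as in the text.
\begin{align}
  \sum_{o = t - TU_g + 1}^{t} a_{g,o} &\le I_{g,t}, \qquad
  \sum_{o = t - TD_g + 1}^{t} b_{g,o} \le 1 - I_{g,t}, && \forall\, t,\\
  I_{g,t} - I_{g,t-1} &= a_{g,t} - b_{g,t}, && \forall\, t.
\end{align}
```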
We assess our algorithm on a 46-bus system with 11 thermal plants, 16 hydro plants, 3 wind farms, and 95 transmission lines. The system's installed capacity is 18,600 MW, of which 18.9% is due to thermal plants, hydro plants represent 68.1%, and wind farms have a share of 13%. We consider a one-week-long planning horizon with hourly discretization. Thus, a one-scenario instance of our SHTUC would have 7848 binary variables and 5315 constraints at the first stage; and 36,457 continuous variables and 100,949 constraints for each scenario in the second stage. Furthermore, the weekly peak load in the baseline case is 11,204 MW—nearly 60.2% of the installed capacity. The hydro plants are distributed over two basins and include both run-of-river plants and plants with reservoirs capable of regulation. Further information about the system can be found in the attached multimedia files.
The uncertainty comes from wind generation and the inflows. In all tests, we use a scenario set with 256 scenarios. To assess how our algorithm performs in distinct scenario sets, three sets (A, B, and C) are considered. Moreover, we use three initial useful-reservoir-volume levels: 40%, 50%, and 70%. The impact of different load levels on the performance of our algorithms is analyzed through three load levels: low (L), moderate (M), and high (H). Level H is our baseline case regarding load. Levels M and L have the same load profile as H’s, but with all loads multiplied by factors of 0.9 and 0.8, respectively. Lastly, to investigate how our algorithm’s convergence rate is affected by different choices of initial stability centers, we implement two strategies for obtaining the initial stability center. In both strategies, we solve an expected-value problem, as defined in Reference [
5]. In the first one, we use the classical Benders decomposition (BD) with a coarse relative-optimality-gap tolerance of 10% to get a possibly low-quality stability center (LQSC). To obtain a stability center of hopefully high quality, which we refer to as the high-quality stability center (HQSC), we solve the expected-value problem directly with Gurobi 8.1.1 [
33] with a relative-optimality-gap tolerance of 1%. The time limit for obtaining the initial stability centers LQSC and HQSC is set to 5 min. Additionally, the computing setting consists of seven machines of two types: 4 of them have 128 GB of RAM and two Xeon E5-2660 v3 processors with 10 cores clocking at 2.6 GHz; the other 3 machines have 32 GB of RAM and two Xeon X5690 processors with cores clocking at 3.47 GHz. All machines are in a LAN with 1-Gbps network interfaces. We test two machine combinations. In the first one, Combination 1, there are four 20-core machines and one with 12 cores. In Combination 2, we replace one 20-core machine by two 12-core machines. Regardless of the combination, one 12-core machine is defined as the head node, where only the master is launched. Except for the master—for which Gurobi can use up to 10 cores—Gurobi is limited to computing on a single core for all other processes, i.e., the workers.
Our computing setting is composed of machines with different configurations. Naturally, solving the same component function on two distinct machines may result in different outputs—and different runtimes. Consequently, the path taken by the MP across iterations might change significantly between experiments on the same data. More specifically for the asynchronous methods, the varying order in which information arrives at the MP may also yield different convergence rates. Hence, to reduce the effect of these seemingly random behaviors, we conducted 5 experiments for each problem instance. Therefore, our testbed is defined as {40, 50, 70} × {A, B, C} × {L, M, H} × {LQSC, HQSC} × {Trial 1, …, Trial 5} × {Combination 1, Combination 2}—we have 54 problems and 540 experiments. In all instances, we divide the 256-scenario set into 16 subsets. Thus, following our previous definitions, w = 16, and each subset contains 16 scenarios. Additionally, we set a relative-optimality-gap tolerance of 1% and a time limit of 30 min for all instances. Gurobi 8.1.1 is used to solve the MILP MP and the component functions (linear-programming problems) that form the subproblem. The inter-process communication is implemented with mpi4py and Microsoft MPI v10.0.
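As an illustration of the per-process solver settings just described, a gurobipy sketch is given below. The parameter names (Threads, MIPGap, TimeLimit) are standard Gurobi parameters, while the function and the per-solve time limit are assumptions of this sketch rather than the authors' code.

```python
# Sketch (assumed, not the authors' code): per-process Gurobi settings
# matching the experimental setup described above.
import gurobipy as gp

def configure(model: gp.Model, is_master: bool) -> None:
    model.Params.Threads = 10 if is_master else 1  # master: up to 10 cores; workers: 1
    model.Params.MIPGap = 0.01                     # 1% relative-optimality-gap tolerance
    model.Params.TimeLimit = 1800                  # 30-min limit (assumed applied per solve)
```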
4. Results
In this section, the methods are analyzed based on their computing-time performances. We focus on this metric because our results have not shown significant differences among the methods for other metrics, e.g., optimality gap and upper bounds. In addition to analyzing averages of the metric, we use the well-known performance profile [
34]. Multimedia files containing the main results for the full test set
are attached to this work.
Figure 1 presents the performance profiles of the methods considering all experiments. In
Figure 1, ρ(τ) is the probability that the performance ratio of a given method is within a factor τ of the best ratio, as in Reference [
34]. Applying the classical Benders decomposition (BD) on the set {40, 50, 70} × {A} × {L, M, H} × {Combination 1} results in the convergence of only the problem in {70} × {A} × {M} × {Combination 1}, for which BD converges to a 1%-optimal solution in 1281.42 s. Thus, it is reasonable to expect that the classical BD would also perform poorly for the remaining experiments.
In
Figure 1, we see that the dynamic asynchronous LD outperforms all other methods for most instances. Its performance ratio is within a factor of 2 of the best ratio for about 500 instances (about 92% of the total). The static asynchronous LD also has a reasonable overall performance—it is within a factor of 2 of the best ratio for more than 400 instances. Moreover, we see that the dynamic-allocation strategy provides significant improvements for both the asynchronous and synchronous LD approaches: The dynamic synchronous LD converges faster than its static counterpart for most of the experiments.
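For readers unfamiliar with the metric, the performance profiles of Figures 1–3 can be computed as in the following numpy sketch. The sample times are illustrative and unrelated to the paper's results, and instances that hit the time limit would, in practice, receive an infinite ratio so that they never count as solved.

```python
# Illustrative Dolan-More performance profile: rho_m(tau) is the fraction of
# instances for which method m's time is within a factor tau of the best time.
import numpy as np

def performance_profile(times, taus):
    """times: (n_methods, n_instances) array of solution times."""
    ratios = times / times.min(axis=0)             # performance ratio per method/instance
    return np.array([[np.mean(r <= tau) for tau in taus] for r in ratios])

times = np.array([[10.0, 30.0, 25.0],              # e.g., dynamic asynchronous LD
                  [12.0, 60.0, 20.0]])             # e.g., static synchronous LD
print(performance_profile(times, taus=[1.0, 2.0, 4.0]))
```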
Figure 2 and
Figure 3 show the performance profiles considering only the instances run
with machine Combinations 1 and 2, respectively.
Figure 2 illustrates that, for a distributed setting in which workers are deployed on machines with identical characteristics, the performances of the methods with dynamic allocation and those with static allocation are similar. Nonetheless, we see that the asynchronous methods still outperform the synchronous LD for most experiments.
In contrast to
Figure 2,
Figure 3 shows that the dynamic-allocation strategy provides significant time savings for the instances run
with Machine Combination 2. This is due to the great imbalance between the different machines in Combination 2—machines with processors Xeon E5-2660 v3 are much faster than those with processors Xeon X5690.
Table 1 gives the average wall-clock computing times over subsets of the experiments. From this table, we see that the relative average speed-ups of the dynamic and static asynchronous LD over the entire set, w.r.t. the static synchronous LD, are 54% and 29%, respectively—w.r.t. the dynamic synchronous LD, the speed-ups are 45% and 16%, respectively. Moreover, we see that the time savings are more significant for harder-to-solve instances, e.g., instances with high load and/or low-quality initial stability centers. Additionally,
Table 1 shows that the dynamic asynchronous LD provides considerable reductions in the standard deviations of the elapsed computing times in comparison with the other methods. For example, for the problems with high load level (H), the dynamic asynchronous LD has a standard deviation about 16%, 13%, and 27% smaller than those of the static asynchronous LD, dynamic synchronous LD, and static synchronous LD, respectively.
Based on the data from
Table 1, we can compute the speed-ups provided by our proposed dynamic ASYN LD w.r.t. the other three variants considered here. To better appreciate such speed-ups, we show them in
Table 2, where we see that the proposed ASYN LD provides consistent speed-ups over the entire range of operating conditions considered here.
The advantages of the asynchronous methods are made clearer in
Figure 4, where we see that the asynchronous methods not only provide (on average) better running times but also present significantly less variation among the problems. The latter is relevant in the day-to-day operations of ISOs, since, if there are stochastic hydrothermal unit-commitment (SHTUC) cases that take significantly more time to be solved than expected, subsequent operation steps that depend on the results of the SHTUC might be affected. Take, for instance, the case from the Midcontinent Independent System Operator reported in Reference [
3], where the (deterministic) UC is reported to have solution times varying from just 50 to over 3600 s. Such variation can be problematic in the day-to-day operation of power systems since it may disrupt tightly scheduled operations. Naturally, methods that can reduce such variance and still produce high-quality solutions in reasonable times are appealing.