A Review on the Performance of Linear and Mixed Integer Two-Stage Stochastic Programming Software

This paper presents a tutorial on the state-of-the-art software for the solution of two-stage (mixed-integer) linear stochastic programs and provides a list of software designed for this purpose. The methodologies are classified according to the decomposition alternatives and the types of variables in the problem. We review the fundamentals of Benders decomposition, dual decomposition, and progressive hedging, as well as possible improvements and variants. We also present extensive numerical results to underline the properties and performance of each algorithm, using software implementations including DECIS, FORTSP, PySP, and DSP. Finally, we discuss the strengths and weaknesses of each methodology and propose future research directions.


Introduction
In the modeling and optimization of real-world problems, there is usually a level of uncertainty associated with the input parameters and their future outcomes. Stochastic programming (SP) models have been widely studied over the past decades to solve optimization problems under uncertainty [1,2]. SP is acknowledged for providing superior results compared to the corresponding deterministic model, which uses nominal values for the uncertain parameters and can therefore yield suboptimal or infeasible solutions. SP applications in process systems engineering include manufacturing networks and supply chain optimization [3,4], production scheduling [5], and the synthesis of process networks [6].
Two-stage stochastic programming is a commonly applied framework for cases where parameter uncertainties are decision independent (exogenous uncertainties). In stochastic programming, uncertain parameters are explicitly represented by a set of scenarios. Each scenario corresponds to one possible realization of the uncertain parameters, according to a discretized probability distribution. The goal is to optimize the expected value of the objective function over the full set of scenarios, subject to the implementation of common decisions at the beginning of the planning horizon.
Stochastic programs are often difficult to solve due to their large size and complexity that grows with the number of scenarios. To overcome these difficulties, decomposition algorithms, such as Benders decomposition [7], Lagrangian decomposition [8], and progressive hedging [9], have been developed to solve linear programming (LP) and mixed-integer linear programming (MILP) stochastic problems. For a comprehensive review of recent algorithmic advances in two-stage stochastic MIP, we refer the readers to the tutorial [10].
On the software development front, several modeling systems and optimization platforms have included extensions for a convenient algebraic representation of stochastic problems, as offered by major software vendors, such as GAMS, LINDO, XpressMP, AIMMS, and Maximal.
In recent years, various commercial and open-source applications have been developed specifically to represent and solve two-stage and multistage SP problems. Some of them include capabilities to read and build stochastic MPS (SMPS) files, the standard exchange format for SP applications. However, despite advances in the field and proven benefits, SP has not been widely used in industrial applications. Existing applications include power systems planning and scheduling [11][12][13], and process systems engineering [14,15].
The scope of this paper is to provide a tutorial for beginners on linear and mixed-integer linear stochastic programming, introducing the fundamentals of solution methodologies with a focus on available software. To accomplish this objective, we present a general description of techniques based on Benders decomposition and dual decomposition, which are the foundations of most known solvers designed specifically to tackle special problem structures. We then review the currently available state-of-the-art software for the solution of two-stage stochastic programming problems and evaluate their performance using large-scale test libraries in SMPS format.
The remainder of this paper is organized as follows. In Section 2, we explain the mathematical formulation of (mixed-integer) linear stochastic problems. Section 3 describes the classical L-shaped algorithm. Section 4 summarizes the enhancement strategies to improve the performance of Benders decomposition. Section 5 describes scenario decomposition methods and algorithmic enhancements. In Section 6, we describe algorithmic innovations in software packages for dual decomposition and show some computational results. Finally, in Section 7, we summarize the conclusions.

Problem Statement
We consider a two-stage stochastic mixed-integer linear problem (P) in the following form:

(P)  min_{x, y_ω}  TC = c^⊤x + Σ_{ω∈Ω} τ_ω d_ω^⊤ y_ω   (1a)
     s.t.  Ax ≤ b   (1b)
           x ∈ X,  X = {x : x_i ∈ {0, 1} ∀i ∈ I_1, x_i ≥ 0 ∀i ∈ I\I_1}   (1c)
           T_ω x + W_ω y_ω ≥ h_ω  ∀ω ∈ Ω   (1d)
           y_ω ∈ Y  ∀ω ∈ Ω   (1e)

where x denotes the 'here and now' decisions, taken at the beginning of the planning horizon before the uncertainties unfold, and Ω is the set of scenarios. Vector y_ω represents the recourse or corrective actions ('wait and see'), applied after the realization of the uncertainty. Matrix A ∈ R^{m_1×n_1} and vector b ∈ R^{m_1} represent the first-stage constraints. Matrices T_ω and W_ω, and vector h_ω ∈ R^{m_2}, represent the second-stage problem. Matrices T_ω ∈ R^{m_2×n_1} and W_ω ∈ R^{m_2×n_2} are called the technological and recourse matrices, respectively. Let I = {1, 2, . . . , n_1} be the index set of all first-stage variables, and let I_1 ⊆ I be the subset of indices of binary first-stage variables. Let J = {1, 2, . . . , n_2} be the index set of all second-stage variables. If the second-stage variables are mixed integer, Y = {y : y_j ∈ {0, 1} ∀j ∈ J_1, y_j ≥ 0 ∀j ∈ J\J_1}, where J_1 ⊆ J is the subset of indices of binary second-stage variables. If all the second-stage variables are continuous, then J_1 = ∅ and Y = {y : y_j ≥ 0 ∀j ∈ J}. The objective function (TC) minimizes the total expected cost with the scenario probabilities τ_ω and the cost vectors c and d_ω. It should be noted that the approaches discussed in this paper can be applied to general mixed-integer variables.
Here, we restrict the variables to be mixed binary for simplicity. Equation (1) is often referred to as the deterministic equivalent, or extensive form, of the SP. Formulation (P) can be rewritten in an equivalent form (PNAC) with nonanticipativity constraints (NACs), where the first-stage variables are no longer shared, and each scenario represents an instance of a deterministic problem with a specific realization outcome [16,17]:

(PNAC)  min  Σ_{ω∈Ω} τ_ω (c^⊤ x_ω + d_ω^⊤ y_ω)   (2a)
        s.t.  Σ_{ω∈Ω} H_ω x_ω = 0   (2b)
              (x_ω, y_ω) ∈ G_ω  ∀ω ∈ Ω   (2c)
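To make the block structure of the extensive form concrete, the following sketch assembles the constraint matrix of the deterministic equivalent from first-stage data (A, b) and per-scenario data (T_ω, W_ω, h_ω). It is an illustration we add here, not code from any reviewed package; matrices are dense Python lists, and all scenarios are assumed to share the same second-stage dimensions.

```python
def extensive_form_blocks(A, b, T, W, h):
    """Build the (dense) constraint matrix and right-hand side of the
    deterministic equivalent: first the first-stage rows A x <= b, then
    one block row [T_w | 0 .. W_w .. 0] per scenario w."""
    m1, n1 = len(A), len(A[0])
    m2, n2 = len(W[0]), len(W[0][0])
    S = len(T)  # number of scenarios
    # first-stage rows: x columns followed by S blocks of y_w columns
    rows = [list(r) + [0.0] * (S * n2) for r in A]
    for s in range(S):
        for i in range(m2):
            row = list(T[s][i]) + [0.0] * (S * n2)
            for j in range(n2):
                row[n1 + s * n2 + j] = W[s][i][j]  # place W_s block
            rows.append(row)
    rhs = list(b) + [v for hw in h for v in hw]
    return rows, rhs
```

The number of columns grows as n_1 + |Ω|·n_2, which is why the extensive form quickly becomes intractable as the scenario set grows.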
In Equation (2c), G_ω represents the feasible region for scenario ω, which is defined by constraints (1b)-(1e). Nonanticipativity constraints (2b) are added to ensure that the first-stage decisions are the same across all scenarios. They are represented by a suitable sequence of matrices H_ω ∈ R^{n_1·(|Ω|−1)×n_1}. One example of such constraints is the following:

x_ω − x_{ω+1} = 0  ∀ω ∈ {1, . . . , |Ω|−1}   (3)

Given the mathematical structure of the deterministic equivalent formulations (P) and (PNAC), (mixed-integer) linear stochastic problems can be solved using decomposition methods derived from duality theory [18]. Such methods split the deterministic equivalent into a master problem and a series of smaller subproblems to decentralize the overall computational burden. Decomposition methodologies fall into two groups: (i) node-based or vertical decomposition, which includes Benders decomposition and its variants, where the problem is decomposed by the nodes of the scenario tree; and (ii) scenario-based or horizontal decomposition, where the problem is decomposed by scenarios. In Sections 3 and 4, we provide a tutorial overview of Benders decomposition and its enhancements. In Section 5, we review scenario decomposition methods, including the dual decomposition algorithm and the progressive hedging algorithm.

L-Shaped Algorithm/Benders Decomposition
If the second-stage variables are all continuous (i.e., Y = y : y j ≥ 0 ∀j ∈ J ), problem (P) can be solved with Benders decomposition. Benders decomposition (BD) was originally developed in 1962 by Benders [19] to solve large-scale mixed-integer linear problems (MILP) with complicating variables. This concept has been extended to solve a broader range of optimization problems [20], including multistage, bilevel, and nonlinear programming. When applied to stochastic problems, it is commonly referred to as the L-shaped algorithm [7].
The L-shaped algorithm partitions the deterministic formulation (P) into multiple problems: (i) a master problem (MP) that contains all the first-stage constraints and variables, which can be mixed integer; and (ii) a collection of subproblems that include the corrective future actions for a given first-stage solution. The master problem (MP) is derived from the projection of (P) onto the variables x:

(MP)  min_x  c^⊤x + Q(x)   (4a)
      s.t.  Ax ≤ b,  x ∈ X   (4b)

where Q(x) = Σ_{ω∈Ω} τ_ω θ_ω(x) is defined as the recourse function (or expected second-stage value function), and θ_ω(x) is defined by the primal second-stage program for scenario ω:

(BSPp_ω)  θ_ω(x) = min_{y_ω}  d_ω^⊤ y_ω   (5a)
          s.t.  W_ω y_ω ≥ h_ω − T_ω x   (5b)
                y_ω ≥ 0   (5c)

The recourse functions θ_ω(x) and Q(x) are convex and piecewise linear, characteristics that are exploited in the BD method [1]. These conditions do not hold when integer variables are included in the second-stage program. For the case of integer recourse, a logic-based Benders framework [21], second-stage convexification techniques [22][23][24], specialized branch-and-bound schemes [25,26], or dual decomposition methods [16] may be applied to solve large stochastic problems. In this section, we focus only on Benders decomposition for SP with continuous second-stage variables.
Formulation (BSPp_ω) is a linear program for any given feasible value of x. By the strong duality theorem, the second-stage program is equivalent to its dual (BSPd_ω) if (BSPp_ω) is feasible and bounded. Vector π_ω represents the Lagrangian multipliers associated with the second-stage constraints (5b):

(BSPd_ω)  θ_ω(x) = max_{π_ω}  π_ω^⊤ (h_ω − T_ω x)   (6a)
          s.t.  W_ω^⊤ π_ω ≤ d_ω   (6b)
                π_ω ≥ 0   (6c)

BD introduces a set of piecewise linear approximations of the recourse function into the problem MP, known as optimality cuts, which are built from dual solutions of the second-stage program. It is important to highlight that the dual feasible region does not depend on the value of x. Thus, the exact representation of the expected cost consists of the computation of all the extreme points of problems (BSPd_ω).
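Because each optimality cut depends only on a dual solution π_ω and the scenario data, its coefficients are cheap to form: the cut θ_ω ≥ π_ω^⊤(h_ω − T_ω x) rearranges to θ_ω ≥ α − β^⊤x with α = π_ω^⊤h_ω and β = T_ω^⊤π_ω. The helper below is an illustrative pure-Python sketch of that computation.

```python
def optimality_cut(pi, h, T):
    """Given a dual vector pi of (BSPd_w) and scenario data (h_w, T_w),
    return (alpha, beta) such that the cut reads theta_w >= alpha - beta . x."""
    alpha = sum(p * hi for p, hi in zip(pi, h))          # pi^T h_w
    beta = [sum(pi[r] * T[r][j] for r in range(len(pi)))  # (T_w^T pi)_j
            for j in range(len(T[0]))]
    return alpha, beta
```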
However, the second-stage program may not be feasible for some values of x. BD enforces the second-stage constraints (5b) by adding feasibility cuts, which are valid inequalities that exclude infeasible first-stage solutions from the MP. Subproblem feasibility is evaluated by solving the following recourse reformulation for scenario ω:

V_ω(x) = min_{y_ω, v}  e^⊤ v   (7a)
         s.t.  W_ω y_ω + v ≥ h_ω − T_ω x   (7b)
               y_ω ≥ 0,  v ≥ 0   (7c)

where e ∈ R^{m_2} is a vector with all-1 entries, and v ∈ R^{m_2} is the slack of constraints (5b). The objective function V_ω(x) measures the amount by which these constraints are violated; thus, if V_ω(x) equals zero, the original subproblem (5) is feasible. To derive feasibility cuts in terms of x, BD considers the dual of problem (7) to generate an expression equivalent to Equation (7a). The optimal solution μ ∈ R^{m_2} of the dual feasibility problem (8) corresponds to one of the extreme rays (or directions) of the recourse subproblem (6):

max_μ  μ^⊤ (h_ω − T_ω x)   (8a)
s.t.  W_ω^⊤ μ ≤ 0,  0 ≤ μ ≤ e   (8b)

The master problem (4) is linearized by (i) substituting the function Q(x) with the weighted sum of the future cost estimates (6a), and (ii) applying feasibility cuts as needed. This reformulation is referred to as the multi-cut Benders master problem (BMP):

(BMP)  min_{x, θ}  TC_d = c^⊤x + Σ_{ω∈Ω} τ_ω θ_ω   (9a)
       s.t.  Ax ≤ b,  x ∈ X   (9b)
             μ̂_j^⊤ (h_j − T_j x) ≤ 0  ∀j ∈ E   (9c)
             θ_ω ≥ (π̂^k_ω)^⊤ (h_ω − T_ω x)  ∀ω ∈ Ω, ∀k ∈ K   (9d)

where the variables θ_ω form the vector θ ∈ R^{|Ω|} representing the outer linearization of the second-stage costs θ_ω(x). Parameters π̂^k_ω and μ̂_j represent the extreme points and extreme rays of the dual recourse program (BSPd_ω), which are stored in sets K and E, respectively. Constraints (9c) and (9d) denote the feasibility cuts (j ∈ E) and optimality cuts (k ∈ K), respectively. Vector h_j and matrix T_j correspond to h_ω and T_ω for the scenario in which a feasibility cut can be found.
The complete enumeration of the extreme points and rays of the dual second-stage program is impractical, if not impossible. Instead, the L-shaped algorithm relaxes the MP by initially considering only a subset of the optimality and feasibility cuts. BD iteratively solves the relaxed problem to generate a candidate solution x̂ for the first-stage variables, and then solves the collection of scenario subproblems at fixed x̂ to generate a new group of optimality or feasibility cuts. This process is repeated until the optimal solution is found [26].
The optimal solution of the relaxed Benders master problem provides a valid lower estimate (TC_d) of the optimal total cost TC. On the other hand, the solution of the second-stage programs (BSPd_ω) at a feasible x̂ yields an upper bound on the original objective function (TC_p), given by Equation (10):

TC_p = c^⊤ x̂ + Σ_{ω∈Ω} τ_ω θ_ω(x̂)   (10)

The solution procedure terminates when the difference between the bounds is closed, as stated in Equation (11):

TC_p − TC_d ≤ ε   (11)

Algorithm 1 summarizes the procedure.
Algorithm 1: Multi-cut Benders decomposition
Input: initial guess x⁰, optimality tolerance ε, iteration limit k_max
1   k ←− 0;  z_UB ←− +∞;  z_LB ←− −∞
2   while (z_UB − z_LB)/|z_UB| > ε and k < k_max do
3       SOLVE subproblems (6) at fixed x^k
4       if all subproblems (6) are feasible then
5           ADD new optimality cuts (9d) corresponding to π^k_ω for all ω ∈ Ω
6           compute TC_p from θ(x^k) and x^k
7           if TC_p < z_UB then
8               z_UB ←− TC_p (upper bound)
9               x* ←− x^k
10      else
11          SOLVE (8) to obtain μ for the given x^k and an infeasible scenario j ∈ Ω
12          ADD new feasibility cut (9c) corresponding to μ
13      SOLVE (9) to obtain (x^{k+1}, θ^{k+1}_ω) and TC_d;  z_LB ←− TC_d
14      Set k ←− k + 1
15  return optimal solution x* and z_LB

The L-shaped method is summarized in Algorithm 1. It is initialized with a guess of the first-stage solution x⁰ and considers two stopping criteria: (i) an optimality tolerance ε that limits the relative gap between the dual (z_LB) and primal (z_UB) bounds of the objective function (TC), and (ii) a maximum number of allowed iterations (k_max).
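The loop of Algorithm 1 can be demonstrated end to end on a one-dimensional toy instance, min c·x + Σ_ω τ_ω·q·max(h_ω − x, 0) over x ∈ [0, U], for which the subproblem dual has a closed form (π_ω = q if h_ω > x, else 0) and no feasibility cuts are needed. This is an illustrative sketch we add here, not code from any reviewed package; the relaxed master is minimized by enumerating candidate breakpoints, which is exact for this instance because all kinks lie in {0, U, h_ω}.

```python
def l_shaped_1d(c, q, h, tau, U, tol=1e-9, kmax=50):
    """Toy multi-cut L-shaped loop for min c*x + E[q*max(h_w - x, 0)], x in [0, U]."""
    cuts = {w: [] for w in range(len(h))}   # per-scenario cuts (alpha, beta): theta_w >= alpha - beta*x
    cand = sorted({0.0, U, *h})             # all kinks of master and recourse lie here
    ub, x_best = float("inf"), None
    for _ in range(kmax):
        def master_obj(x):
            # relaxed master: theta_w >= 0 holds implicitly since q >= 0
            return c * x + sum(
                tau[w] * max([a - b * x for (a, b) in cuts[w]] + [0.0])
                for w in range(len(h)))
        x = min(cand, key=master_obj)       # solve relaxed master (lower bound)
        lb = master_obj(x)
        # evaluate the true recourse cost at the candidate (upper bound)
        tc = c * x + sum(tau[w] * q * max(h[w] - x, 0.0) for w in range(len(h)))
        if tc < ub:
            ub, x_best = tc, x
        for w in range(len(h)):             # closed-form dual -> one cut per scenario
            pi = q if h[w] > x else 0.0
            cuts[w].append((pi * h[w], pi))
        if ub - lb <= tol:
            break
    return x_best, ub
```

On the instance c = 1, q = 3, h = (1, 2), τ = (0.5, 0.5), U = 3, the gap closes in two iterations at x* = 2.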

Benders Decomposition Enhancement Strategies
The application of BD often leads to slow convergence, long computational times, and excessive use of memory resources, particularly when the MILP master problem has a poor LP relaxation [27][28][29]. Major BD disadvantages include time-consuming iterations, poor feasibility and optimality cuts, ineffective initial iterations, primal solutions that behave erratically, slow convergence at the end of the algorithm (the tailing-off effect), and upper bounds that remain stuck in successive iterations due to second-stage degeneracy [26,30].
Various strategies have been proposed to accelerate the convergence of the standard BD method. Enhancement strategies are mainly split into two categories: reducing the cost of each iteration, or reducing the number of iterations [26,31].

Reducing the Cost of Each Iteration
Cheaper iterations are achieved by reducing the time spent solving the MP and the subproblems. The MP is often the most time-consuming part of the BD algorithm (more than 90% of the execution time) in the case of mixed-integer problems [27]. The overall process can be accelerated by relaxing the integrality of the MP in most of the iterations in order to rapidly compute a large number of cuts [32]. A variation of this method was proposed by Geoffrion and Graves [28], in which the MP is solved to a non-zero optimality gap; this gap is progressively reduced to ensure global convergence.
Alternatively, the MP might be solved via (meta) heuristics [33,34], which provide good approximate solutions in short time; however, it is still required to solve the MP to optimality to guarantee convergence. The application of heuristics or the LP relaxation of the MP often yields worse bounds and lack of controllability, reducing the ability of BD to generate the necessary cuts [35].
Similarly, suboptimal solutions of the dual subproblems yield valid cuts, known as inexact cuts. Algorithm convergence can still be guaranteed under the conditions described by Zakeri et al. [36]. Additional subproblem acceleration schemes comprise synchronous parallelization and re-optimization. The latter exploits structural similarities between scenarios to solve the subproblems in fewer solver iterations.

Reducing the Number of Iterations
The number of iterations of the L-shaped algorithm is closely related to the tightness of the LP relaxation of the first-stage problem, as well as to the strength of the optimality and feasibility cuts [27]. Better candidates are computed from improvements in the MP, especially by strengthening the representation of the recourse functions. Tighter formulations can be obtained by adding multiple cuts per iteration (the multi-cut reformulation [37]), as well as through the use of heuristics to eliminate inactive cuts and to select the fittest dual variables to be inserted in the MP (size management techniques).
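The classical single-cut L-shaped method is the aggregated counterpart of the multi-cut reformulation: the |Ω| per-scenario cuts generated in one iteration are collapsed into a single cut on Q(x) by probability weighting. For a scalar first-stage variable this is a one-line computation, sketched below for illustration (cut (α_ω, β_ω) encodes θ_ω ≥ α_ω − β_ω·x).

```python
def aggregate_cut(cuts, tau):
    """Combine one optimality cut per scenario into the single-cut form
    Q(x) >= alpha - beta*x by probability-weighted summation."""
    alpha = sum(t * a for t, (a, _) in zip(tau, cuts))
    beta = sum(t * b for t, (_, b) in zip(tau, cuts))
    return alpha, beta
```

The trade-off is well known: multi-cut masters are tighter and usually need fewer iterations, while single-cut masters stay smaller and are cheaper to re-solve.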
Complementary strategies have been developed to generate more effective cuts. One alternative is the reformulation of the subproblems so as to select nondominated dual solutions from the set of optimal multipliers, known as Pareto-optimal cuts [27]. Recently, Ref. [38] proposed a methodology to compute bundles of covering cuts, designed to involve most of the first-stage variables and to carry as much information as possible.
Alternative methods tighten the MP to alleviate some of the drawbacks of BD: cross decomposition, for instance, avoids the generation of low-quality solutions, while quadratic stabilization methods provide a solution for the tailing-off effect. Cross decomposition [39] combines and alternates between iterations of BD and Lagrangian decomposition to provide an additional valid lower bound of the objective function and a set of feasible deterministic solutions (x ω , y ω ) ∈ G ω , which are used to compute Lagrangian-based cuts to strengthen the MP.
Quadratic methods have been proposed to stabilize BD, aiming to improve the quality of the initial iterations and to reduce the oscillation that occurs when the algorithm is close to the optimal solution [31]. These methods penalize the distance of the first-stage candidates from the current best solution. Popular variants include regularized decomposition [16,40], the trust-region method [41], and level decomposition [42], which are summarized below. It should be noted that Equations (12)-(14) are all variants of the large family of bundle methods, as discussed in Chapters XIV and XV of [43].

•   Regularized decomposition (also known as the proximal bundle method):

    min { c^⊤x + Σ_{ω∈Ω} τ_ω θ_ω + (1/(2 t_k)) ||x − x̂_k||² : (x, θ) ∈ V_k }   (12)

•   Trust-region method:

    min { c^⊤x + Σ_{ω∈Ω} τ_ω θ_ω : ||x − x̂_k|| ≤ R_k, (x, θ) ∈ V_k }   (13)

•   Level decomposition method:

    min { ||x − x̂_k||² : c^⊤x + Σ_{ω∈Ω} τ_ω θ_ω ≤ L_k, (x, θ) ∈ V_k }   (14)

where t_k, R_k, and L_k are real-valued, iteration-dependent parameters that balance the minimization of the relaxed MP against the distance to the stability center x̂_k. V_k represents the feasible region of the Benders master problem at iteration k, which is defined by the optimality cuts (9d) and feasibility cuts (9c), as well as by the first-stage constraints (9b). Stabilization methods were initially introduced for BD with no integer variables; nonetheless, recent developments have adapted the method to mixed-integer problems [31]. Two software packages using the Benders decomposition algorithm, GAMS-DECIS [44,45] and FORTSP [46], are described in the benchmark in Appendix C.

Scenario Decomposition Methods
Scenario decomposition is a popular approach to solve two-stage SP formulations with mixed-integer recourse, i.e., Y = y : y j ∈ {0, 1}, ∀j ∈ J 1 , y j ≥ 0 ∀j ∈ J\J 1 in (PNAC). In contrast to the BD algorithm, scenario decomposition methods dualize the non-anticipativity constraints (NACs) to obtain lower bounds of the original formulation. Scenario-based decomposition addresses the computational difficulties associated with the solution of large stochastic problems by considering each scenario independently and solving the set of subproblems in parallel. Moreover, feasible solutions to the original problem (P) can be obtained by heuristics based on the optimal solutions of the subproblems. In this section, we describe the dual decomposition (DD) algorithm and the progressive hedging (PH) algorithm.

Dual Decomposition (DD) Method
The dual decomposition algorithm proposed by Carøe and Schultz [17] applies Lagrangian relaxation to problem (2) and uses a branch-and-bound procedure to restore the nonanticipativity conditions. The Lagrangian relaxation of the NACs results in the following dual function:

D(λ) = min_{x_ω, y_ω}  Σ_{ω∈Ω} [ τ_ω (c^⊤ x_ω + d_ω^⊤ y_ω) + λ^⊤ H_ω x_ω ]   (15a)
       s.t.  (x_ω, y_ω) ∈ G_ω  ∀ω ∈ Ω   (15b)

In Equation (15a), the vector λ ∈ R^{n_1·(|Ω|−1)} represents the dual multipliers associated with the NACs (2b); λ_ω = H_ω^⊤ λ ∈ R^{n_1} collects the Lagrangian multipliers acting on the copy x_ω for scenario ω, as given by Equation (3). Given the independence of the variables and constraints in each scenario, the function D can be split into separate subproblems D_ω(λ_ω):

D(λ) = Σ_{ω∈Ω} D_ω(λ_ω)   (17a)
D_ω(λ_ω) = min { τ_ω (c^⊤ x_ω + d_ω^⊤ y_ω) + λ_ω^⊤ x_ω : (x_ω, y_ω) ∈ G_ω }   (17b)

According to the weak duality theorem, the relaxation (17) always yields a value less than or equal to the optimal objective value of problem (2). The best lower bound of (PNAC) is computed by solving the following maximization problem, referred to as the Lagrangian dual problem:

Z_LD = max_λ  D(λ)   (18)

The Lagrangian dual is a concave nonsmooth program and can be solved by subgradient methods, cutting-plane methods, or column-generation methods; the details of these methods can be found in Guignard [47]. We illustrate the dual search approaches by describing the standard cutting-plane algorithm.
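Weak duality is easy to verify numerically on a tiny instance. The sketch below is illustrative only: a single binary first-stage variable, two equally likely scenarios whose second-stage values are tabulated in Q, and the single NAC x_1 − x_2 = 0 dualized with one multiplier lam. The scenario subproblems are solved by enumeration, and the resulting bound never exceeds the true optimum.

```python
def scenario_min(tau_w, c, Q_w, lam_coeff):
    """D_w contribution: min over x in {0,1} of tau_w*(c*x + Q_w[x]) + lam_coeff*x."""
    return min(tau_w * (c * x + Q_w[x]) + lam_coeff * x for x in (0, 1))

def lagrangian_bound(lam, tau, c, Q):
    """Lower bound D(lam) for the NAC x_1 - x_2 = 0 dualized with multiplier lam:
    scenario 1 receives +lam*x_1, scenario 2 receives -lam*x_2."""
    return scenario_min(tau[0], c, Q[0], lam) + scenario_min(tau[1], c, Q[1], -lam)
```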

Cutting-Plane Method
The cutting-plane algorithm solves the Lagrangian dual problem iteratively by applying outer approximation to (18) and solving the Lagrangian subproblems (17b) to improve the formulation of the relaxed dual function (RZ_LD) in Equation (19a). The outer approximation is given by the Lagrangian master problem (LMP):

(LMP)  RZ_LD = max_{λ, η}  Σ_{ω∈Ω} η_ω   (19a)
       s.t.  η_ω ≤ D̂^k_ω + (x^k_ω)^⊤ (λ_ω − λ^k_ω)  ∀ω ∈ Ω, ∀k   (19b)

where, for iteration k and scenario ω, the parameters x^k_ω and D̂^k_ω = D_ω(λ^k_ω) represent the previous solution and optimal value of subproblem (17b), and the parameter λ^k_ω represents the vector of previously considered dual multipliers. The dual search is outlined in Algorithm 2. It should be noted that (19) can be unbounded, especially in the first few iterations. One approach to avoid unboundedness is to add bounds on the λ_ω variables; an alternative is to use the bundle method [48] instead of Algorithm 2.

Algorithm 2: Cutting-plane dual search
1   k ←− 0; initialize the multipliers λ⁰ (e.g., λ⁰ = 0)
2   repeat
3       SOLVE the subproblems (17b) at λ^k to obtain D_ω(λ^k_ω) and x^k_ω for all ω ∈ Ω
4       ADD one cut (19b) per scenario to the Lagrangian master problem (19)
5       SOLVE (19) to obtain RZ_LD and the new multipliers λ^{k+1};  k ←− k + 1
6   until RZ_LD − Σ_{ω∈Ω} D_ω(λ^k_ω) ≤ ε
7   return Z_LD and λ^k

Cutting-plane methods present drawbacks similar to those of the BD algorithm, such as slow convergence and strong oscillations of the dual variables. This can be explained by the fact that BD is a special case of the cutting-plane method from the viewpoint of nonsmooth optimization. Various alternatives have been proposed to accelerate this technique, including the bundle method and the volume algorithm [47]. Additional strategies consider the suboptimal solution of the master problem, using an interior-point method (IPM) in combination with Benders-like cuts to tighten the Lagrangian subproblems (17b) and exclude infeasible first-stage solutions (see [29,49]). Other methodologies, such as cross decomposition, exchange information with BD to compute additional cuts derived from feasible first-stage candidates [50].
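Among the alternatives to the cutting-plane master mentioned above, the subgradient method is the simplest: at any λ, the NAC violation of the subproblem minimizers (here x*_1 − x*_2) is a subgradient of D, and the multiplier is moved along it with a step size. The sketch below is illustrative only (one binary first-stage variable, two scenarios, second-stage values tabulated in Q, constant step t); on this toy instance the dual bound happens to be tight.

```python
def solve_dual_subgradient(tau, c, Q, steps=20, t=0.25):
    """Maximize the toy Lagrangian dual D(lam) for the NAC x_1 - x_2 = 0
    by subgradient ascent with a constant step size t."""
    def argmin_w(tau_w, Q_w, coeff):
        # scenario subproblem solved by enumeration over x in {0, 1}
        return min((0, 1), key=lambda x: tau_w * (c * x + Q_w[x]) + coeff * x)
    lam, best = 0.0, float("-inf")
    for _ in range(steps):
        x1 = argmin_w(tau[0], Q[0], lam)
        x2 = argmin_w(tau[1], Q[1], -lam)
        d = (tau[0] * (c * x1 + Q[0][x1]) + lam * x1
             + tau[1] * (c * x2 + Q[1][x2]) - lam * x2)
        best = max(best, d)
        lam += t * (x1 - x2)   # subgradient of D at lam is the NAC violation
    return best
```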

Branch-and-Bound Method
The DD algorithm proposed by Carøe and Schultz [17] uses the bound Z_LD to discard nodes from the first-stage search domain. Algorithm 3 summarizes the branch-and-bound procedure. The set P denotes the group of active problems, and TC_i the lower bound associated with program P_i ∈ P. Commonly, the Lagrangian dual problem yields first-stage solutions that differ in value from one scenario to another. For those instances, a candidate x̂ is estimated by applying a rounding heuristic to the average solution Σ_{ω∈Ω} τ_ω x^i_ω. Note that Algorithm 3 can be applied not only to problems with mixed-binary variables, but also to problems with general mixed-integer variables; the branching steps assume that the integer variables can be nonbinary. More recently, Kim and Dandurand [51] proposed a new branching strategy based on Dantzig-Wolfe decomposition that reduces the number of explored nodes and decreases the overall solution time.

Algorithm 3: DD branch-and-bound method
Input: branching tolerance for continuous variables, optimality tolerance ε
1   P ←− {(PNAC)};  z_UB ←− +∞
2   while P ≠ ∅ do
3       Select problem P_i from P and SOLVE (18) to get the lower bound Z^i_LD and the solutions x^i_ω
4       if Z^i_LD ≥ z_UB then
5           Discard P_i (bounding)
6       else if the solutions x^i_ω are identical across all scenarios then
7           NAC ←− true;  x̂^i ←− x^i_ω
8       else
9           NAC ←− false
10          Perform a rounding heuristic on Σ_{ω∈Ω} τ_ω x^i_ω to obtain x̂^i
        /* Upper bounding procedure */
11      Compute TC^i_p from x̂^i using Equation (10)
12      if TC^i_p < z_UB then  z_UB ←− TC^i_p;  x* ←− x̂^i
13      if not NAC then
14          Select a component x_(k) of x and add two new problems to P by adding the constraints x_(k) ≤ ⌊x̂^i_(k)⌋ and x_(k) ≥ ⌊x̂^i_(k)⌋ + 1 (for integer components; for continuous components, the branching tolerance is used)
15  return optimal solution x* and z_UB

Progressive Hedging (PH) Algorithm
The progressive hedging (PH) algorithm [9] is a popular method for solving multi-stage stochastic programs. Although it was initially proposed for convex stochastic problems, it has been successfully applied as a heuristic to solve mixed-integer stochastic programs [52,53].
To find a solution to problem (2), PH introduces a new set of variables x̄ (also known as the first-stage policy) that replaces the NACs (2b). It then solves the reformulated program (20) using a specialized variant of the alternating direction method of multipliers (ADMM) [54,55]:

min  Σ_{ω∈Ω} τ_ω (c^⊤ x_ω + d_ω^⊤ y_ω)
s.t.  (x_ω, y_ω) ∈ G_ω  ∀ω ∈ Ω,   x_ω = x̄  ∀ω ∈ Ω   (20)

As in dual decomposition, PH relaxes the nonanticipativity restrictions on the first stage. The augmented Lagrangian relaxation L_ρ of the constraints x_ω = x̄, ∀ω ∈ Ω, yields a lower bound D(λ) on the original deterministic formulation (20). The best lower bound is estimated by solving the following problem:

max_λ { D(λ) : Σ_{ω∈Ω} τ_ω λ_ω = 0 }   (21a)
D(λ) = min_{x_ω, y_ω, x̄} { L_ρ = Σ_{ω∈Ω} τ_ω [ c^⊤ x_ω + d_ω^⊤ y_ω + λ_ω^⊤ (x_ω − x̄) + (ρ/2) ||x_ω − x̄||² ] : (x_ω, y_ω) ∈ G_ω ∀ω ∈ Ω }   (21b)

where ρ > 0 is a penalty parameter. The constraint Σ_{ω∈Ω} τ_ω λ_ω = 0 is required for L_ρ to be bounded from below. To mitigate the computational difficulties of minimizing the augmented Lagrangian dual function (21b), PH decomposes the problem by scenarios. To achieve complete separability of the subproblems, Rockafellar and Wets [9] proposed to temporarily fix the first-stage policy and repeatedly solve program (22) to update the multipliers and the value of x̄:

(x^{k+1}_ω, y^{k+1}_ω) = argmin { c^⊤ x_ω + d_ω^⊤ y_ω + (λ^k_ω)^⊤ x_ω + (ρ/2) ||x_ω − x̄^k||² : (x_ω, y_ω) ∈ G_ω }   (22)

Algorithm 4 summarizes the procedure used to solve the dual problem (21).

Algorithm 4: Two-stage progressive hedging algorithm
1   k ←− 0;  SOLVE min { c^⊤ x_ω + d_ω^⊤ y_ω : (x_ω, y_ω) ∈ G_ω } to obtain x⁰_ω for all ω ∈ Ω
2   x̄⁰ ←− Σ_{ω∈Ω} τ_ω x⁰_ω;  λ⁰_ω ←− ρ (x⁰_ω − x̄⁰) for all ω ∈ Ω
3   repeat
4       SOLVE the subproblems (22) to obtain x^{k+1}_ω for all ω ∈ Ω
5       x̄^{k+1} ←− Σ_{ω∈Ω} τ_ω x^{k+1}_ω
6       λ^{k+1}_ω ←− λ^k_ω + ρ (x^{k+1}_ω − x̄^{k+1}) for all ω ∈ Ω;  k ←− k + 1
7   until Σ_{ω∈Ω} τ_ω ||x^k_ω − x̄^k|| ≤ ε or k = k_max
8   return x̄^k

The algorithm terminates when the first-stage policy is nonanticipative, i.e., when all scenario solutions x^k_ω agree with x̄^k. For convex instances, x̄^k converges to the optimal solution of the deterministic formulation (2) as k → ∞, and convergence is guaranteed. These conditions do not hold for mixed-integer programs; however, a high-quality solution and an upper bound can be computed from the non-convergent values x̄^{k_max} and TC_p(x̄^{k_max}), respectively [52].
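For a convex toy instance, the PH iteration can be reproduced in a few lines. The sketch below is illustrative only: with scenario objectives (x − a_ω)² and no second-stage constraints, the subproblem (22) has a closed-form minimizer (set the derivative to zero), and the policy x̄ converges to the probability-weighted mean Σ_ω τ_ω a_ω; real implementations solve (22) with an LP/MIP solver instead.

```python
def progressive_hedging(a, tau, rho=1.0, iters=100):
    """PH sketch for min E[(x - a_w)^2] with NAC x_w = xbar.
    Subproblem: min (x - a_w)^2 + lam_w*x + (rho/2)*(x - xbar)^2,
    whose minimizer is x = (2*a_w - lam_w + rho*xbar) / (2 + rho)."""
    lam = [0.0] * len(a)
    xbar = 0.0
    for _ in range(iters):
        x = [(2 * a[w] - lam[w] + rho * xbar) / (2 + rho) for w in range(len(a))]
        xbar = sum(t * xw for t, xw in zip(tau, x))        # update the policy
        lam = [lam[w] + rho * (x[w] - xbar) for w in range(len(a))]  # update weights
    return xbar
```

Note that the multiplier update preserves Σ_ω τ_ω λ_ω = 0, the boundedness condition stated for (21).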
Recent investigations have focused on the improvement and acceleration of PH. Various studies identify the penalty term as a critical factor in the quality of the solution and the convergence rate: larger values of ρ can accelerate the convergence but can lead to suboptimal solutions. On the other hand, lower values can improve the quality of the solutions and lower bounds, although with a very slow convergence rate [56]. Numerous alternatives have been developed to circumvent those problems, from per-component and cost-proportional heuristics [52], to the dynamic update of the penalty parameter [42,57].
A limitation in applying PH to stochastic mixed-integer programs is the lack of a lower bound to assess the quality of the computed solution. This disadvantage can be alleviated by estimating a valid lower bound from the non-convergent set of Lagrangian weights λ k [56], or by combining the Frank-Wolfe and PH methods [58]. These methodologies establish a relationship between the dual decomposition and progressive hedging, which has motivated the development of hybrid solution strategies (see [59]).

Software Packages for Scenario Decomposition
In this section, we review two software packages, PySP [52,60] and DSP [49], for scenario decomposition. The two software packages are benchmarked based on the problems in SIPLIB [61], including the SSLP [62], SSLPR [63], and DCAP [64] test problems.
The SSLP test set consists of 12 two-stage stochastic mixed-integer programs arising in stochastic server location problems (SSLPs). The base deterministic server location problem considers building servers in some potential locations to serve clients in given locations. The stochastic version of the server location problem considers different realizations of client locations. Each scenario represents a set of potential clients that do materialize. The decisions in SSLP are all binary variables. In the first stage, we decide whether a server is located at each given location. The second-stage (recourse) actions decide whether any given client is served by any given server. SSLPR (stochastic server location problem with random recourse) is an extension of SSLP. While SSLP assumes fixed demands for the clients, SSLPR considers the demands of the clients as uncertain parameters.
DCAP consists of 12 two-stage stochastic integer programs arising in dynamic capacity acquisition and allocation applications. The deterministic model considers a capacity expansion schedule over T time periods. In each time period, the amount of capacity expansion for each resource needs to be decided. There is a fixed and a variable cost for each capacity expansion. In each time period, each task must be assigned to one of the existing resources, which is represented by binary variables that decide whether a given task is assigned to a given resource. Since there are multiple periods, the stochastic version of this problem should, in principle, be formulated as a multi-stage stochastic program, which is difficult to solve. Ahmed and Garcia [64] proposed to approximate the multi-stage problem with a two-stage stochastic program in which the first-stage decisions are the capacity expansions. The second-stage decisions are the assignment decisions. The uncertainties include the processing requirement for each task and the cost of processing each task.
The sizes of all the test problems are shown in Table 1. The names of the SSLP and SSLPR instances are expressed in the form sslp(rf)_m_n, where m is the number of potential server locations, and n is the number of potential clients. Each instance is tested with a different number of scenarios. The size is expressed as the number of constraints (Rows), variables (Cols), and integer variables (Ints) in the first stage and the second stage per scenario. Note that the SSLP problems have pure binary first-stage variables, and the DCAP problems have mixed-binary first-stage variables. This difference affects the PH algorithm, which is discussed in detail later.
All of the test problems are available in the SMPS format; however, we implemented an interpreter to make the format compatible with PySP. All of the tests are run on a server with an Intel Xeon CPU (24 cores) at 2.67 GHz and 128 GB of RAM. The whole set of instances is solved in a synchronous parallel manner to reduce the time of each iteration. PySP is a software package implemented in the Python programming language using Pyomo [65] as the optimization modeling framework. PySP enables the user to solve stochastic programs with a specialized progressive hedging algorithm for stochastic mixed-integer programs. In order to use PySP, the user only needs to write a deterministic base model and define the scenario tree structure in Pyomo. With these inputs, PySP is able to apply the progressive hedging algorithm as an effective heuristic for obtaining feasible solutions to multi-stage stochastic programs. We highlight that, starting from distribution 6.0, PySP is no longer part of Pyomo's source code; however, it is still available and maintained. Furthermore, the PySP project is continued as mpi-sppy, available at https://github.com/Pyomo/mpi-sppy, accessed on 20 January 2022.

Algorithmic Innovations in PySP
The innovations in PySP for multi-stage mixed-integer stochastic programs were described by Watson and Woodruff [52]. Here, we briefly paraphrase those innovations. First, instead of keeping a fixed ρ value for all first-stage decisions in Algorithm 4, the authors proposed several variable-dependent ρ strategies. The cost-proportional (CP) strategy sets ρ(i) to be proportional to the cost parameter c(i), i.e., ρ(i) = α c(i), where α is a constant multiplier for all first-stage variables i. The other variable-dependent ρ strategy was denoted by SEP in [52], where, after PH iteration 0, the ρ(i) for each integer variable i is calculated as

ρ(i) = c(i) / (x_max − x_min + 1)

where x_max = max_{ω∈Ω} x⁰_ω(i) and x_min = min_{ω∈Ω} x⁰_ω(i). For continuous variables, ρ(i) is calculated as

ρ(i) = c(i) / max( 1, Σ_{ω∈Ω} τ_ω |x⁰_ω(i) − x̄⁰(i)| )

where x̄⁰ is the weighted average of the x⁰_ω, i.e., x̄⁰ = Σ_{ω∈Ω} τ_ω x⁰_ω. The authors also proposed some heuristics for accelerating convergence. One heuristic is called "variable fixing": the values of some of the first-stage decisions x_{ω,i} are fixed after they stay at a given value z_i across all scenarios for a number of iterations. In order to apply this heuristic, the authors introduced a lag parameter μ. At a given PH iteration k, the value of x^k_{ω,i} is fixed to z_i for all subsequent iterations l > k if x^m_{ω,i} = z_i for all ω ∈ Ω and all m ∈ {k − μ, . . . , k}, with k ≥ μ. Additionally, the authors proposed a more aggressive variable-fixing heuristic called "variable slamming", where x^k_ω is fixed once it is "sufficiently converged", i.e., some discrepancies of x^k_ω across scenarios are allowed. In order to decide when variable slamming should be applied, the authors proposed several termination criteria based on the deviations of the first-stage variables.
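The two ρ rules above can be expressed compactly. The sketch below is illustrative (the names and data layout are ours, not PySP's): c maps a variable index to its cost coefficient, and x0 maps it to the list of iteration-0 scenario values.

```python
def rho_cost_proportional(c, alpha):
    """CP strategy: rho(i) = alpha * c(i) for every first-stage variable i."""
    return {i: alpha * ci for i, ci in c.items()}

def rho_sep_integer(c, x0):
    """SEP strategy for integer variables: rho(i) = c(i)/(xmax - xmin + 1),
    with xmax/xmin taken over the iteration-0 scenario solutions x0[i]."""
    rho = {}
    for i, ci in c.items():
        xmax, xmin = max(x0[i]), min(x0[i])
        rho[i] = ci / (xmax - xmin + 1)
    return rho
```

Intuitively, SEP shrinks the penalty for variables whose scenario solutions already disagree widely, so the quadratic term does not dominate the original cost.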
In solving stochastic mixed-integer programs with PH, cyclic behavior can be found in some instances. In order to detect cyclic behavior, the authors proposed a strategy based on the values of the u_ω vectors, i.e., the weights associated with the first-stage decision variables x_ω. The authors proposed a simple hashing scheme: let the hash value h(i) = ∑_{ω∈Ω} z_ω u_{ω,i}, where z_ω is an integer hash weight assigned to each scenario ω ∈ Ω when PH is initialized. If equal hash values are detected across iterations, they are interpreted as evidence of a potential cycle, and variable x_i can be fixed if such cyclic behavior is found.
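A minimal sketch of this hashing scheme follows; the helper names and the use of a plain Python set as the hash-value history are our own illustration of the idea, not PySP's code.

```python
import random

def make_hash_weights(scenarios, seed=0):
    """Draw one integer hash weight z_omega per scenario at PH initialization."""
    rng = random.Random(seed)
    return {w: rng.randint(1, 2**31 - 1) for w in scenarios}

def hash_value(u, z, i):
    """h(i) = sum over scenarios omega of z_omega * u_{omega,i}."""
    return sum(z[w] * u[w][i] for w in z)

def update_and_check(history, i, h):
    """Record h(i) for variable i; a repeated value is treated as
    evidence of a potential cycle, making i a candidate for fixing."""
    seen = history.setdefault(i, set())
    if h in seen:
        return True  # potential cycle detected
    seen.add(h)
    return False
```

Because the z_ω are fixed random integers, two PH iterations with identical weight vectors u_ω produce identical hash values, which is exactly the signal the heuristic looks for.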
The heuristics, including variable fixing and slamming and cyclic behavior detection, are denoted as WW (Watson-Woodruff) heuristics in the software distribution of PySP.

Computational Results for PySP
We use PySP (Pyomo 5.0.0) to solve the SSLP, SSLPR, and DCAP problems. Each subproblem is solved with the CPLEX (12.7.0) quadratic solver. We use the cost-proportional (CP) heuristic to set the values of ρ(i), with the multiplier α set to 0.1, 0.3, 0.5, and 1.0. Note that the main results shown in this section do not use the WW heuristics, i.e., we do not use the variable fixing and slamming, or cycle-detection heuristics. We compare PySP with and without the WW heuristics at the end of this section.
The number of iterations and the walltime for the different multipliers are shown in Figures 1 and 2, respectively. If the PH algorithm reaches the iteration limit, there is an "(i)" label at the top of the column; if it reaches the time limit, there is a "(t)" label. From Figures 1 and 2, one can observe that setting α to 0.1 makes PH take the largest number of iterations and the largest amount of walltime to converge in most of the instances, which may be due to the small step size. On the other hand, setting α to the largest value, i.e., 1.0, takes fewer iterations and less walltime than the other α values in most instances; however, it reaches the iteration limit in two of the instances. Overall, setting α to 0.3 seems to be a robust choice because cp-0.3 always converges within a reasonable walltime and number of iterations. The details of the SSLP and SSLPR results are shown in Tables A1 and A2 in Appendix A.
We also apply PySP to solve the DCAP instances. We observe that for all the DCAP instances, PySP is unable to converge within 300 iterations. The details of the results are shown in Table A3 in Appendix A, where the walltime to the best upper bound for those instances is reported. We compare the upper bounds obtained by PySP with those obtained by DSP in the next subsection. From this experiment, we can see that it is more difficult for PySP to solve problems with mixed-binary first-stage variables than problems with pure binary first-stage variables, because it is harder for the continuous variables to satisfy the NACs.
Scenario bundling [66–68] is a technique that has been used in dual decomposition algorithms. The main idea is to dualize only some of the non-anticipativity constraints rather than all of them: the individual scenario subproblems are bundled into larger subproblems in which the NACs within each bundle are preserved. Ryan et al. [67] used PH with scenario bundling to solve stochastic unit commitment problems and showed that, with scenario bundling, PH can obtain solutions with a smaller optimality gap. In order to test the effectiveness of scenario bundling, we test several instances from the SSLP and DCAP libraries. The computational results are shown in Table 2. For each instance, we try different numbers of bundles. For the SSLP instances, PH obtains the same upper bound regardless of the number of bundles; however, the number of bundles has a significant impact on the computational time. For example, for SSLP_10_50 with 1000 scenarios, PH with 50 bundles reduces the walltime of the original PH (1000 bundles) to 3%, and it converges in a single iteration. For the DCAP problems, PH does not converge within 300 iterations in most cases, even with scenario bundling; however, PH obtains better feasible solutions with scenario bundling (see UB in Table 2).

Finally, we evaluate how the WW heuristics affect the performance of PySP on the SSLP and SSLPR libraries. The results on the DCAP library are omitted here, since PySP does not converge for the DCAP instances. The solution time improvements obtained by using the WW heuristics for each SSLP and SSLPR instance are shown in Figure 3. Note that there are three cases where the improvements are omitted in the figure: case (1), neither PH nor PH with WW heuristics converges in 300 iterations; case (2), only PH-WW fails to converge in 300 iterations; and case (3), both PH and PH-WW exceed the time limit of 3 h (10,800 CPU seconds).
Using the WW heuristics gives significant improvements for small cost-proportional multipliers, i.e., 0.1 and 0.3. As described in Table 1, PH with small multipliers usually takes more iterations to converge; therefore, the WW heuristics accelerate convergence for those instances more effectively. However, there are also a few instances where PH converges but PH with the WW heuristics does not, which are denoted by case (2) in Figure 3.
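Returning to scenario bundling: the partitioning step can be illustrated with a simple helper that groups scenarios and renormalizes the probabilities within each bundle. The round-robin assignment and dictionary layout below are our own illustration, not a prescription from [66–68].

```python
def make_bundles(scenarios, tau, n_bundles):
    """Partition scenarios into n_bundles groups (round-robin).

    NACs inside a bundle are enforced directly in the bundled
    subproblem; only the NACs linking different bundles are relaxed
    by PH or dual decomposition.
    """
    groups = [[] for _ in range(n_bundles)]
    for k, w in enumerate(scenarios):
        groups[k % n_bundles].append(w)
    bundles = []
    for g in groups:
        if not g:
            continue
        p = sum(tau[w] for w in g)  # total bundle probability
        # conditional probabilities used inside the bundled subproblem
        bundles.append({'scenarios': g, 'prob': p,
                        'cond_tau': {w: tau[w] / p for w in g}})
    return bundles
```

With 1000 scenarios and 50 bundles, each bundled subproblem is a 20-scenario stochastic program; fewer, larger subproblems mean fewer relaxed NACs, which is consistent with the faster convergence observed in Table 2.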

DSP: Decompositions for Structured Programming
DSP [49] is an open-source software package implemented in C++ that provides different types of dual decomposition algorithms to solve stochastic mixed-integer programs (SMIPs). DSP accepts SMPS files as input, as well as JuMP models via the Julia package Dsp.jl.

Algorithmic Innovations in DSP
From the description of the dual decomposition algorithm in Section 5, one can observe that the lower bounds of the dual decomposition algorithm are affected by the way the Lagrangian multipliers are updated. One advantage of DSP is that the authors have implemented different dual-search methods, including the subgradient method, the cutting-plane method, and a novel interior-point cutting-plane method for the Lagrangian master problem. The authors observed that if the simplex algorithm is used, the solutions to the Lagrangian master problem can oscillate significantly, especially when the Lagrangian dual function is not well approximated. Therefore, the authors proposed to solve the Lagrangian master problem suboptimally using an interior-point method, which follows from the work of Mitchell [69].
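For contrast with the cutting-plane and interior-point masters used in DSP, the simplest dual-search alternative, a subgradient step on the multipliers of the dualized NACs, can be sketched as follows. The dictionary layout and step rule are our own illustration under the common NAC form x_ω = x̄.

```python
def subgradient_step(lam, x, tau, step):
    """One subgradient update for NACs of the form x_omega = x_bar.

    lam  : {(omega, i): current Lagrangian multiplier}
    x    : {omega: {i: first-stage solution of subproblem omega}}
    tau  : {omega: probability}
    step : positive step size

    The subgradient of the Lagrangian dual w.r.t. lam_{omega,i}
    is the NAC violation x_{omega,i} - x_bar_i.
    """
    variables = next(iter(x.values())).keys()
    # probability-weighted average of the first-stage solutions
    x_bar = {i: sum(tau[w] * x[w][i] for w in x) for i in variables}
    return {(w, i): lam.get((w, i), 0.0) + step * (x[w][i] - x_bar[i])
            for w in x for i in variables}
```

When all scenario solutions already agree, the subgradient vanishes and the multipliers stay put; otherwise each multiplier is pushed in the direction of its scenario's deviation from the average.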
The authors also proposed some tightening inequalities that are valid for the Lagrangian subproblems. These valid inequalities, including feasibility and optimality cuts, are obtained from Benders subproblems, where the integrality constraints are relaxed. Computational results show that the Benders-like cuts can be effective in practice. The latest DSP distribution is able to complete Algorithm 3 using the Dantzig-Wolfe decomposition-based branching strategy described in [51].

Computational Results for DSP in Comparison with PySP
We test the dual decomposition algorithm on the SSLP, SSLPR, and DCAP libraries. Each subproblem is solved with CPLEX (12.7.0) mixed-integer linear solver. The interior point method proposed by the authors [49] is used to solve the Lagrangian master problem, which is solved with CPLEX as well. Benders-like cuts are not used because the implementation of Benders cuts in DSP only works with SCIP.
In Figures 4 and 5, we evaluate the best feasible solution (the upper bound) obtained by PySP, and the upper and lower bounds obtained by DSP. For each instance, we include three different gaps. The upper and lower bounds from Ref. [61] are used to evaluate the bounds from PySP and DSP. Note that the bounds from the literature are close to the global optimum of each instance. The first column for each instance in both Figures 4 and 5 is the gap between the upper bound from PySP (PySP_ub) and the lower bound from the literature (Literature_lb). The second column represents the gap between the upper bound from DSP (DSP_ub) and the lower bound from the literature (Literature_lb). The third column represents the gap between the upper bound from the literature (Literature_ub) and the lower bound from DSP (DSP_lb).
For the SSLP and SSLPR instances shown in Figure 4, although PySP can converge within the time and iteration limits, the best feasible solution obtained from PySP (PySP_ub) may not be optimal. There are gaps of about 1% for some of the SSLP instances (see the first column of each instance in Figure 4). DSP can solve more instances to optimality than PySP (see the second column of each instance in Figure 4). The lower bounds obtained by DSP are also quite tight, usually within 0.01% (see the third column of each instance in Figure 4). Note that the literature values for SSLPRF5_50_100(1), SSLPRF5_50_100(2), and SSLPRF5_50_100(3) do not match the values from our experiment. Therefore, we try to solve the deterministic equivalent of these instances to obtain bounds. The literature bounds of SSLPRF5_50_100(3) come from solving the deterministic equivalent. The gaps of SSLPRF5_50_100(1) and SSLPRF5_50_100(2) are omitted since the corresponding deterministic equivalents cannot be solved within 3 h.
For the DCAP instances where we have mixed-integer first-stage variables, the best feasible solutions from PySP (PySP ub ) and DSP (DSP ub ) are quite far from optimality. The gaps of the first two columns are around 10%. On the other hand, the lower bounds obtained from DSP (DSP lb ) are tight. The gaps between (Literature ub ) and (DSP lb ) are around 0.1%. Therefore, in order to improve the relative optimality gap of DSP, the focus should be on designing more advanced heuristics to obtain better feasible solutions.

Results from Branch-and-Bound Algorithm in DSP
As we were completing this paper, a new version of DSP [51] was developed that uses a branch-and-bound algorithm similar to Algorithm 3. We test the branch-and-bound decomposition on the SSLP, SSLPR, and DCAP test instances. The computational setting is the same as the one described in the previous section. Tables 3-5 show the results obtained by DSP. We highlight that almost all the instances were solved to optimality within the walltime limit.
In all the SSLP instances (except SSLP15_45), DSP using the DW-based branch-and-bound (DSP-DW) is able to solve the instances to optimality in less time than DSP implementing only the first iteration of DD. The same trend is observed on the SSLPRF instances.
On the other hand, DSP-DW is remarkably slower in attaining algorithmic convergence on the DCAP test set. This behavior can be explained by looking at the DSP_lb vs. Literature_ub gap in Figures 4 and 5. The standard implementation of DSP is able to close the optimality gap in the SSLP and SSLPR instances by solving the Lagrangian relaxation problem (15a) and using heuristics to compute a primal bound. Nonetheless, this strategy does not work on the DCAP set, for which additional computation is devoted by DSP-DW to closing the gap via branch-and-bound.

Conclusions
We presented a summary of the state-of-the-art methodologies for the solution of two-stage linear stochastic problems. First, we introduced the mathematical formulation of such programs and highlighted features in their structure, which enable the development of decomposition algorithms. These methodologies are classified into two groups: node-based decomposition and scenario-based decomposition.
For two-stage stochastic programs with continuous recourse, we summarized Benders decomposition, which partitions the problem according to its time structure. BD may present computational problems, which can be alleviated by reducing the cost of each iteration, and/or decreasing the number of iterations.
Scenario decomposition methodologies are popular alternatives in the case of (mixed) integer recourse. The progressive hedging algorithm and dual decomposition relax the nonanticipativity restrictions and provide the user with valid bounds. Our numerical results show that the performance of PH is strongly affected by the constant penalty multiplier. Furthermore, its performance and the quality of the approximate solution may be enhanced by grouping the scenarios in large bundles (or scenario sets). We also tested the dual decomposition algorithm with the DSP package. The computational results show that DSP is able to provide a tight lower bound on the instances that we tested. However, the optimality gaps can be as large as 10%, relative to the upper bound from the literature. Therefore, for those tested instances, future effort should be focused on developing more advanced heuristics to improve the best feasible solution. Moreover, we tested the newest DSP implementation, which incorporates a Dantzig-Wolfe decomposition-based branch-and-bound algorithm to close the optimality gap. In most of the instances, it is able to retrieve a global solution within the given walltime limit. We highlight that its computational performance depends on the strength of the dual decomposition.
At the time we completed this paper, several new developments had been made in stochastic programming software. These software packages have the capability of solving multistage stochastic programs, e.g., SDDP.jl [70], StochasticPrograms.jl [71], and MSPPy [72]. In terms of algorithmic development, recent works have extended the capability to solve nonlinear problems [4,73-78]. Furthermore, it should be noted that this paper focuses on the risk-neutral setting of two-stage stochastic programs. However, the studied approaches can be extended to the risk-averse setting [79], e.g., by including risk measures such as the conditional value at risk (CVaR).
Author Contributions: J.J.T. conceptualization, computational study and paper writing; C.L. conceptualization and paper writing; R.M.A. conceptualization and review; I.E.G. conceptualization, supervision and review. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix C. Software Packages for Benders Decomposition
In this section, we review two software packages, GAMS-DECIS [44,45] and FORTSP [46], for Benders decomposition. Both packages are benchmarked with 20 instances from the random [80] and SAPHIR [81] test collections. All of the test problems are available in the SMPS format; however, specific modifications need to be made in order to make the format compatible with DECIS. The computational experiments are performed on a Linux machine with a 2.67 GHz Intel Xeon CPU, 128 GB of RAM, and a limit of 3 h of walltime.
The random collection consists of 15 instances artificially generated with the test problem generator GENSLP [82]. The instances are grouped into three sets of problems (rand0, rand1, rand2), each of them with five instances with 2000, 4000, 6000, 8000, and 10,000 scenarios. None of the instances represent a real-world problem; nonetheless, they have been successfully used to assess the performance of stochastic solvers (see [83]). All problems in this collection present uncertainty only in the right-hand side (RHS) coefficients h_ω.
The SAPHIR collection consists of five instances of the optimization of a gas-purchase portfolio, considering the cost of purchase, as well as underground storage capacities and transportation, under uncertain demand conditions [84]. In this family of problems, the random elements are located in both the RHS and constraint matrices W ω and T ω .
The sizes of all of the test problems are shown in Table A7. The size is expressed as the number of constraints (Rows) and variables (Cols) in the first stage and in the second stage per scenario. None of the test instances consider integer variables in the first stage.

FortSP is a solver for linear and mixed-integer linear stochastic programs. It accepts input in the SMPS format, or through a separate SAMPL translator (an AMPL extension for stochastic programming). In addition, FortSP can be used as a library with an application programming interface (API) in C. FortSP enables the user to solve two-stage stochastic linear programs with four variants of Benders decomposition, and provides three different solution approaches for mixed-integer instances.
To solve mixed-integer instances, FortSP uses the deterministic equivalent with both implicit and explicit representations for the NACs. In addition, it incorporates a specialized L-shaped algorithm based on branch-and-cut for instances with mixed-integer variables in the first-stage and continuous and complete recourse. This method might be accelerated with the variable neighborhood decomposition search heuristic (VNDS) [87].
All of the Benders variants in FortSP are formulated in the aggregated form shown in Equation (A1). Disaggregated formulations (i.e., problem (9)) store more information in the master problem, which yields a reduction in the number of iterations; however, this comes at the expense of larger master problems. As a rule of thumb, the disaggregated approach is expected to be more effective when the number of scenarios |Ω| is not significantly larger than the number of constraints m_1 of the first-stage program [1].
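The difference between the two forms can be made concrete with per-scenario optimality cuts of the form θ_ω ≥ a_ω + g_ωᵀx: the disaggregated master keeps every pair (a_ω, g_ω), whereas the aggregated master adds only their expectation. A minimal sketch of the aggregation step, with an illustrative data layout of our own choosing:

```python
def aggregate_cuts(cuts, tau):
    """Collapse per-scenario optimality cuts into one aggregated cut.

    cuts : {omega: (a_omega, g_omega)}, where g_omega is a list of
           coefficients over the first-stage variables
    tau  : {omega: probability}

    Returns (a, g) for the single cut theta >= a + g^T x, with
    a = sum_omega tau_omega * a_omega and g the same expectation
    taken componentwise.
    """
    n = len(next(iter(cuts.values()))[1])
    a = sum(tau[w] * cuts[w][0] for w in cuts)
    g = [sum(tau[w] * cuts[w][1][j] for w in cuts) for j in range(n)]
    return a, g
```

One aggregated cut per iteration keeps the master small at the cost of weaker information; keeping the |Ω| individual cuts is the disaggregated alternative discussed above.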

Computational Results for FortSP
We use FortSP to solve the random [80] and SAPHIR [81] test instances. The number of iterations and the walltime for the different solution methodologies are shown in Table A8, where IPM stands for interior-point method, RD for regularized decomposition, and TR for trust region. The CPLEX (12.5.1) linear and quadratic solvers are used to solve the master problems and subproblems. For the decomposition methodologies, a stopping optimality gap of 1 × 10⁻⁵ is used. FortSP automatically selects the method used to solve the master problem and the recourse instances from primal simplex, dual simplex, and an interior-point method. In addition, FortSP supports warm-starting of linear programs.
From Table A8, one can observe that solving the deterministic equivalent via IPM is an effective alternative, outperforming BD in most of the instances considered; nonetheless, it fails to solve the larger instances in the SAPHIR set. Regularized decomposition and the trust-region method perform better than BD on the SAPHIR set, effectively decreasing the number of iterations and the solution time. However, RD fails on the whole set of RAND test problems. Decomposition with the level method presents the best performance on both benchmark sets, yielding computational times close to those of the interior-point method and effectively reducing the number of iterations of the standard BD method.

DECIS is a software platform for the solution of large-scale two-stage stochastic programs. It accepts problems in the SMPS format. To use DECIS in GAMS, the user needs to formulate the deterministic problem and the time distribution of the constraints and variables in the GAMS interface, which automatically constructs the core and tim files. The uncertain components and realization probabilities are set in an external stochastic file (.sto extension in the SMPS format), which is written by the user. Recent developments in GAMS allow the use of the extended mathematical programming (EMP) framework to define a stochastic program for DECIS, as well as the optimization of two additional risk measures: value at risk (VaR) and conditional value at risk (CVaR).

Appendix C.2.1. Algorithmic Innovations in DECIS
DECIS incorporates multiple alternatives to solve linear two-stage stochastic programs, including (i) Benders decomposition with aggregated cuts and (ii) a regularized decomposition variant. The latter uses MINOS to solve the quadratic master problem (12) and requires the user to select a proper constant penalty parameter (t_k > 0). The overall algorithm performance and convergence are strongly affected by the value of t_k.
When the number of realizations is large, DECIS can employ advanced Monte Carlo sampling techniques to compute good approximate solutions. Instead of considering the whole set of possible outcomes to estimate the expected cost, DECIS uses an independent sample drawn from the distribution of random parameters. In addition to crude Monte Carlo sampling, DECIS incorporates importance sampling and control variates, two variance reduction techniques that enhance the estimation of the expected cost. In addition, DECIS computes a confidence interval in which the optimal objective function value lies.
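The confidence-interval idea can be illustrated with crude Monte Carlo alone; the sampler interface and the 95% normal-approximation interval below are our own simplification, and DECIS additionally applies importance sampling and control variates to shrink the interval.

```python
import random
import statistics

def estimate_expected_cost(sample_cost, n, seed=0):
    """Crude Monte Carlo estimate of an expected second-stage cost.

    sample_cost : callable that draws one scenario cost given an RNG
    n           : sample size (independent draws)

    Returns the sample mean and a 95% confidence interval based on
    the normal approximation (mean +/- 1.96 * standard error).
    """
    rng = random.Random(seed)
    costs = [sample_cost(rng) for _ in range(n)]
    mean = statistics.fmean(costs)
    half = 1.96 * statistics.stdev(costs) / n ** 0.5
    return mean, (mean - half, mean + half)
```

The interval shrinks at rate 1/√n; variance reduction techniques such as importance sampling tighten it further for the same sample size by lowering the sample standard deviation.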

Appendix C.2.2. Computational Results for DECIS
We use DECIS to solve the random and SAPHIR test instances. The number of iterations and the walltime for the different solution methodologies are shown in Table A9. Two initialization strategies are tested for Benders decomposition: (U), where the initial first-stage candidate solution is 0, and (EV+U), where BD is employed to solve the EV (expected value) problem and the EV optimal solution is then used as the starting point for the stochastic instance. Iter-EV and Iter-U stand for the number of iterations required to solve the EV and stochastic problems, respectively. A stopping optimality gap of 1 × 10⁻⁵ is considered. DECIS-CPLEX (12.7.0) uses primal simplex for both the MP and the subproblems in Benders decomposition. DECIS-MINOS (5.6) is used for the quadratic MP and linear subproblems in regularized decomposition. To exemplify the effect of the constant penalty term on the performance of regularized decomposition, two penalty values, 1 and 10, are tested. From Table A9, it can be observed that regularized decomposition may significantly reduce the number of iterations, and thus the solution time of the overall decomposition algorithm. In addition, stronger penalization might increase the number of iterations, as it restricts the first-stage candidate to be close to the best incumbent solution. Furthermore, this methodology might present numerical issues, such as bad scaling in the master problem, which makes the algorithm stop without closing the optimality gap; for instance, regularized decomposition fails to solve the whole set of SAPHIR problems.
Using the (EV+U) initialization can accelerate the convergence of Benders decomposition. In 14 of the 17 instances where BD converged, (EV+U) required fewer iterations than the (U) strategy, as well as less solution time. The reduction in the number of iterations compensates for the time spent computing an appropriate starting point.
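The (EV+U) strategy reduces to collapsing the scenario set into a single expected-value scenario, solving the resulting deterministic problem, and handing its solution to BD as the starting point. A sketch of the collapsing step, with an illustrative dictionary layout of our own choosing:

```python
def expected_value_scenario(scenarios, tau):
    """Average the uncertain parameters across scenarios to build the
    EV problem used by the (EV+U) initialization strategy.

    scenarios : {omega: {param: realized value}}
    tau       : {omega: probability}
    """
    params = next(iter(scenarios.values())).keys()
    return {p: sum(tau[w] * scenarios[w][p] for w in scenarios)
            for p in params}
```

Solving the (much smaller) EV problem first is cheap relative to the stochastic instance, which is why the iteration savings reported above more than pay for the extra solve.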

Appendix C.2.3. Computational Results for FortSP in Comparison with DECIS
From the results in the previous subsections, it can be observed that the algorithms implemented in FortSP (Table A8) outperform the decomposition implementations in GAMS-DECIS (Table A9) in terms of solution time. The strength of FortSP resides in the use of multiple strategies that can accelerate the convergence of the standard BD algorithm and of the regularized variant solved with MINOS. In fact, we observed that the best FortSP methodology is at least 37.3% faster than the best algorithmic implementation evaluated with DECIS for each test problem (see Figure A1). For the instances in which none of the DECIS solvers converge, the solution time is taken as the 3 h walltime limit.
As expected, the performance of the BD algorithm in FortSP and DECIS is similar, with a difference of fewer than 10 iterations on each test instance. Both implementations use BD with aggregated cuts but differ in the initialization procedure. However, the BD algorithm is on average two times faster in FortSP's implementation than in DECIS's. In this particular set of instances, the most time-consuming part of the algorithm is the cumulative solution of the scenario subproblems, as can be observed in Figures A2 and A3, which is explained by the large number of scenario subproblems. This difference is especially pronounced in the SAPHIR group, where the recourse problem is larger than the first-stage program in terms of constraints and variables. In most of the test instances, DECIS with initialization at the EV solution is the methodology that spends the most time solving the master problem, as it uses BD to obtain a proper starting point. Following the general trend, FortSP is faster in the solution of both the master problem and the subproblems separately, indicating that differences in the implementation play an important role in the performance of the decomposition strategies. Warm-starting and automatic selection of the linear solver might contribute to the acceleration of the convergence of BD in FortSP.