The initial portion of this example is performed with a parallel implementation using the ODE integration solver CVODES and the NLP solver SNOPT. Subsequently, a further comparison is made, using the NLP solver IPOPT, between an exact Lagrangian Hessian generated via second-order sensitivity analysis and an approximate limited-memory quasi-Newton update, which requires only first-order sensitivity information. For this last portion, we currently report only serial solution times of the implementation.
The case study example considered is adapted from [16] and involves a batch reactor problem in purely ODE form that follows a first-order reaction scheme, where the kinetic parameters are assumed to be uncertain. The objective is to operate the reactor for an indeterminate duration (i.e., the batch duration is a design variable) such that profit is maximized. The objective function comprises a revenue term proportional to the product conversion and an operating cost dependent on the duration of operation. The optimization problem is defined according to Formulation E1.
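As a rough illustration of the structure that such a scenario-based (multi-period) profit maximization can take, one might write it along the following lines, where the conversion states $x_{B,s}$, scenario weights $w_s$, cost coefficients $c_1, c_2$, control profile $u$, batch duration $t_f$ and uncertain kinetic parameters $\theta_s$ are placeholder symbols rather than the formulation's actual notation:
\[
\max_{u(\cdot),\; t_f} \;\; \sum_{s=1}^{N_s} w_s \Big( c_1\, x_{B,s}(t_f) - c_2\, t_f \Big)
\quad \text{s.t.} \quad \dot{x}_s(t) = f\big(x_s(t), u(t); \theta_s\big), \quad x_s(0) = x_0, \quad s = 1, \dots, N_s.
\]
In such a sketch, all scenarios share the common design variables (here $u$ and $t_f$), while the scenario-wise state trajectories remain independent, which is precisely what makes the embedded integration tasks amenable to parallelization.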
Figure 2 depicts a baseline solution to Formulation E1 using a single processor with an increasing number of scenario realizations. For the input and state solution trajectories in Figure 2a, the solid lines represent the nominal solution, while the shaded bands represent an envelope of possible solutions generated via discrete realizations of the uncertain parameter values. Interesting aspects to note include: (1) as the number of scenarios is increased, both the optimal objective value (defined here as the ratio of the multi-period objective value to the nominal objective value) and the parametric degree of freedom converge to a point (or rather, a confidence interval), which can be considered close to the true solution of the original infinite-dimensional stochastic program; and (2) taking the smallest scenario set as the baseline, we see ×2.26, ×4.20 and ×8.81 increases in total computation time per major SQP iteration as the number of scenario realizations is successively increased, i.e., an almost linear growth in computation time as scenario realizations are added. Based on Figure 2a, an appropriate number of scenarios to use would be 80 or more, where the profiles for the objective ratio and the parametric degree of freedom level off.
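As a hedged illustration only (not code from the paper), the discrete parameter realizations that define the scenarios could be generated along the following lines; the parameter names, nominal values, distribution and scenario counts are illustrative assumptions:

```cpp
// Hedged sketch: sampling discrete realizations of uncertain kinetic parameters,
// each realization defining one scenario/period of the multi-period NLP.
// Names, nominal values and the normal distribution are illustrative only.
#include <cstdio>
#include <random>
#include <vector>

struct KineticParams { double k1; double k2; };  // hypothetical rate constants

std::vector<KineticParams> sample_scenarios(int n_scenarios, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> d_k1(0.5, 0.05);   // assumed nominal 0.5, 10% std. dev.
    std::normal_distribution<double> d_k2(1.0, 0.10);   // assumed nominal 1.0, 10% std. dev.
    std::vector<KineticParams> scenarios(n_scenarios);
    for (auto& s : scenarios) s = {d_k1(gen), d_k2(gen)};
    return scenarios;
}

int main() {
    for (int n : {10, 20, 40, 80}) {  // illustrative, successively doubled scenario counts
        auto sc = sample_scenarios(n);
        std::printf("generated %zu scenarios (first: k1 = %.3f, k2 = %.3f)\n",
                    sc.size(), sc.front().k1, sc.front().k2);
    }
    return 0;
}
```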
Parallel solution times for the total program are reported in Table 1 for increasing numbers of scenario realizations and of processors/threads $p$. Additionally, the serial solution time is reported for each scenario realization level, along with the time required for the nominal dynamic optimization solution (i.e., a single nominal scenario). We further remark that the parallel solution times are an average of three independent experiments; the NLP problem dimension is represented by the total number of variables (vars) and equality/inequality constraints (cons); and the number of NLP iterations until termination is given by iter. Taking the number of scenarios recommended above as the ideal, we see a considerable increase in total computation time relative to the nominal solution, and if 16 processors are used (i.e., the maximum advisable number of threads for the given problem size; see the discussion below), this increase drops by 66%, leaving a correspondingly smaller increase over the nominal serial solution. Given that we are only parallelizing the discretized implicit DAE integration tasks, a 66% improvement is a promising result. A breakdown of the computational performance in terms of speed-up $S_p = t_s/t_p$, where $t_s$ and $t_p$ represent the serial and parallel program run times, respectively, and efficiency $E_p = S_p/p$ is sketched in Figure 3.
Note that, for our particular case, we consider each metric to be based on the time to evaluate the objective/constraint functionals and their derivatives (denoted as DAE time) and to exclude the serial in-solver time related to the matrix computations within the NLP solver (denoted as NLP time). From Figure 3a, the parallel performance in terms of speed-up is quite good up to about eight processors/threads, after which a significant deviation from ideal speed-up is observed. This undesirable behaviour at larger thread counts, for our chosen problem size, can be explained using the laws of Amdahl and Gustafson [44]. Amdahl's law gives an indication of the possible scalability, or maximum speed-up, for a fixed problem size, while Gustafson's law can be used to understand the influence of problem size on scalability. Considering first Amdahl's law, the parallel time can be approximated as $t_p \approx t_s\,\big(f + (1-f)/p\big)$, where $f$ represents an inherent serial fraction of the overall computation, which results in the speed-up expression $S_p = 1/\big(f + (1-f)/p\big)$; as $p \to \infty$, the maximum possible speed-up is $1/f$. Therefore, for our particular example, if the time to evaluate the NLP objective/constraint functionals has an inherent serial portion of 10%, then we would achieve a maximum possible speed-up of 10.
Fortunately, if we further consider the influence of the problem size $m$, whereby the serial fraction of the program is now considered a function of problem size, $f(m)$, it can be shown using Gustafson's law that the (scaled) speed-up can be given by $S_p = p + (1-p)\,f(m)$, where $f(m) = \sigma(m)/\big(\sigma(m) + \varphi(m)\big)$, and $\sigma(m)$ and $\varphi(m)$ represent the inherent serial and (per-processor) parallel portions of the run time, respectively. Thus, if we are able to better load the processors with more work, such that the inherent serial portion diminishes relative to each parallel portion ($\sigma(m) \ll \varphi(m)$), then the fraction $f(m)$ decreases with increasing $m$, and as $f(m) \to 0$, the speed-up will approach $p$.
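For instance, under the simple, illustrative assumption that the serial portion remains fixed, $\sigma(m) = \sigma_0$, while the parallel portion grows linearly with problem size, $\varphi(m) = c\,m$, we obtain
\[
f(m) = \frac{\sigma_0}{\sigma_0 + c\,m} \;\longrightarrow\; 0
\quad \text{as } m \to \infty,
\qquad \text{and hence} \qquad
S_p = p + (1-p)\,f(m) \;\longrightarrow\; p,
\]
so that loading each processor with more work recovers near-ideal speed-up.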
This concept can be better seen using the log-$p$ model, in which the serial run time is modelled as $t_s = \sigma(m) + \varphi(m)$ and the parallel run time as $t_p = \sigma(m) + \varphi(m)/p + \log_2 p$, with the $\log_2 p$ term accounting for parallel (communication and synchronization) overhead (see p. 79 of [44]). The speed-up expression can then be derived as $S_p = \big(\sigma(m) + \varphi(m)\big)/\big(\sigma(m) + \varphi(m)/p + \log_2 p\big)$, and if $\varphi(m) = pM$, where $M$ is the work per processor, then the speed-up (and efficiency) can be controlled by limiting the influence of the $\log_2 p$ term by increasing $M$.
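A minimal numerical sketch of this effect is given below; it evaluates the log-$p$ model exactly as written above for an arbitrary serial portion $\sigma$ and a few illustrative values of $M$ and $p$, and is an illustration of the model rather than a measurement from our implementation:

```cpp
// Minimal sketch: evaluates the log-p scalability model discussed above,
//   S_p = (sigma + p*M) / (sigma + M + log2(p)),
// to show how increasing the work per processor M suppresses the log2(p)
// overhead term. All numbers are illustrative, not measured.
#include <cmath>
#include <cstdio>

int main() {
    const double sigma   = 1.0;                  // assumed serial portion (arbitrary units)
    const int    procs[] = {2, 4, 8, 16, 32};    // processor/thread counts p
    const double work[]  = {1.0, 10.0, 100.0};   // work per processor M (arbitrary units)

    for (double M : work) {
        std::printf("M = %6.1f:", M);
        for (int p : procs) {
            const double Sp = (sigma + p * M) /
                              (sigma + M + std::log2(static_cast<double>(p)));
            std::printf("  S_%-2d = %6.2f", p, Sp);
        }
        std::printf("\n");
    }
    return 0;
}
```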
Additionally, to ensure a uniform work load $M$ on each processor, one needs to properly balance and schedule the distribution of work. For example, in our case study, we found that if the computation time for a chunk of size $M$ is relatively constant across processors, then the OpenMP static scheduling policy is adequate, whereas if the computation time differs between chunks, a dynamic scheduling policy, which hands out chunks to idle threads on demand, is preferred and better balances the computation load between processors.
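As a hedged illustration only (not the code used in this work), the independent per-scenario integration tasks could be distributed over OpenMP threads roughly as follows, where integrate_scenario, n_scenarios and chunk_M are hypothetical placeholders; switching the schedule clause between static and dynamic, with chunk size $M$, selects between the two policies discussed above:

```cpp
// Hedged sketch: distributing independent per-scenario integration tasks over
// OpenMP threads. integrate_scenario() is a hypothetical stand-in for one
// scenario's CVODES integration; the dummy work inside it is illustrative only.
#include <cmath>
#include <cstdio>
#include <vector>
#include <omp.h>

static double integrate_scenario(int s) {
    double x = 1.0;
    for (int k = 0; k < 200000; ++k) x += std::sin(x + s);  // stand-in for the DAE solve
    return x;
}

int main() {
    const int n_scenarios = 80;  // e.g., the recommended number of scenario realizations
    const int chunk_M     = 5;   // scenarios handed out per scheduling decision
    std::vector<double> results(n_scenarios);

    // schedule(static, chunk_M): chunks assigned round-robin up front; adequate when
    //   every scenario costs roughly the same.
    // schedule(dynamic, chunk_M): chunks handed to idle threads on demand; better
    //   balances the load when per-scenario cost varies.
    #pragma omp parallel for schedule(dynamic, chunk_M)
    for (int s = 0; s < n_scenarios; ++s) {
        results[s] = integrate_scenario(s);
    }

    std::printf("integrated %d scenarios using up to %d threads\n",
                n_scenarios, omp_get_max_threads());
    return 0;
}
```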
To achieve good scalability, one often tries to keep the efficiency fixed by increasing the problem size (or rather, the work per processor, $M$) at the same rate as the number of processors/threads $p$. If this is possible, then the algorithm can be considered weakly scalable; on the other hand, if one is able to keep the efficiency constant for a fixed problem size as $p$ increases, then the algorithm is considered strongly scalable. Based on these definitions of scalability, our particular parallel implementation is not strongly scalable; however, there is enough evidence to suggest weak scalability. For example, from Figure 3d, the “DAE time” (i.e., the out-of-solver NLP function evaluation time, the majority of which represents the parallelized DAE solution) remains relatively constant for a fixed work load of integration tasks per processor up to a moderate number of threads, after which a slight increase in wall-clock time is observed (i.e., a decrease in efficiency), which can be attributed to a greater influence of the parallel computation overhead (i.e., the previously noted $\log_2 p$ term) relative to the chosen computation load $M$.
The next aspect of the study assesses the use of forward-over-adjoint second-order sensitivity analysis to form a representation of the Lagrangian Hessian. Note that such a procedure is quite expensive, given the numerous forward and reverse sweeps of the integrator over all shooting intervals and scenarios; the objective here is to provide some insight into the additional cost relative to a quasi-Newton approximation scheme. For demonstration purposes, we use the interior-point non-linear programming solver IPOPT-3.11.9 with default options and with MA27 and MC19 as the linear solver and scaling routine, respectively.
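For reference, a hedged sketch of how these solver options can be set through the IPOPT C++ interface is shown below; only the option names (linear_solver, linear_system_scaling, hessian_approximation) and their values are actual IPOPT options, while the surrounding multi-period TNLP implementation is omitted:

```cpp
// Hedged sketch: configuring IPOPT 3.11.x to switch between the limited-memory
// quasi-Newton ("qn") and exact ("ex") Hessian modes compared in Table 2, with
// MA27 as the linear solver and MC19 scaling. The multi-period TNLP itself is
// not shown here.
#include "IpIpoptApplication.hpp"

Ipopt::SmartPtr<Ipopt::IpoptApplication> configure_ipopt(bool exact_hessian) {
    using namespace Ipopt;
    SmartPtr<IpoptApplication> app = IpoptApplicationFactory();

    app->Options()->SetStringValue("linear_solver", "ma27");          // HSL MA27
    app->Options()->SetStringValue("linear_system_scaling", "mc19");  // HSL MC19 scaling
    app->Options()->SetStringValue(
        "hessian_approximation",
        exact_hessian ? "exact"             // Lagrangian Hessian supplied via DSOA
                      : "limited-memory");  // L-BFGS built from first-order info only

    app->Initialize();
    return app;
    // A subsequent call would be app->OptimizeTNLP(multiperiod_nlp), where
    // multiperiod_nlp is a SmartPtr<TNLP> implementing the discretized problem.
}
```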
Results comparing the limited-memory BFGS approximation to the second-order sensitivity approach are reported in Table 2, where we highlight: the total number of primal-dual IPM iterations; the total computation time; the time spent in the NLP solver; the total time to compute the continuity constraint Jacobian using forward sensitivity analysis, together with the additional point constraint first derivatives computed using AD (denoted overall as FSA); and the total time to compute the lower triangular portion of the Lagrangian Hessian (Equation (8)) via second-order sensitivity analysis, including all AD computations (denoted as DSOA). From Table 2, comparing columns qn and ex for the quasi-Newton and exact Hessian, respectively, we make the following observations: the DSOA approach reduces the overall number of primal-dual iterations (as one would expect); however, the total computation time increases considerably on average over the quasi-Newton approach, with about 98% of the total computation spent generating the Lagrangian Hessian. From these results, it is quite clear that providing the Lagrangian Hessian of our multi-period NLP formulation by means of second-order sensitivity analysis is very expensive. From an implementation perspective, the computation in each shooting interval could be parallelized; however, this is unlikely to lead to a large enough decrease in time to justify the use of second-order sensitivities as implemented in our study. An alternative approach, proposed by Hannemann and Marquardt [45], is to use a so-called composite or aggregated approach, which requires only a single second-order sensitivity computation encompassing all shooting intervals. Such a technique has been shown to reduce the Hessian computation time considerably when used in the context of implicit Runge–Kutta integration methods. Given our adherence to the SUNDIALS solvers in this work, we have not explored this technique, but it would be the next logical step.