Multi-Period Dynamic Optimization for Large-Scale Differential-Algebraic Process Models under Uncertainty

: A technique for optimizing large-scale differential-algebraic process models under uncertainty using a parallel embedded model approach is developed in this article. A combined multi-period multiple-shooting discretization scheme is proposed, which creates a signiﬁcant number of independent numerical integration tasks for each shooting interval over all scenario/period realizations. Each independent integration task is able to be solved in parallel as part of the function evaluations within a gradient-based non-linear programming solver. The focus of this paper is on demonstrating potential computation performance improvement when the embedded differential-algebraic equation model solution of the multi-period discretization is implemented in parallel. We assess our parallel dynamic optimization approach on two case studies; the ﬁrst is a benchmark literature problem, while the second is a large-scale air separation problem that considers a robust set-point transition under parametric uncertainty. Results indicate that focusing on the speed-up of the embedded model evaluation can signiﬁcantly decrease the overall computation time; however, as the multi-period formulation grows with increased realizations, the computational burden quickly shifts to the internal computation performed within the non-linear programming algorithm. This highlights the need for further decomposition, structure exploitation and parallelization within the non-linear programming algorithm and is the subject for further investigation.


Introduction
The optimization of process systems under uncertainty is important and, in many cases, necessary for capturing realistic solutions to the optimal operation and design of physical systems.Both external system disturbances (e.g., environmental conditions, feed stream availability, product demands) and inherent internal unknowns (e.g., kinetic parameters, physical and transport properties) within the process necessitate considering uncertainty within the optimization model formulation and solution.This has long been realized, and many computational approaches, specifically from the process systems community, have been proposed (see Geletu and Li [1] for a recent review).In this paper, we are concerned with the solution of dynamic optimization under uncertainty for which, from a design perspective, a number of applications include [2][3][4].In general, optimization under uncertainty can be classified depending on how uncertainty is modelled and fall under two categories: stochastic optimization and robust optimization.In the stochastic optimization approach, uncertain parameters are modelled as random variables with a known or imposed probability distribution.Stochastic objective functionals and possibly constraint functionals are often expressed using a probabilistic representation, and the practical solution implementation requires the conversion from an infinite to finite dimensional formulation.This is often achieved through the use of sampling techniques (to approximate the various probability distributions) followed by a deterministic optimization procedure.Furthermore, to properly quantify the influence of parametric uncertainty on the optimization solution, multiple repeated sampling and optimization solutions are often performed followed by a statistical analysis [5].A popular stochastic optimization approach that has emerged over the last several decades is chance constraint programming (CCP), in which constraints are relaxed according to a particular probability distribution.A key aspect when using chance constraint optimization is efficiently and accurately approximating multivariate integrals associated with the probabilistic terms, and recent work covering dynamic systems is discussed by Arellano-Garcia and Wozny [6] and Kloppel et al. [7].The robust optimization approach, on the other hand, requires no a priori knowledge of the uncertain parameters, and instead, these parameters are assumed to take values from a bounded interval or set.The central idea of this approach is to ensure no constraint violation under all possible realizations within the imposed uncertainty interval.Robust optimization formulations are conveniently posed as min-max problems, where the idea is to minimize the maximum impact of uncertainty on the performance index subject to the largest possible constraint violation (i.e., worst-case analysis).Recent work in this direction is discussed by Diehl et al. [8] and Houska et al. [9], who provide a framework for robust optimal control of dynamic systems.Regardless of the particular uncertainty classification, the conversion of an infinite dimension problem to an implementable finite deterministic non-linear programming formulation is necessary, and one approach to do so is a multi-period or multi-scenario discretization.The approach can serve as a complete solution, such as in a robust model predictive control framework [10,11], or as a component of a more elaborate iterative solution process [2].
In this paper, we are primarily concerned with the use of dynamic process models described by a system of differential-algebraic equations (DAE) and the efficient incorporation of uncertainty using multi-period optimization.This approach can be used to fully or partially address stochastic or robust optimization formulations, depending on how one characterizes uncertainty.Given the widespread use of multi-period formulations, our approach in this work is to focus on the computational aspects and, in particular, the use of dynamically-constrained formulations and their efficient implementation.We further remark that this paper is an extension of our previous work [12], and key contributions of this paper include: (1) the exploration of parallel performance with respect to discretization refinement versus embedded model, which that was not previously considered; (2) the application to a large-scale air separation system using a C/C++ implementation that links several reputable numerical packages; and (3) the assessment of second-order sensitivity generation when using higher-order gradient-based non-linear programming methods.The article is laid out by first discussing the particular optimization formulation addressed and relevant literature pertaining to solution algorithms of such formulations.Next, we provide our solution approach to multi-period problems with embedded DAE functionals, followed by a discussion on our particular implementation.Following this, we give two case studies of different scales to illustrate the computational performance of our approach.Finally, some concluding remarks are provided and future work noted.

Problem Statement
Multi-period optimization is a commonly-used technique to handle uncertainty whereby an infinite dimensional (continuous) formulation is reformulated as a discrete-time problem, such that the continuous uncertainty space is approximated by several points/samples.The formulation considered in this study represents a sample average implementation of a two-stage stochastic program where the objective function is described by a scenario-independent portion, φ 0 (•), and a scenario-dependent portion, φ i (•), that is a weighted sum of uncertain parameter scenarios sampled from a particular distribution or more simply, sampled from some bounded interval [13].Accordingly, we consider the general multi-period non-linear dynamic optimization formulation: In the above formulation, the differential and algebraic states are represented by All of these variables are associated with a particular period/scenario i.The model parameters p ∈ P ⊆ R np are defined uniformly over all scenarios and are often referred to as first stage, scenario independent or complicating variables in the literature.The objective function comprises two terms: φ 0 (p, t f ) : P ×T → R, which represents a scalar scenario-independent portion, and φ i (•) : X ×Z ×D×P ×Γ×T → R, which represents a scalar scenario-dependent portion.The embedded dynamic model is comprised of two separate functionals: which represent the differential and algebraic functions, respectively, of the DAE model in semi-explicit form, assumed to be index-one, such that the Jacobian of f a (•) with respect to z (i) (t) is nonsingular.Furthermore, the DAE functionals are assumed to be sufficiently smooth to ensure the existence and uniqueness of the solution [14].Additionally, The weight (or probability) associated with each scenario i is represented as w i := 1/n s or, more generally, as w i ∈ [0, 1], where n s is the total number of scenarios considered.This particular formulation, where the control variables u (i) (t) are associated with each scenario i, allows for recourse to uncertainty and is in the form of a two-stage stochastic program.The parameters p constitute first-stage decisions, and parameters d (i) and the control inputs u (i) (t) constitute second-stage decisions that can provide a compensatory action in response to disturbance and (uncertain) parameter realizations.Alternatively, the control inputs could be assumed scenario independent, and if we neglect the design parameters, the resulting formulation would resemble a robust optimal control problem.Non-linear programming solution techniques tailored to multi-period formulations have received some attention in the literature.Varvarezos et al. [15] proposed a reduced sequential quadratic programming (rSQP) approach (based on an active-set QP subproblem), which decomposes the multi-period nature through introducing additional linear constraints and scenario-dependent parameters, which effectively removes the potentially non-linear complicating scenario-independent parameters and forms a new non-linear program (NLP) structure, which is easier to solve at the QP level.This decomposed rSQP approach was further explored by Bhatia and Biegler [16], who introduced an interior-point solution technique for each QP subproblem.Ultimately, the resulting interior-point rSQP technique showed superior scalability (with respect to scenario realizations) compared to the active-set rSQP approach.A similar interior-point SQP approach has been used on discretized dynamic optimization formulations, which result in highly-structured NLP formulations (see [17,18]).More recently, Zavala et al. [19] have demonstrated a parallel primal-dual interior-point approach to tackle discretized multi-period dynamic optimization formulations.This has ultimately led to general interior-point approaches to handle discretized nominal dynamic optimization formulations [20] and structured NLP formulations [21].All of these previously noted studies have been on structured NLP techniques involving explicit objective and constraint functionals; however, our particular interest in the present paper is on solution techniques involving implicit or embedded functionals, which require a secondary solution algorithm for evaluation within the NLP constraints.These types of formulations arise in shooting approaches to dynamic optimization and require the solution of an embedded differential-algebraic system in order to fully evaluate the NLP functions.In conjunction with the multi-period approach to uncertainty, very little has been demonstrated in the literature on shooting-based multi-period dynamic optimization.It is the central goal of this paper to demonstrate this approach and, in particular, the benefits towards the overall optimization approach of evaluating the embedded dynamic system in a parallel manner.

Proposed Solution Approach
Our proposed solution approach to Problem P1 uses a combined multi-period multiple-shooting discretization, whereby the embedded DAE model integration tasks are solved in parallel.The main contribution of this article is the assessment of such an approach when applied to reasonably large differential-algebraic process models for design under uncertainty with recourse and, alternatively, robust optimal control.The main difference in our proposed approach and corresponding implementation, from other multiple-shooting approaches presented in the literature [22,23], is that we have incorporated an additional layer of parallelization in terms of the individual scenarios used within the multi-period approach.

Multi-Period Multiple-Shooting Discretization
The multi-period multiple-shooting approach discretizes a continuous non-linear uncertain dynamic optimization formulation to an algebraic non-linear program (NLP) with an embedded DAE model within the constraints.The technique entails introducing new optimization parameters (ξ (i) 0,j , µ (i) 0,j ) for all periods/scenarios i to represent the differential and algebraic state variable initial conditions at the beginning of each shooting interval j, hence the zero subscript, and new equality constraints to remove the discrepancy or defect between the differential state variable values at the final time from the previous interval and the initial time in the current interval.
This idea is sketched in Figure 1 for scenario realization i, where we assume that the time intervals used in the input control trajectory parameterization (i.e., parameterized by the υ (i) j ) correspond directly to the defined shooting grid.Our current approach has been to use a rather straight-forward implementation of the multiple-shooting technique, whereby we provide the discretized formulation in full space to an existing sparse NLP solver, as opposed to exploiting the structure of the formulation by implementing a custom (possibly reduced-space) non-linear programming technique (cf. the SCPgen code generation solver within CasADi [24] or the non-linear optimal control solver MUSCOD-II, which uses a tailored reduced SQP algorithm [25]).
A general multi-period dynamic optimization formulation that utilizes the multiple-shooting discretization applicable to semi-explicit differential-algebraic equation models can be written as, min w, d, p In the above formulation, j = 0, . . ., n − 1 represents n shooting intervals.The optimization parameters are partitioned into scenario-independent parameters p, scenario-dependent parameters d = {d (i) } ns i=1 ∈ R n d ns and shooting node parameters for differential/algebraic states and input variable parameters, defined collectively as w = (ξ , for all shooting nodes and scenario realizations.The continuous control input vector can be defined using a parameterized function u (i) (t) = U(t, υ (i) j ) based on a piecewise approximation within each shooting interval I i,j , where υ is used in defining the final end point constraint in Problem P2 for notational simplicity, where υ n−1 , and can be removed from the NLP (see Figure 1 for a sketch of the parameterization).Additionally, we consider here a fixed end-time formulation where the objective function is represented in Mayer form, which typically only directly depends on the final model states ξ (i) 0,n , µ (i) 0,n , parameters d (i) and p and, possibly, the final time t n .The relaxed DAE model, is embedded within the NLP function evaluations and is solved using an appropriate DAE solver for t ∈ I i,j , j = 0, . . ., n − 1 with initial differential state conditions x (i) (t j ) = ξ (i) 0,j and algebraic state conditions z (i) (t j ) = µ (i) 0,j for all intervals I i,j , where the intervals are effectively decoupled using the new parameters and are thus independent of each other.The particular formulation given here relies on the use of a relaxed form of the DAEs using a so-called relaxation function represented here through the function ϑ(γ ) is functionally dependent on the shooting parameters at node j.This relaxed form also requires the addition of point equality constraints (for the algebraic model equations) in the NLP at each shooting node to ensure that the original model is obtained upon NLP convergence (see [25,26] for a more complete treatment of DAE relaxation).It is worth noting that using the relaxed DAE approach avoids the otherwise necessary DAE initialization problem.
To simplify the formulation and to assist in our presentation, the multiple-shooting continuity equality constraints, including the initial conditions at t 0 , can be defined as, where w j,i = (ξ ) ∈ R nx+nz+nυ for j = 0, . . ., n − 1, and at the final shooting node, w n,i = (ξ The algebraic consistency equality constraints and remaining inequality constraints represent NLP point constraints and can be defined at each shooting node according to, The combined constraint vector for each period/scenario can now be stated as, The fully-discretized combined multi-period multiple-shooting NLP formulation of Problem P2 can now be stated in parameterized form according to Problem P3. Depending on the constraint type (equality or inequality), the vectors c L i and c U i are appropriately defined.
For each scenario i, the concatenated shooting node parameters are defined as Note that the embedded DAE model F(•) is removed from the NLP formulation, as it is solved using an embedded DAE solver in order to construct the shooting node continuity constraints.The multiple-shooting approach benefits from a naturally decoupled temporal domain structure that does not require any additional decomposition techniques.This decoupling of each shooting interval is induced by the introduction of the optimization parameters ξ (i) 0,j , µ (i) 0,j for j = 0, . . ., n−1 and i = 1, . . ., n s and allows the embedded DAE in each shooting interval and scenario realization to be independently solved in parallel.Accordingly, all scenario realizations i and shooting intervals j over the entire time horizon result in m = n • n s independent integration tasks, which can be broken up and solved in parallel using several processors.

First-Order Derivative Generation
Shooting-based dynamic optimization approaches necessitate the use of DAE parameter sensitivity in order to generate derivatives of constraints involving implicit functionals.For example, a relaxed semi-explicit index-one parameterized DAE system can be stated as, where all time-invariant parameters within each interval j and scenario i are denoted by j , d i , p}.From this system, the linear first-order forward sensitivity equations can be derived (in matrix form) as, where s {d,a} (t) = ∂{x (i) (t), z (i) (t)}/∂y j,i represents differential and algebraic sensitivity variables, respectively, and f {x,z,y} {d,a} (t) = ∂f {d,a} (x (i) (t), z (i) (t), y j,i , t)/∂{x (i) (t), z (i) (t), y j,i } are Jacobian matrices of the DAE model with respect to the differential variables, algebraic variables and parameters.This extended linear DAE system is solved forward in time alongside the original system to generate {d,a} (t j+1 ), which are used to construct the block structured continuity constraint Jacobian (see, [12]).Particularly efficient techniques and software tools to solve the combined systems of Equations ( 4) and ( 6) are discussed by Maly and Petzold [27], Feehery et al. [28], Schlegel et al. [29] and Kristensen et al. [30].

Second-Order Derivative Generation
Sequential quadratic programming (SQP) algorithms (e.g., filterSQP) or primal-dual non-linear interior-point methods (IPM) (e.g., IPOPT, KNITRO) can often utilize second-order derivatives of the objective/constraint functions, which are used to construct the Lagrangian Hessian (used in the QP subproblem or primal-dual search direction linear solve).To provide such information when using embedded DAE models (i.e., implicit functionals) requires that a second-order sensitivity analysis be performed to construct an approximate representation of the continuity constraint Hessian.All other contributing portions of the Lagrangian Hessian (i.e., explicit objective and point constraint functionals) can be computed exactly using automatic differentiation.A question that arises, and that we seek to address, is whether supplying the second-order derivatives via sensitivity analysis can reduce the number of non-linear programming algorithm iterations sufficiently to justify the additional computation work of second-order sensitivity analysis.
The complete Lagrangian Hessian for our particular multi-period formulation can be stated as, where the italicized symbol x represents a composite vector of all primal NLP variables (as distinct from the model states given by x(t)) and ν = {ν c s,j,i , ν q l,j,i } is a similar concatenation of all dual variables.More specifically, ν c s,j,i are equality constraint multipliers related to the continuity constraints in Equation ( 1), represented individually here by c s,j,i (x), and ν q l,j,i are either equality or inequality constraint multipliers related to the point constraints in Equation ( 2), which are again defined individually as q l,j,i (x).In order to compute the individual Hessian portions related to the continuity constraints, ν c s,j,i ∇ 2 xx c s,j,i (x), a direct second-order sensitivity analysis can be performed on the portion of the Lagrangian involving the embedded functionals.For example, consider the continuity constraint Lagrangian portion as, where the second portion of this term can be taken as a scalar point-wise implicit functional for each scenario and shooting interval and defined accordingly as, Using the sensitivity generation approach described by Ozyurt and Barton [31], the directional Hessian of this point-wise functional can be determined using a forward-over-adjoint direct second-order sensitivity analysis.The particular purpose in this paper is to investigate the application of this technique in the context of a multi-period multiple-shooting algorithm.Furthermore, the application considered in Section 4.1 is in the form an ODE; thus, we restrict our presentation of second-order sensitivity analysis to the purely ODE case.Accordingly, we consider the ODE system given by, For the more general semi-explicit index-one DAE case, we refer readers to the work by Cao et al. [32] for first-order methods, Hannemann-Tamas [33] for higher order methods and Albersmeyer [34] for higher order relaxed DAE methods.The final form of the directional Hessian of a point-wise functional, in the context of our multi-period approach, can be stated as, where ⊗ represents the Kronecker product; λ (i) (t j ) ∈ R nx is a vector of first-order adjoint variables at t j for scenario i; µ (i) (t j ) ≡ λ y (t j )u ∈ R nx is a vector of directional second-order adjoint variables at y (t j+1 )u ∈ R nx is a vector of directional first-order forward sensitivity variables at t j+1 (i.e., the solution of directional first-order forward sensitivity equations); q (i) (t j ) ∈ R ny is a vector of directional second-order adjoint quadrature variables; x (i) yy (t j )u ≡ 0 nxny is a vector of directional second-order forward sensitivity variables initially known at t j , while x (i) is a matrix of first-order forward sensitivity variables initially known at t j ; g yy (t j+1 ) ≡ 0 ny×ny and g yx (t j+1 ) ≡ 0 ny×nx are second-order derivatives of the scalar functional g j+1,i evaluated directly at t j+1 , which for the multiple-shooting continuity constraints, are simply matrices of zeros.In order to determine the first-order and directional second-order adjoint variables, one needs to first solve forward from t j to t j+1 the first-order forward sensitivity equations to compute the directional sensitivities s (i) (t j+1 ) and then solve backward from t j+1 to t j for each direction u ≡ e l , l = 1, . . ., n y , the combined first-and second-order directional adjoint system given by, Additionally, alongside the adjoint system, the quadrature variable vector q (i) (t j ) can be determined from the system, where f x (t) = ∂f (x (i) , y j,i , t)/∂x (i) (t) ∈ R nx×nx and f xy (t) = ∂ 2 f (x (i) , y j,i , t)/∂x (i) (t)∂y j,i ∈ R nxnx×ny in which this latter term is in the form of a stacked Hessian to avoid the otherwise tensor form.Similarly, with all other derivatives defined in an analogous manner.The evaluation of both adjoint and quadrature systems requires the efficient evaluation of several matrix-vector products, comprised of first and second derivative terms, using an appropriate automatic differentiation (AD) tool.Once the adjoint and quadrature variables are determined at t j , the directional Hessian given by Equation ( 13) can be formed.This process is repeated for all j = 0, . . ., n − 1 and i = 1, . . ., n s , and the Lagrangian Hessian ∇ 2 xx L c (x, ν c ) is assembled (based on each direction that corresponds to a particular parameter) and further combined with the objective and point constraint Hessian.

Implementation Details
The results in this paper were generated using a C/C++ implementation, which acts to coordinate the user model input, multi-period multiple-shooting discretization and interaction between several available DAE integration and NLP optimization routines.The implementation utilizes several CSparse routines [35] and interfaces the integration routines CVODES and IDAS from the SUNDIALS suite of solvers [36], the NLP solvers SNOPT [37] and IPOPT [38] and the AD tool ADOL-C [39].The particular implementation aspect we investigate in this paper is an OpenMP loop parallelization of the high-level DAE integration tasks, and we sketch the approximate solution steps according to Algorithms 1 and 2. Note that the sensitivity generation approach employed by the SUNDIALS integrators follows a so-called "first-differentiate-then-discretize" methodology (i.e., the first-and second-order sensitivity equations are formed prior to applying the discretized numerical integration routine); an alternative approach is a so-called "first-discretize-then-differentiate" technique, whereby certain aspects of the internally-discretized integration algorithm are differentiated either via AD or through numerical differences during the integration procedure.This latter approach is generally known as internal numerical differentiation (IND) and has shown greater solution accuracies and speeds when used in conjunction with embedded model shooting-based dynamic optimization schemes [40][41][42].Despite the merits of this newer approach, for the ease of availability through existing solvers, we have followed the first approach in this study.
Input: initial primal and dual variable guesses and tolerances 1: generate scenario realizations: }, and dual variables ν [0]   3: provide optimality (tolkkt) and feasibility (tolfeas) tolerances Output: primal/dual solution x * , ν * to a local minimum of the NLP satisfying tolerances 4: procedure {x * , ν * } ← NLP_SOLVE(x [0] , ν [0] , tol {kkt, feas} ) 5: initial eval of objective/constraints and 1st derivatives (gradient, Jacobian) 7: : : initial Lagrangian Hessian approximately (or eval exactly via DSOA_SOLVE(x [0] , ν 11: repeat until termination criteria satisfied 12: check KKTconditions (and other termination criteria) 13: compute search direction of primal/dual variables (d x , d [k] ν ) via QP solver 14: compute step size α [k] via a line search (requires objective/constraint eval) 15: perform step: re-evaluate function derivatives ∀ j, i (used to construct the next QP) update Hessian approximately (or eval exactly via DSOA_SOLVE(x [k] , ν 19: end 20: end procedure Algorithm 1 sketches a high-level SQP-type solution procedure for the NLP given by Problem P3.The purpose of outlining the NLP solution approach is to provide some insight to where specifically within the algorithm an embedded DAE solver (and possibly a first-and/or second-order sensitivity solution) is required.Alternatively, one could utilize a non-linear interior-point approach where the major differences from Algorithm 1 are an adaptive barrier update strategy that nests a procedure similar to Steps 11 to 18 where the Newton search direction is determined from a single solution of the primal-dual equations as opposed to Step 13 shown here, which solves the QP to optimality (see [43] for the details).For the particular algorithm shown, initially, a complete specification of the scenario realization set, initial primal (and possibly dual) variables and termination tolerances is provided.Note that an initial simulation of the embedded DAE can be used to determine initial feasible guesses for the primal shooting variables ξ (i) 0,j and µ (i) 0,j for all j and i, given υ (i) j , d and p.The dual variables can be initialized at zero (cold start) or warm started if a previous similar NLP solution is available.Following this, an iterative quadratic programming procedure is performed whereby: (1) objective, constraint and derivative functions are evaluated; (2) termination criteria are checked (and possibly termination signalled); (3) a search direction is determined from a quadratic program (using either an active-set or primal-dual interior-point method) (this step is often preceded by a dimensionality reduction of the original QP; additionally, infeasible QP's are handled in a so-called feasibility restoration phase); (4) a globalization procedure is performed to determine the step size (we note a line search, but a trust-region approach within the QP itself is possible); and (5) primal and dual variables are updated and objective, constraint and derivative functions are re-evaluated.The time-dominant aspect of the algorithm occurs with the embedded implicit function evaluations denoted by DAE_SOLVE, which we handle specifically within our implementation by Algorithm 2. Additionally, note that during the step size globalization procedure, re-evaluation of objective and constraint functions is required, and for the multiple-shooting continuity constraints, sensitivity generation is deactivated within DAE_SOLVE.We remark that an algorithm for the procedure DSOA_SOLVE follows in a similar manner to Algorithm 2, whereby a second-order forward-over-adjoint sensitivity analysis is performed for each shooting interval j and scenario i, which is specifically handled by the SUNDIALS integration solvers.for i := 1 to ns do in parallel using OpenMP for k = 1, . . ., n • ns tasks 6: for j := 0 to n − 1 do

7:
set initial differential and algebraic DAE variables: set initial differential DAE sensitivity variables: 9: solve DAE and 1st order sensitivity system {x (i) (tj+1), s Algorithm 2 computes the solution of the discretized relaxed embedded DAE (both state and first-order sensitivity variables over each interval and scenario), which is used to evaluate the continuity constraints and associated Jacobian at each major iteration of the NLP algorithm.Note that using the relaxed DAE approach, the initial conditions of the differential and algebraic states are by formulation always consistent at t j .For each shooting node and scenario realization, the embedded DAE is initialized in Step 7 of Algorithm 2 by the NLP parameters; next, in Step 8, the initial values of the differential sensitivity variables (in matrix form) are set to an augmented identity matrix.The next step is to solve the DAE and first-order sensitivity system using an appropriate DAE solver with efficient methods for handling the sensitivity equation solution.The last step (not shown) is to construct the continuity equations and Jacobian, the latter of which is a matrix with appropriately positioned blocks of sensitivity variables at t j+1 (see [12]).Several remarks are warranted with respect to Algorithm 2: (1) when using an implicit integration routine, such as IDAS from SUNDIALS, one needs to additionally provide initial values for the time-derivatives of the differential states ẋ(i) (t j ), the initial time-derivatives of the differential sensitivity variables ṡ(i) d (t j ) and the initial algebraic sensitivity variables s (i) a (t j ); (2) the differential time-derivatives are determined from a single evaluation of the ODE portion of the DAE; and (3) the differential time-derivative sensitivity variables and algebraic sensitivity variables are computed from a linear solve of the initial sensitivity equations (note that this requires a single initial factorization and several back solves using multiple right-hand-side vectors for a linear system given by A X = B).We elaborate on this last point.For example, given that variables x (i) (t j ), z (i) (t j ), and s (i) d (t j ) are known at t j , we can rearrange the initial first-order sensitivity equation system as, and solve for the initial values of ṡ(i) d (t j ) and s (i) a (t j ) (in matrix form).The particular details of the underlying ODE/DAE or NLP solution used in this paper can be found in the previously noted references.However, we remark on the following: both first-and second-order sensitivity analysis is handled directly by the SUNDIALS integrators, and all the user needs is an appropriate AD tool to form Equations ( 6) and ( 13) to (15); for our implementation, we used the tape-based features of ADOL-C for all serial computation and the tapeless forward differentiation features when evaluation is needed within parallel OpenMP regions of the code (we refer readers to the ADOL-C manual for the particular details of tapeless and tape-based operator overloaded AD).We further note that all code and third party libraries used for our example problems were compiled using GCC-4.7 (with OpenMP-3.1)and run using 64-bit Linux.The hardware was an HP Proliant computing server configured with four sockets using AMD Opteron 6386SE series chips (16 cores per chip) at 2.8 GHz, which provide a total of 64 available cores (processors/threads). Furthermore, an appropriate amount of memory was allocated/utilized to suit the requirements of the program.

Example Problems
We demonstrate our proposed parallel multi-period dynamic optimization approach using a batch reactor problem and a large-scale air separation problem.The objective is to assess the computational performance (resource utilization efficiency and scalability) of the proposed method when the number of scenarios and shooting intervals (embedded integration tasks), model size and available computing processors are increased.

Batch Reactor Problem
The initial portion of this example is performed with a parallel implementation using the ODE integration solver CVODES and NLP solver SNOPT.Subsequently, further comparison is made using second-order sensitivity analysis for generating the Lagrangian Hessian when using the NLP solver IPOPT with the exact Hessian versus an approximate limited memory quasi-Newton update, which only requires first-order sensitivity information.For this last portion, we currently only report serial solution times of the implementation.
The case study example considered is adapted from [16] and involves a batch reactor problem in a purely ODE form that follows a first order reaction scheme A → B, where the kinetic parameters are assumed to be uncertain.The objective is to operate the reactor for an indeterminate duration (i.e., design variable), such that a maximum profit is achieved.The objective function comprises a revenue term proportional to the product conversion, x B , and an operating cost dependent on the duration of operation t f .The optimization problem is defined according to Formulation E1, st: The initial state conditions are taken as x A0 = 1, x B0 = 0 ∀ i, where we use a normalized time horizon, such that the end-time t f is taken as a design parameter.The cost/objective coefficients are set as c 0 = 700, c 1 = 50, c 2 = 2; and the weights are set as w i = 1/n s .The parameterized control profile is taken to be piecewise constant, and initial guesses for the polynomial coefficients are set to 1.0 for all scenarios and shooting intervals.For this example, we kept the number of shooting intervals constant at n = 25 and with a uniform size over the time horizon.The uncertain model parameters (θ 1 , θ 2 ) were determined by sampling uniformly between the defined bounds.
Figure 2 depicts a base line solution to formulation E1 using a single processor with increasing scenario realizations.For the input and state solution trajectories in Figure 2a, the solid input and state trajectory lines represent the nominal solution, while the shaded bands represent an envelope of possible solutions generated via discrete realizations of the uncertain parameter values.Interesting aspects to note include: (1) as the number of scenarios is increased, both the optimal objective value (defined here as the ratio of the multi-period objective value to the nominal objective value, J/ J) and parametric degree of freedom, t f , converge to a point (or rather confidence interval), which can be considered close to the true solution of the original infinitely dimensional stochastic program; and (2) considering n s = 40 as the base line, we see a ×2.26, ×4.20, ×8.81 increase in total computation time per major SQP iteration for n s = 80, 160, 320, respectively (i.e., almost a linear increase in computation time as scenario realizations are added).Based on Figure 2a, an appropriate number of scenarios to use would exceed 80, where the profiles for J/ J and t f level off.Parallel solution times for the total program are reported in Table 1 for n s = 40, 80, 160, 320 (number of scenario realizations) and n p = 4, 8, 16, 32 (number of processors/threads).Additionally, the serial solution time is reported for each scenario realization level and the time required for the nominal dynamic optimization solution (i.e., n s = 1).We further remark that the parallel solution times are an average of three independent experiments; the NLP problem dimension is represented by the total number of variables vars and equality/inequality constraints cons; and the number of NLP iterations until termination is given by iter.Considering n s = 80 as the ideal number of scenarios to use, we see a ×116 total computation increase from the nominal solution, and if 16 processors are used (i.e., the maximum advisable n p for the given problem size; see the discussion below), this number drops by 66%, indicating a ×39 increase from the nominal serial solution.Given that we are only parallelizing the discretized implicit DAE integration tasks, a 66% improvement is a promising result.A breakdown of the specific computation performance using speed-up S = T serial /T parallel , where T serial and T parallel represent serial and parallel program run times, respectively, and efficiency E = S/n p , is sketched in Figure 3. Note, for our particular case that we consider each metric to be based on the time to evaluate objective/constraint functionals and derivatives (denoted as DAE time) and exclude the serial in-solver time related to the matrix computations within the NLP solver (denoted as NLP time).From Figure 3a, the parallel performance in terms of speed-up is quite good up to about eight processors/threads, after which a significant deviation from ideal speed-up is observed.This undesirable behaviour using n p ≥ 16, for our chosen problem size of m = n • n s , can be explained using the laws of Amdahl and Gustafson [44].Amdahl's law gives us an indication of the possible scalability or maximum speed-up for a fixed problem size, while Gustafson's law can be used to understand the influence of problem size on scalability.Considering first Amdahl's law, the parallel time can be approximated as T parallel := f T serial + (1 − f )T serial /n p , where f represents an inherent serial fraction of the overall computation, which results in the speed-up expression S(n p ) = (f + (1 − f )/n p ) −1 , and as n p → ∞, the maximum possible speed-up is S(∞) = f −1 .Therefore, for our particular example, if the time to evaluate the NLP objective/constraint functionals has an inherent serial portion of 10%, then we would achieve a maximum possible speed-up of 10.Fortunately, if we further consider the influence of problem size m, whereby the serial fraction of the program is now considered a function of problem size f (m), it can be shown using Gustafson's law that speed-up can be given by S(m, n , where f (m) := a(m)/(a(m) + b(m)), and a(m) and b(m) represent the inherent serial and parallel portions, respectively.Thus, if we are able to better load the processors with more work such that the inherent serial portion diminishes relative to each parallel portion (b(m) a(m)), then the fraction f (m) decreases with increasing m, and as m → ∞, the speed-up will approach n p .This concept can be better seen using the log-p model where T parallel := T serial /n p + log 2 (n p ) and T serial ∝ m (see p. 79 of [44]).The speed-up expression can be derived as S(m, n p ) = n p /(1 + (n p /m) log 2 (n p )), and if m = M n p where M is the work per processor, then the speed-up (and efficiency) can be controlled by limiting the influence of the log 2 (n p ) term by increasing M .Additionally, to ensure a uniform M on each processor, one needs to properly balance and schedule the distribution of work.For example, in our case study, we found that if the computation time on each processor for a chunk size of M is relatively constant between processors, then a so-called OpenMP static scheduling policy is adequate, while if the computation time differs, a dynamic (round-robin) policy is preferred, which is able to better balance the computation load between processors.To achieve good scalability, one often tries to keep the efficiency fixed by increasing the problem size (or rather, work per process, M ) at the same rate as the number of processors/threads n p .If this is possible, then the algorithm can be considered weakly scalable; on the other hand, if one is able to keep the efficiency constant for a fixed problem size as n p increases, then the algorithm is considered strongly scalable.Based on these definitions of scalability, our particular parallel implementation is not strongly scalable; however, there is enough evidence to suggest weak scalability.For example, from Figure 3d, the "DAE time" (i.e., out-of-solver NLP function evaluation time of which the majority represents the parallelized DAE solution) remains relatively constant for a work load of M = 250 integration tasks per processor up to about n p = 16 after which a slight increase in wall-clock time is observed (i.e., decrease in efficiency), which can be attributed to a greater influence of parallel computation overhead (i.e., the previously noted log 2 (n p ) term) relative to the chosen computation load M .The next aspect of the study considers assessing the use of forward-over-adjoint second-order sensitivity analysis in order to form a representation of the Lagrangian Hessian.Note, that such a procedure is quite expensive given the numerous forward and reverse sweeps of the integrator for all shooting intervals and scenarios, and the objective here is to provide some insight on the additional cost when compared to a quasi-Newton approximation scheme.For demonstration purposes, we use the interior-point non-linear programming solver IPOPT-3.11.9 with default options and MA27, MC19 for the linear solver and scaling, respectively.Results comparing the limited memory BFGS approximation to the sensitivity approach are reported in Table 2, where we highlight the total number of primal-dual IPM iterations, total computation time, time spent in the NLP solver, total time to compute the continuity constraint Jacobian using forward sensitivity analysis and additional point constraint first derivatives using AD, which we denote overall as FSA, and total time to compute the lower triangular portion of the Lagrangian Hessian (Equation ( 8)) via second-order sensitivity analysis (including all AD computations), which we denote as DSOA.From Table 2, comparing columns qn and ex for quasi-Newton and exact Hessian, respectively, we make the following observations: the DSOA approach reduces the overall number of primal-dual iterations (as one would expect); the total computation time increases on average by about ×25 over the quasi-Newton approach where about 98% of the total computation is spent generating the Lagrangian Hessian.From these results, it is quite clear that providing the Lagrangian Hessian of our multi-period NLP formulation by means of second-order sensitivity analysis is very expensive.From an implementation perspective, the computation in each shooting interval could be parallelized; however, this is unlikely to lead to a significant enough decrease in time to justify the use of second-order sensitivities as implemented in our study.An alternative approach proposed by Hannemann and Marquardt [45] is to use a so-called composite or aggregated approach, which only requires a single second-order sensitivity computation encompassing all shooting intervals.Such a technique has been shown to reduce the Hessian computation time considerably when used in the context of implicit Runge-Kutta integration techniques.
Given our adherence to the SUNDIALS solvers in this work, we have not explored this new technique, but it would be the next logical step.

Air Separation Problem
This next case study explores further the influence of processor loading on algorithm scalability and additionally the influence of DAE model size.A large-scale DAE air separation model is used, which considers the separation of nitrogen from air.The model used here was adapted from Cao [46], and a simplified process schematic is shown in Figure 4.As a first step, air enters from the atmosphere and is compressed using a multi-staged compressor (COM); impurities are then removed using several adsorption units; high pressure purified air is then cooled in a multi-path heat exchanger (PHX) using the returning gas product (GN2) and waste streams from a high pressure distillation column (HPC); the cooled air stream is then split where a portion goes through a turbine (EXP) to promote further cooling before entering the bottom of the distillation column, while the other stream goes directly to the distillation column.The air entering the column is converted into high purity gaseous nitrogen, which exits at the top, and crude liquid oxygen, which accumulates at the bottom.A portion of the high purity nitrogen gas is drawn off as product (V N2 ), while the remainder is fed to an integrated reboiler/condenser (IRC) to exchange heat with a crude oxygen stream, which is drawn from the bottom of the column.The heat exchange converts gaseous nitrogen to liquid, which is then refluxed back to the top of the column and optionally drawn off as liquid nitrogen product (LN2).
The portion of the process we focus on in this study is the distillation column (HPC) and integrated reboiler/condenser (IRC) units, with the air feed to the bottom of the distillation column (F 1 ) as the input stream.A detailed listing of the variables and equations used in our particular model can be found in [46].The optimization formulation considered seeks to determine a robust control profile to transition from one steady state to another while satisfying path inequality constraints on product composition (defined in the form of product impurity y O2 ) and tray flooding (defined implicitly by ensuring the vapour velocity ν nt is below the flooding velocity νnt on the critical tray of n t , which represents the top tray of the column), all while under the influence of uncertainty within key model parameters.The formulation can be defined as a continuous multi-period dynamic optimization problem according to the following equations, st: DAE model (125 ODEs, 329 AEs) x (i) (t 0 ) − x ss = 0 (initial steady state) where y(t) = V N2 (t) and y sp = V sp N2 are the measured output and corresponding final desired set-point for the nitrogen production rate; u(t) = F 1 (t) is the manipulated input feed rate of air to the first tray (i.e., column bottom); θ = [η, ∆P ] is a vector of uncertain model parameters, which we select a priori as the tray efficiency η ∈ [0.4,0.6] and pressure drop between trays ∆P ∈ [0.45, 0.55] kPa.Note the stated DAE model dimension of n x = 125 differential and n z = 329 algebraic variables/equations excludes all sub-expression algebraic variables/equations; thus, the algebraic portion of the model is in fact quite larger than it might appear.Select output and state solution trajectories of Formulation E2 are plotted in Figure 5 for variables F1 (t) ≡ F 1 (t)/F 1 (t 0 ) and VN2 (t) ≡ V N2 (t)/V N2 (t 0 ) and path constrained variables y O2 (t) and α(t), where α(t) = ν nt (t)/ν nt (t) ≤ 1.A piecewise linear input control profile was selected, and the rate of change of this profile was penalized in the optimization objective function.Note that in order to prevent unnecessary chattering of the control input in the latter portion of the time horizon, the profile uses an evenly-distributed parameterization of n − 1 shooting intervals within the first 0.5 h of the time horizon and a final single interval within the remaining 1.5 h.From Figure 5, we see that the production rate of nitrogen vapour VN2 (t) and the vapour velocity ratio α(t) both increase in proportion to the feed air input F1 (t) (as one would logically expect); however, due to the flooding constraint, there is a clear limitation on the rate of production increase, and the influence of parametric uncertainty within the model directly affects the onset of constraint activation.Ultimately, we are able to establish an optimal control profile that is robust to the prescribed uncertainties within the model and adherent to the constraints within the formulation.In order to establish a performance base line, we consider a fixed number of shooting intervals, three increasing scenario sizes and two processors.For shooting intervals n = 6 and scenarios n s = 20, 40, 80, the average total solution time per major SQP iteration was approximately 1.37 × 10 3 s (22.7 min), 3.22 × 10 3 s (53.6 min) and 8.12 × 10 3 s (135.4 min), respectively; which are about ×15.53, ×36.68 and ×92.51greater than the nominal solution time of 1.46 min.The corresponding ratio of time spent in the NLP solver versus the DAE solver was 0.14, 0.30 and 0.88, respectively, which indicates that as the problem size increases, the computational burden shifts dramatically from the DAE solver to the in-solver aspects of the NLP solver (e.g., active-set determination, matrix factorizations, matrix-matrix/matrix-vector multiplications, etc.)

HPC
With the base line solution properties established, we now turn to assessing the potential computation speed-up via our parallel multi-period approach.Table 3 lists the optimization problem size and solution results, in terms of the number of SQP iterations and total wall-clock times, for an incremental number of integration tasks m = n • n s (where n = {6, 12} and n s = {20, 40, 80}) using an increasing number of computing processors.Considering first a problem with n = 6 and increasing scenario realizations (n s = {20, 40, 80}), we observed good scaling properties using at most n p ≤ 32.For example, using n p = 16, we observe an overall average computation speed-up of ×3.57, ×2.88, ×1.93 for each n s , respectively; and for n p = 32, ×4.54, ×3.24, ×2.07,where the decrease in rate of speed-up from 16 to 32 processors is due to the increasing serial portion of the NLP solver.If we remove this large serial NLP portion, it can be confirmed that the parallel implementation of the embedded DAE shooting intervals can be done fairly efficiently.For example, Figure 6a provides speed-up trends for n p from two to 64 at a fixed amount of work (m) based on the so-called "DAE time" as defined in the previous case study.It is evident that good, strong scaling properties are observed up to about 32 processors, after which a more sharply decreasing rate is observed, which can be attributed in part to an insufficient work load per processor M .Figure 6b compares the amount of time, per major SQP iteration, spent in the DAE solver versus the NLP solver.Considering a problem size of m = 480 and imposing an increase in processors from two to 64, we see a significant reduction in DAE solution time relative to the serial NLP solution time.Furthermore, from Figure 6c, if we increase the number of processors in proportion to the problem size m, the DAE solution time remains relatively constant, which indicates reasonably good weak-scaling properties.For the case of n = 12 (see Figure 6d,e), where we effectively double the NLP size, better speed-up with increasing number of processors is observed, which indicates the expected result that as we increase the work per processor (M ), we also reduce the relative parallel overhead and ultimately see better, strong scaling properties.However, for the particular active-set SQP solver used in this study, we quickly run into significant serial computation overhead within the QP subproblems (i.e., a sharp increase in the number of QP iterates).
Finally, we consider the aspect of increasing the embedded DAE size and provide a relative comparison on the influence of DAE size on the overall parallel scalability of the implementation.Table 4 compares model sizes of n x /n z = {23/57, 59/153, 125/329} where n x and n z represent the number of differential and algebraic state variables, respectively, that are a result of increasing the number of distillation column trays according to n t = {5, 17, 39}.The base line total solution time per SQP iteration for each model size using n s = 80 scenarios was 6.24 min, 20.23 min and 135.37 min, respectively, which are over 100-times the nominal solution time.Through parallelization, these base line times can be reduced by about ×3.6, ×2.4 and ×1.9, respectively; where again, we observe that as the serial NLP portion grows, the potential speed-up diminishes.Figure 7 reveals more closely the speed-up and efficiency excluding the influence of the serial in-solver NLP contribution.As the model size is increased (i.e., more expensive integration tasks), a more pronounced improvement is observed when compared to increasing the number of integration tasks per processor via discretization alone (i.e., increasing M via increasing n s or n).In other words, if one compares the delta in speed-up from n x /n z = 59/153 and n x /n z = 125/329 in Figure 7a to the delta in speed-up from n s = 40 and n s = 80 in Figure 6a, then a larger deviation results from increasing the model size versus the discretization refinement.This result is particularly positive and highlights that one is able to more easily achieve better parallel scalability using larger embedded DAE models versus creating more independent integration tasks.

Concluding Remarks
In this paper, we have presented a parallel computing approach for large-scale dynamic optimization under uncertainty that targets the decomposition of the embedded differential-algebraic equation model.A combined multi-period multiple-shooting approach was used to discretize the DAE optimization formulation to yield a multi-period NLP formulation with embedded implicit DAE functionals within the constraints.The DAE model and sensitivity equations corresponding to each shooting interval and scenario constitute independent integration tasks, well suited for parallel processing.Our multi-period approach was applied to a large-scale DAE air separation model comprising up to 125 ODEs and 329 algebraic equations for the purpose of obtaining a robust optimal control profile subject to uncertainty in the model parameters.Results indicated fairly good parallel scalability using a parallel OpenMP implementation of the DAE solution; however, the extent of scalability depends largely on the amount of work per processor and on the ability to effectively balance the work load between processors.In this paper, we were able to demonstrate the benefits of parallelizing the DAE solution portion of the multiple-shooting algorithm; however, as the NLP size grows with scenario realizations, the computation bottleneck shifts to the NLP solver.While it is possible to alleviate some of the computation burden

Figure 2 .
Figure 2. Case Study 1: (a) control input and state trajectories (nominal solution represented by the solid line) and (b) base line DAE and NLP solution times for increasing n s .

Figure 3 .
Figure 3. Case Study 1: speed-up, efficiency and wall-clock times for increasing n s .

Figure 5 .
Figure 5. Case Study 2: (a) robust control and select output trajectories (nominal solution represented by the solid line) and (b) base line DAE and NLP solution times for increasing n s .

Figure 6 .
Figure 6.Case Study 2: speed-up and wall-clock times for increasing n p and n s , where n = 6 fixed for (a) to (c) and n = 12 fixed for (d) to (f).