Parallel Solution of Robust Nonlinear Model Predictive Control Problems in Batch Crystallization

: Representing the uncertainties with a set of scenarios, the optimization problem resulting from a robust nonlinear model predictive control (NMPC) strategy at each sampling instance can be viewed as a large-scale stochastic program. This paper solves these optimization problems using the parallel Schur complement method developed to solve stochastic programs on distributed and shared memory machines. The control strategy is illustrated with a case study of a multidimensional unseeded batch crystallization process. For this application, a robust NMPC based on min–max optimization guarantees satisfaction of all state and input constraints for a set of uncertainty realizations, and also provides better robust performance compared with open-loop optimal control, nominal NMPC, and robust NMPC minimizing the expected performance at each sampling instance. The performance of robust NMPC can be improved by generating optimization scenarios using Bayesian inference. With the efﬁcient parallel solver, the solution time of one optimization problem is reduced from 6.7 min to 0.5 min, allowing for real-time application.


Introduction
Nonlinear model predictive control (NMPC) is an advanced control technique based on an online solution of a nonlinear optimal control problem at each sampling instance using new measurements and updated state estimates.The quality of NMPC depends on the accuracy of the underlying model.Despite the high fidelity of using nonlinear models based on first principles, there are still uncertainties associated with external and internal disturbances.Although the inherent robust Input-to-State Stability (ISS) of NMPC can be proven for ideal NMPC [1,2], the assumption that the existence of uncertainties do not change the feasibility (e.g., no state and input constraints) is not valid for many applications.Even if robust stability is valid, it is of limited use in analyzing the robust performance, especially for batch processes.
Several approaches have been proposed to take uncertainty into consideration in the design of NMPC algorithms.The most widely-studied approach is to solve a min-max optimization at each sampling instance to minimize the performance index of the worst-case while satisfying the state and input constraints for a set of uncertainty realizations [3].One concern about this approach is that the nominal performance is sacrificed as the min-max optimization often chooses a very conservative control strategy.Huang et al. [4] proposes to minimize the expected value of the performance index based on multiple uncertainty scenarios.Nagy and Braatz [5] minimizes a weighted sum of expected value and variance of the performance index.While all of these approaches can be implemented within a feedback framework, this feedback is not considered in the NMPC optimization formulation itself.By contrast, Magni et al. [6] optimizes the control laws instead of the control steps at each sampling step.However, if the form of the control law is overly complex, this approach may not be computationally feasible.Recently, several other methods including multi-stage NMPC [7], Riccati differential equations [8] and a relaxation-based approach [9] were reported.
If we represent the uncertainties with a set of scenarios, the multi-scenario-based robust NMPC problem can be viewed as a large-scale stochastic program.The problem size becomes too large to be solved efficiently online by a serial solver, driving the need for parallel algorithms.For stochastic programs, an efficient parallel algorithm often exploits the structure at problem formulation level (e.g., Bender decomposition, Lagrangian decomposition, Lagrangian relaxation, progressive hedging) or at linear algebra level.Although the parallelization of the first class can be easily implemented, the convergence rate is typically slow, especially for nonlinear problems.In contrast, the second class of approaches can retain the fast convergence properties of the original host algorithms.For this class, interior-point methods are popular because the structure of the linear system remains the same at each iteration.The linear systems derived using interior-point methods for stochastic programming problems have the block-bordered-diagonal form, and they can be decomposed using the Schur complement method [10].When the number of first stage variables is small, this approach has almost perfect strong scaling.However, when the number of first stage variables is large, forming and solving the dense Schur complement becomes a computational bottleneck.
In order to deal with stochastic programs with large first-stage dimensionality, many approaches have been proposed.Kang et al. [11] uses a preconditioned conjugate gradient (PCG) procedure to solve the Schur system with an automatic L-BFGS preconditioner.This approach avoids both forming and factorizing the Schur complement explicitly.Lubin et al. [12] forms the Schur system as a by-product of a sparse factorization and factorizes the Schur system in parallel.Cao et al. [13] performs adaptive clustering of scenarios inside-the-solver and forms a sparse compressed representation of the large Karush-Kuhn-Tucker(KKT) system as a preconditioner.The matrix that needs to be factorized in this approach is much smaller than the full-space KKT system and more sparse than the Schur system.
In addition to the parallel solution of the KKT system, a scalable parallel algorithm also requires parallel evaluations of the nonlinear programming (NLP) functions and gradients, and parallel implementations of all other linear algebra operations (e.g., vector-vector operations and matrix-vector multiplications).While the latter is easy for many parallel architectures, the former is not.There is, to the best knowledge of the author, no efficient modeling language supporting parallel evaluations of functions and gradients for general NLP problems.However, for structured problems such as stochastic programs, Kang et al. [11] and Zavala et al. [10] build a single AMPL [14] model instance for each scenario and evaluate all these instances in parallel.Several packages (e.g., PySP [15], StochJuMP [16]) have also been developed to support the parallel evaluation of functions and gradients for structured NLP problems.
This paper solves optimization problems arising from robust NMPC using the parallel algorithm developed to solve stochastic programs.This paper is organized as follows: Section 2 presents both the NMPC and robust NMPC approaches.Section 3 describes one parallel algorithm to solve large-scale stochastic programs based on the Schur complement method.Section 4 illustrates this approach with a case study of a batch crystallization process, and compares the performance of robust NMPC with open-loop control and nominal NMPC.Final conclusions are presented in Section 5.

Problem Formulations
This section demonstrates the problem formulations in the context of batch processes, while the solution strategy described in Section 3 can also be applied to continuous processes.

NMPC Formulation
For a batch process in the interval [t 0 , t f ], the optimal control problem solved online at a sampling instance t k is of the following form: where J is the objective function, t is the time, z(t) is the vector of n z state variables, u denotes the vector of n u input variables, p represents the vector of n p uncertainty parameters, and y(t) is the vector of n y output variables.The initial state values ẑ(t k ) of the process are estimated using moving horizon estimation (MHE) from the historical measurement of y(t), t ∈ [t 0 , t k ].The function f describes the system dynamics and the function g represents the constraints on the inputs and state variables.After solving the above optimal control problem, the input trajectory in the interval [t k , t k+1 ) is injected in the plant.The optimization process is repeated with the updated estimation of ẑ(t k+1 ) at the next sampling instance t k+1 .For batch processes, the objective function usually only depends on the product quality at the end of the process.Therefore, one popular expression of the objective function is: where Π is a weight matrix, and y set is the setpoint.We want the product quality at the end of the batch process to be as close to the setpoint as possible.

MHE Formulation
NMPC requires the initial value of the states ẑ(t k ), but often not all states can be measured.Therefore, we need to estimate those unmeasured state variables from available measurements.At each sampling instance t k , before solving the optimal control problem (1), we solve the state estimation problem of the following form: min p,w(t) where y m (t i ) is the vector of measured values at sampling instance t i , w is the vector of model noise, p re f is the vector of reference value for p, and R, W and Z are weighting matrices.In the objective function, we want the predicted output to fit the measurements, the predicted parameter to be close to the reference, and the model noise to be small.Here, we assume the initial state value at t 0 is available; otherwise, z0 is also a variable and a term penalizing the deviation of z0 from reference should also be included in the objective function.

Robust NMPC Formulation
Despite the high fidelity obtained with nonlinear models based on first principles, there are still uncertainties associated with external and internal disturbances.A decision made without a consideration of these uncertainties might not only result in low-quality products but also carry the risk of violating some safety constraints.In order to deal with the parameter uncertainties, robust NMPC minimizes the expected or worst-case performance.For a batch process controlled by robust NMPC minimizing the expected performance, we solve with the following objective instead of ( 2): where E represents the expected value with respect to uncertain parameters p, and p follows a known distribution on the set P ∈ R n p .
To solve this problem numerically, one method is to assume that p has a finite number of realizations p 1 , ..., p S , with probability ξ 1 , ..., ξ S .S := {1..S} is the scenario set and S is the number of scenarios.With this assumption, the objective function can be formulated as the following: Then, we can derive the following extensive form of the robust NMPC problems and also drop ξ s from the notation by defining Π ← ξ s Π: where z s is a vector of states corresponding to p=p s .The control profile u needs to be determined before the realization of p is known.Hence, we can view u as the first stage variables and z s and y s as the second stage variables.In many cases, the number of possible realizations of p is infinite.To deal with that situation, a number of scenarios are generated using Monte Carlo sampling.Although Equation ( 5) is no longer exact, it is often a good approximation when the number of scenarios is sufficiently large.This method is called the sample average approximation (SAA) method.The optimal value from the extensive form problem (6) converges to that of the original problem with objective function (4) with probability 1 as S → ∞ [17].
If we want to minimize the worst-case performance index instead of expected performance at each sampling instance, we can replace the objective function (6a) with the following equations: min u(t),worst worst, (7a)

Efficient Optimization via the Simultaneous Approach
The above optimization problems are all differential-algebraic equation (DAE) constrained optimization problems.The simultaneous method can be used to reformulate these DAE-constrained problems by discretizing the DAE system using collocation methods [18].As an example of the simultaneous approach, we consider the formulation for robust NMPC with expected performance as the objective.The time domain [t k ,t f ] is partitioned into n e stages with length h i , i = 1, ..., n e , where ∑ ne i=1 h i = t f − t k , while each stage is discretized using n c collocation points.The problem after discretization is of the following form: where w are the coefficients from the Radau collocation method.If we view u i,j as first stage variables, and z i,j s , y i,j s , and żi,j s as second stage variables, the above problem fits the problem formulation of two-stage stochastic programs.

Efficient Parallel Schur Complement Method for Stochastic Programs
The robust NMPC problem formulation discussed in the paper match the structure of stochastic programming problems.A general extensive form of two-stage stochastic programs is of the form: x 0 ≥ 0, (ν 0 ) (9d) where x 0 ∈ R n 0 are the first stage variables, λ 0 ∈ m 0 and ν 0 ∈ n 0 are the dual variables for the first stage equality constraints and the bounds, x s ∈ n s are the second stage variables for scenario s, and λ s ∈ m s and ν s ∈ n s are the dual variables for the second stage equality constraints and the bounds.The total number of variables is n := n 0 + ∑ s∈S n s and the total number of equality constraints is m := m 0 + ∑ s∈S m s .
In our implementation, instead of solving the original stochastic program of the form in (9), we solve the problem (10) by duplicating the first stage variables x 0 as x 0,s , s ∈ S: x 0,1 ≥ 0, (ν 0 ) (10d) where the equality and bound constraints previously applied on x 0 only transfer to that of x 0,1 to prevent redundant constraints.Without Equation (10f), the above formulation can be decomposed into S independent sub-problems.The Lagrangian function of subproblem 1 is defined as and the Lagrangian function for the remaining subproblem s, s ∈ {2..S} is defined as: The Lagrangian of the whole problem (10) can be formulated as: If we use an interior-point method to solve the problem (10), typically the dominant computational cost is the solution of the KKT system.Given the structure of problem (10), the KKT system has the following arrowhead form: where B s := 0 0 0 −I , ∀s ∈ {2..S} x 0,s x s L s .Assuming that all K s are of full rank, we can show with the Schur complement method that the solution of the Equation ( 14) is equivalent to that of the following system: The system ( 16) can be solved in three steps.The first step is to form Z and r Z by adding the contribution from each scenario s.This step requires the factorizations of one sparse matrix K 1 of size n 1 + 2n 0 + m 1 + m 0 and S − 1 sparse matrix K s of size n s + 2n 0 + m s .Besides a total of S factorizations of block matrices, this step also requires a total of (S + 1)n 0 backsolves.The second step is to solve the Equation (16a) to get the direction of first stage variables ∆w 0 .This step requires one factorization and one backsolve of the dense matrix Z.With ∆w 0 , the third step is to compute ∆w s from Equation (16b).This step requires a total of S backsolves of the block sparse matrix.A straightforward implementation of these three steps leads to the explicit Schur complement method.
Using the Schur complement method, both step 1 and step 3 can be easily parallelized.When n 0 is relatively small, the cost of factorizing matrix Z in step 2 is negligible, and the efficiency of parallelizing step 1 and step 3 can be close to one if the size of each block is close to each other.In addition, the memory requirement of the parallel Schur complement method is much smaller for each node than solving the system ( 14) in serial since the information of each block can be stored at each node.
One advantage of using the formulation (10) is that the Schur complement matrix is positive definite (P.D.) if the original KKT system and each K s block satisfies the inertia condition for descent [11,19].This property enables the use of a PCG procedure to solve the Schur system [11], leading to the implicit Schur complement method.This approach avoids both the explicit formation and factorization of the dense Schur complement matrix.Therefore, this approach is more efficient when n 0 is relatively large.
Another advantage of using formulation (10) is that it facilitates the software development process.Equation (15) indicates that the KKT system of the whole problem can be constructed from the Jacobian, Hessian, and function evaluations of subblocks.In other words, the whole model can be constructed by generating one model representation (e.g., AMPL file) for each subblock and setting appropriate suffixes in each model file to identify first stage variables.Therefore, the model evaluation can be performed in parallel.The specialty of formulation (10) is that the Hessian and Jacobian for the subblocks can be used directly.For example, the Jacobian evaluated for subproblem s, s ∈ {2..S}, is ∇ x s ,x 0,s c s (x s , x 0,s ) T = [A T s , T T s ].For the formulation (10), ∇ x s ,x 0,s c s (x s , x 0,s ) T can be used directly in Equation ( 15) without splitting into A T s and T T s and the remaining matrices in Equation ( 15) can be obtained straightforwardly from each model representation.

Performance of Robust NMPC on Batch Crystalization
In this section, we illustrate the performance of an implementation of robust NMPC with a batch crystallization process.

Case Study: Multidimensional Unseeded Batch Crystallization Model
This section describes briefly a multidimensional unseeded batch crystallization model of KH 2 PO 4 -H 2 O system.The details can be found in Mesbah et al. [20], Acevedo and Nagy [21], Cao et al. [22].If we only consider the length L and the width W of crystals, using the population balance model (PBM) and method of moments (MOM), the batch crystallization model can be expressed as the following system of differential algebraic equations: where µ ij is the cross-moment, C is the solute concentration, B is the nucleation rate, G 1 and G 2 are the growth rates along L and W, respectively, S is the relative supersaturation, C s is the saturation concentration, k g 1 , k g 2 , g 1 , g 2 , and k b are kinetic parameters, ρ c is the density of the solution, c, d, and e are polynomial coefficient describing the relationship between saturation concentration and temperature, and k v is a constant volumetric shape factor.The temperature T is the control in this system.Two important indexes of crystals are mean length (ML) and aspect ratio (AR), which can be determined with the following equations: The nominal kinetic parameters are available in Acevedo and Nagy [21], Cao et al. [22], Gunawan et al. [23] and Majumder and Nagy [24].

Numerical Results
The kinetic parameters in this model are subject to large uncertainties.For the purpose of this case study, we assume that k b , b, k g 1 , g 1 , k g 2 and g 2 follow uniform distributions on the interval [3.[1.73 1.75].We also assume that measurements of ML, AR and C are available and the measurement noise corresponding to ML, AR and C follows truncated normal distributions on the interval [−12 12] µm, [−0.2 0.2], and [−0.008 0.008] g/cm 3 .The mean values of the original normal distribution are all zero, and the standard deviations are 6 µm, 0.1, and 0.004 g/cm 3 , respectively.
The setpoint we keep is AR set =2.9 and ML set =200 µm, which is selected using the Pareto front line reported in Cao et al. [22].The following cost function is used as the objective function in the NMPC and to evaluate the performance of a specific test simulation: We assume that the batch process lasts for 90 minutes and there are 18 sampling and control steps.The total number of first stage variables is small enough that the explicit Schur complement method is still efficient.For practical considerations, we also assume the batch process is also subjected to the following constraints so that the temperature profile is within the operation range and certain yield is guaranteed: For the numerical results shown later, T max is 45 • C, T min is 5 • C, R max is 4 • C/min, and C max is 0.237 g/cm 3 .
Table 1 shows the robust performance of different control strategies when exact information is available.For each control strategy, we test the robust performance over 100 scenarios generated from the uncertain parameter distributions, and we will refer to these as test scenarios.For the case of ideal NMPC, we assume that the state of the system is perfectly known, and the controller performance is estimated using exact information from each test scenario.Both the open-loop and nominal NMPC perform the optimization using nominal values for the parameters.For the two robust formulations, exact min-max and exact min-expected, we need to select scenarios for the multi-scenario optimization.We refer to these as optimization scenarios.In Table 1, we show results for the exact case where the optimization scenarios are the same as the test scenarios.Later, in this section (and in Table 2), we will consider the more realistic case when the optimization scenarios are not the same as the test scenarios.While we assume that ideal NMPC knows the exact value of state variables, both nominal NMPC and two robust formulations use MHE to estimate unknown state variables.
Although the ideal NMPC knows the true value of the uncertain parameters, it cannot achieve the setpoint for several test scenarios.The worst-case performance for the ideal NMPC with exact parameters is 499, which is the lower bound of the worst-case performance of all other control strategies.The deviation of the product quality from the setpoint using open-loop control stategy is very large.Because of the feedback mechanism, performance of nominal NMPC improves significantly compared with the open-loop control.By considering uncertainty in the design of NMPC, the performance of exact min-max NMPC is much better than that of nominal NMPC in terms of the average, standard deviation and worst-case cost evaluated by 100 test scenarios.However, the robust NMPC sacrifices the performance when the uncertain parameters are all at their nominal values.Compared with the reduction in the worst-case cost, the nominal cost is still small.It is interesting to observe that the performance of exact min-expected NMPC is much worse than that of exact min-max NMPC and nominal NMPC, even in terms of average cost.The reason is that, although the control minimizes the expected cost at each sampling instance, the optimization formulation does not explicitly consider feedback.One advantage of robust NMPC methods minimizing worst-case or expected performance is that they can fulfill all input and state constraints for all optimization scenarios, which is not guaranteed with nominal NMPC.For this application, although there is constraint violation using nominal NMPC for several test scenarios, the violation is small.Figure 1 shows that the optimal temperature profiles obtained using nominal NMPC and robust NMPC methods.It is clear that the input profiles from three methods are quite different.The results of robust NMPC shown in Table 1 are ideal in that the test scenarios are the same as the the optimization scenarios.We now show results for the more realistic case when they are not the same.Therefore, we generate a new set of optimization scenarios from the uncertain parameter distributions.Table 2 shows the robust performance of robust NMPC using different numbers of optimization scenarios.In theory, increasing the number of optimization scenarios makes the uncertainty distribution considered in the optimization a better approximation of the true uncertainty distribution.Since the number of test scenarios are limited, many other factors (e.g., similarity of optimization scenarios and testing scenarios) also influence the robust performance.For min-expected NMPC, increasing the number of optimization scenarios from 50 to 100 changes the performance slightly.In contrast, increasing the number of optimization scenarios from 50 to 100 significantly improves the performance of min-max NMPC.This shows that min-max NMPC is more sensitive to the number of optimization scenarios.The performance of both robust formulations using a new set of 100 optimization scenarios is close to the performance of using exactly test scenarios as optimization scenarios as shown in Table 1, indicating that 100 optimization scenarios appear to be sufficient for this case study.The size of the problem solved in Table 2 with 100 optimization scenarios is very large.It has 434,219 variables and 434,200 constraints.Table 3 shows the solution time of solving the optimization problem at step t = 0.The total time is composed of both the time constructing the model and the time solving the NLP.The Schur-complement method implemented not only solves the problem in parallel, but also builds and evaluates the model in parallel.It gains 14 times speedup on a computer with 25 cores compared with its own serial implementation.Our solver using a full-factorization method similar to IPOPT takes 6.7 min to solve the problem while the parallel Schur complement solver only requires half a minute, allowing for real-time application of this control strategy.Uncertain parameters can be estimated using MHE.However, in the presence of significant noise and large uncertainties, point estimation results might not be accurate.Nevertheless, we can use Bayesian inference to update the posterior distributions of uncertainties and generate optimization scenarios according to the posterior distribution instead of prior distribution at each sampling instance.Specifically, the posterior distribution is: where f (p) is the prior probability density before y m is observed, f (y m |p) is the probability density of observing y m with a given p, and f (y m ) is the probability density of observing y m , which can be viewed as a constant.For a given p, we can get a corresponding y(p) from simulation.Therefore, f (y m |p) is equivalent to f (y m |y(p)) and can be computed according to the measurement error distribution.With this information, Markov chain Monte Carlo (MCMC) can be used to generate a set of scenarios based on the posterior distribution.Table 4 illustrates that the performance of robust NMPC with Bayesian inference is better than robust NMPC with scenarios generated from the prior distribution alone.This is because the posterior distribution takes the measurements into consideration and therefore is more accurate than the prior distribution.Specifically, the performance of robust min-max NMPC with 25 optimization scenarios from Bayesian inference is close to the ideal performance.Increasing the number of optimization scenarios from 25 to 50 slightly deteriorates the performance because it now considers some scenarios that have very low probability.

Conclusions
In conclusion, this paper solves the optimization problems arising from robust NMPC using the parallel algorithm developed to solve stochastic programs in distributed and shared memory machines.The optimization problem resulting from robust NMPC at each sampling instance can be viewed as a large-scale stochastic program.Using an interior-point method to solve this problem results in a KKT system of the arrowhead form, and these linear systems can be decomposed using the Schur complement method, which can be implemented in parallel.
Using a case study of a multidimensional unseeded batch crystallization process, we show that robust min-max NMPC provides better robust performance compared with open-loop optimal control, nominal NMPC, and robust NMPC minimizing the expected performance at each sampling instance.We further improve the performance by generating optimization scenarios using Bayesian inference.The efficient parallel framework can dramatically reduce both the time to build the model and the time to solve the optimization problem, and thus allows for real time application.

Figure 1 .
Figure 1.Optimal temperature profile for nominal NMPC (nonlinear model predictive control) and robust NMPC.

Table 1 .
The robust performance (value of cost) of different control strategies evaluated using 100 test scenarios and exact information.

Table 2 .
The robust performance of the robust NMPC using different numbers of scenarios evaluated using 100 test scenarios.

Table 3 .
The solution time of solving a robust optimization problem with 100 optimization scenarios.

Table 4 .
Robust performance of min-max NMPC with different numbers of optimization scenarios from Bayesian inference evaluated using 100 simulations.