Recurrent Neural Network-based Model Predictive Control for Continuous Pharmaceutical Manufacturing

The pharmaceutical industry has witnessed exponential growth in transforming operations towards continuous manufacturing to effectively achieve increased profitability, reduced waste, and extended product range. Model Predictive Control (MPC) can be applied for enabling this vision, in providing superior regulation of critical quality attributes. For MPC, obtaining a workable model is of fundamental importance, especially in the presence of complex reaction kinetics and process dynamics. Whilst physics-based models are desirable, it is not always practical to obtain one effective and fit-for-purpose model. Instead, within industry, data-driven system-identification approaches have been found to be useful and widely deployed in MPC solutions. In this work, we demonstrated the applicability of Recurrent Neural Networks (RNNs) for MPC applications in continuous pharmaceutical manufacturing. We have shown that RNNs are especially well-suited for modeling dynamical systems due to their mathematical structure and satisfactory closed-loop control performance can be yielded for MPC in continuous pharmaceutical manufacturing.


| INTRODUCTION
The pharmaceutical industry has a growing interest in addressing key challenges such as the reduction of waste, cost, footprint and lead-times by implementing the concept of continuous manufacturing [1,2]. As a result, with the development in continuous manufacturing technologies and quality by design paradigm, there is a trend of increasing demand for more advanced model identification and process control strategies in the continuous pharmaceutical industry [1].
One emerging application area for continuous manufacturing is to extend the palette of permissible reaction conditions and enable reaction outcomes that are quite challenging when performed under batch conditions [3, 4, 5].
The production of Active Pharmaceutical Ingredients (APIs) is the core part of pharmaceutical manufacturing. As a result, many researchers have proposed or employed continuous manufacturing technologies in the production processes of complex APIs. The fact is that these reactions are generally both reversible and coupled with many side reactions, thereby leading to a highly non-linear nature of the reaction system. As a result, this imposes difficulty on model identification. In addition to that, the need to operate in tight and extreme conditions leads to needs in effective and sophisticated control strategies [6,7,8]. The control and model identification of these continuous API reactions are particularly challenging due to the complexity of the underlying phenomena and the associated ultimate impact on reaction yields, molecular structure, downstream processing and Critical Quality Attributes (CQAs) of final product.
Model-based advanced control technology is viewed to be vital for enabling continuous pharmaceutical manufacturing by improving control of CQAs. The system identification is an indispensable part of the control performance for the whole process. Many rigorous models have been proposed to describe different API reactions [6, 9,10]. These physics-based models have a clear physical interpretation. However, they can suffer from a complicated structure and may incur excessive computational cost during on-line control. Moreover, in many nonlinear systems, it is extremely difficult and expensive to obtain an accurate model of the process from first principles [11]. In contrast, experimental, data-driven heuristic models are easier to get and with relatively simple mathematical descriptions [12]. Belonging to the data-driven methods, neural networks have been applied actively in chemical and biochemical processing industry, especially has been implementing in some complex non-linear processes where process understanding is limited [13,14,15,16].
System identification in chemical engineering applications addresses temporal correlations and interactions amongst the various states and controlled variables of concern. It is generally recognized that Artificial Neural Networks (ANNs) have the universal capability of approximating any function. Compared to the typical feed-forward ANNs, the structure of RNNs is such that the latter are better suited for providing temporal predictions. RNNs are inherently structured to use the hidden variables as a memory to capture long-term dependencies and have self-feedback of neurons between and within the hidden layers. This feedback process means RNNs inherently possess a "deep" structure.
RNNs have been widely used in sequence learning problems including scene labeling [17] and language processing [18], with good success. The authors of [19] provide an excellent overview of the applications of neural networks especially from a process systems engineering perspective. Thus, in our work, we study the implementation of a RNN for model identification, using a complex reaction in a single Continuous-Stirred Tank Reactor (CSTR) for pharmaceutical API production, as an exemplary reference.
Apart from the modeling of complex pharmaceutical reactions, how to design effective model-based control algorithms for continuous manufacturing is also a research focus in academic literature [20] and industry. In addition, the United States Food and Drug Administration (FDA) has emphasized that, in order to take full advantage of the real-time release Process Analytical Technology (PAT) concept, continuous-flow reactions require on-line monitoring and feedback control [21]. Model-based control is an excellent framework to rigorously incorporate the information of new and existing PAT technologies, as they emerge or get developed.
The MPC method is widely used in industrial applications [22,23,24] and it has been applied in control of CQAs in continuous pharmaceutical manufacturing [10,20]. In [10], the authors presented two plant-wide MPC designs for whole continuous pharmaceutical manufacturing pilot plant from chemical synthesis to tablets formation using the quadratic dynamic matrix control algorithm, an earlier form of MPC. It shows by monitoring CQAs in real time, Critical Process Parameters (CPPs) can be actively manipulated to enable more robust and flexible process operation via feedback control. The advantages of MPC compared to conventional Proportional-Integral-Derivative (PID) controllers are highlighted in [20] via comparison of the control strategies to a feeding blending unit used in continuous pharmaceutical manufacturing. In addition to the work done on MPC, RNNs had been proposed to be used in conjunction with MPC in previous studies. For example, [25] investigated the applicability of RNNs for control of non-linear dynamical systems.
RNNs have been used for predictive control of CSTRs (for instance [26]) but these have been limited to relatively basic kinetics (e.g., single-step, irreversible kinetics A → B). Furthermore, these MPC studies have rarely been applied to the most challenging and essential part in continuous pharmaceutical manufacturing of reactor control.
Our contribution is in demonstrating the applicability of a special class of neural networks, termed RNNs, for MPC applications relevant to relatively complex continuous pharmaceutical manufacturing reaction applications. This paper is organized as follows. Section 2 describes the plant model for system identification and case studies for control closed loop performance assessment. Section 3.1 explains the RNN-based system identification methodology. Section 3.2 contains the MPC formulation. Results and discussions are shown in Section 4.

| PLANT MODEL AND CONTROL CASE STUDIES
The following subsections describe (i) the true plant model for the purpose of simulation and system identification and (ii) case studies in the case of closed-loop control.

| Plant Simulator
We consider the following reaction system in the context of a single CSTR, as was previously studied by [27,28]. Distinct from the literature, our work focuses on system identification using RNNs with a view towards closed-loop control via MPC. Referring to Eq.1, the feed species is A, the desired product is the intermediate R with species S being an undesired byproduct. The reactions are reversible and depending on the value of the manipulated variable(s).
(1) F I G U R E 1 Steady-state conditions as a function of system temperature (with feed flow rate, q = 0.8) The dynamic equations, based on normalized, dimensionless quantities are presented as such: Here, C j , j ∈ {A, R, S } refers to the concentration of the respective species within the reactor and form the state vector. The reactions are first-order, reversible with Arrhenius-type temperature dependencies. It is assumed that all concentrations are measured. The manipulated variables (MVs) are feed flow rate (q ) and reactor temperature (T ), whose values are also available. Here, it is assumed that the only species entering the CSTR are A and R such that their concentrations sum up to unity. As such, the concentration of species S is easily computed as a function of time. The values of the normalized feed concentrations, Arrhenius pre-exponentials and activation energies can be found in [28] and are reproduced in the Appendix for ease of reference.
As shown in Fig. 1, for a given flow rate, the concentration of the desired product, C R reaches a maximum at a certain temperature, beyond which the reaction is driven all the way back to the left of Eq.1, yielding only feed reactant and no desired product, R . This problem possesses rich and relevant dynamics for continuous pharmaceutical manufacturing and is therefore worth detailed investigation. The control problem is challenging due to the existence of input multiplicities [27], [29]. In the sequel, we work in the discrete-time domain with time steps denoted by k ∈ + , as is common in digital control. Furthermore, we make the following definitions, The underlying TA B L E 1 Steady-state conditions (nominal flow rate, q , equals 0.8) where k ∈ {0, 1, 2, . . . , } is the discrete-time index, x k the state variables, u k , the manipulated variables. Full state feedback is assumed such that the measured output y k equals the state vector at all time steps. Φ(.) is the statetransition equation that is consistent with Eq.2 through numerical integration of the Ordinary Differential Equations (ODEs).
The problem of system identification as described in Section 3.1 is to find a representative approximation of Eq.3. In this paper, we assume that the p−step ahead prediction problem is of vital interest for the purpose of MPC. A sampling time, ∆t of 0.1 (units of time) is used in this paper.

| Closed-Loop Control Case Studies
The approach taken in constructing a RNN for the purpose of MPC is to learn the RNN model with training data, and validate against previously unseen validation/ test data. However, given that the end goal is control, the final evaluation would be in terms of closed-loop performance for the two scenarios, relative to a benchmark/ reference Non-linear MPC (NMPC) approach that uses the true plant model (see ODEs in Eq.2) as the internal controller model for on-line control.
These scenarios are carefully selected and the corresponding conditions are described in Table 1. These exemplary scenarios are firstly, a (i) reactor system start-up case and secondly, (ii) an upset-recovery case where the controller needs to recover from an external upset. In the start-up scenario (case I), the system is assumed to be at an initial condition corresponding to a low temperature state that corresponds to relatively low product concentration. In the recovery scenario (case II), the system is assumed to be at an initial condition that corresponds to a region of relatively high temperature state and low yield.
The initial condition of the system corresponding to Case I is located on the left of peak C R , as demarcated by an orange point in Fig. 2. The initial condition of the system corresponding to Case II is located to the right of the peak, as demarcated by the green point in Fig. 2. The set-point is judiciously chosen to be at a location where the ratio of product concentration to feed concentration is maximum and as is at a product concentration level that is only slightly lower than what is maximally achievable. The rationale is that this operating point maximizes yield whilst lowering downstream separations cost. Table 1 contains the initial steady-state values of the system as well as the target set-points.
F I G U R E 2 Starting conditions relative to set-point for case I (start-up) and case II (recovery) 3 | METHODOLOGIES

| Non-linear Time-Series System Identification via Recurrent Neural Networks
In our case, an MPC based on a single linear model would result in poor control performance due to the presence of input multiplicities corresponding to the change in gain. Thus, there is a need of a fit non-linear model for good closed-loop performance of the associated NMPC controller.
For clarity of exposition, consider a single RNN cell as follows (see Eq.4). At time step k , the hidden state variable of the RNN is the output of the cell and is denoted by h k ; the input to the node is defined as u k −1 1 .
where σ(.) is an activation function applied element-wise, W u,h is the input-to-hidden-state weight matrix; W h,h is the hidden-state-to-hidden-state weight matrix and b h is a bias (offset) vector. The dimensionality of h and therefore that of W u,h , W h,h and b h depend on the number of nodes of the RNN cell. This single RNN cell can be unfolded across time-steps, as in Fig. 3 so that the temporal dependencies are clearly illustrated. Each cell may output directly to one or more measured observation vectors or output into one or more RNN layers above it (see Fig. 4). RNNs may be multi-layered where each subsequent layer captures more nuanced features than the prior layer.
At time step k , in order to make a p-step ahead prediction of the output variables,ŷ k +p |k , a RNN, N(.), is trained for the purpose of modeling the non-linear system dynamics. The corresponding regressors required to estimate this quantity is defined to be φ k := [y k , u k , u k +1 , . . . , u k +p−1 ]. Note that the first element of the regressors is the output measurement at time-step k ; this is done as a way to provide some form of initialization for the hidden state at the first instance.
The input sequences are introduced in a vertical, layered structure per Fig. 4. For MPC with a prediction horizon of p-steps, it is noted that the following equation is deployed as shorthand to describe the internal RNN-MPC model: where the( .) symbol makes explicit that an estimate is being computed.
The various weights are estimated by Back-Propagation Through Time (BPTT), which involves an application of the chain rule in the face of minimizing a certain loss function that is related to the prediction error of the model. In our study, the 'Mean Absolute Error' is adopted as the loss function. We employ the Adaptive Moment Estimation (ADAM) optimization algorithm [30], a popular stochastic gradient descent-like method with widely applicability, to learn the parameters and weights of the RNN.
The parameters associated with simple RNNs cells are known to be difficult to train due to the gradient-vanishing and gradient-exploding numerical problems. To resolve this, special RNN cell structures such as Long Short-Term Memory (LSTM) cells have been developed to address this issue. In the following cases, we used LSTMs to be the cell structure of interest. Details of the LSTM are left to the Appendix for interested readers.

| Control Problem Formulation
The fundamental strategy underpinning MPC is in the selection of a set of future control moves (spanning a control horizon of m steps) that minimizes a certain cost function based on the desired output trajectory over a prediction horizon (spanning p steps). MPC requires a reasonably accurate model, that captures the essential dynamics of the plant under control, so as to predict dynamic behavior multiple-steps ahead.
The following Receding Horizon Control (RHC) problem (Eq.6) is solved via a mathematical optimization program Here, vector y * refers to the set-points of [C * A , C * R ] . ∆u k +j u k +j − u k +j −1 , refers to the discrete-time rate of change of the manipulated variables, where it is assumed that there is no change in actuator position beyond the control horizon. That is ∆u k +m = ∆u k +m+1 , . . . = 0, k ∈ + . Q y ∈ n y ×n y and Q u ∈ nu ×nu are square, symmetric, positive semi-definite weighting matrices that serve to penalize deviations from set-point and excessive actuator movement in a quadratic sense, as is typically used in MPC formulations. Constraints are imposed on both the absolute value of the manipulated variable as well as the rate of change, via respective upper and lower bounds, as shown in Eqs.(8)-(9).
The aforementioned optimization problem does not, in general possess special structures amenable to global optimality (e.g., convexity). This is a Non-Linear Programming (NLP) problem, which can be solved by modern, off-theshelf solvers 2 . At each time-step the optimal sequence (vector) of future actuator movements {∆u * k , . . . , ∆u * k +m−1 } are computed whereupon the first element, ∆u * k , is implemented and maintained until the next sampling instant, k + 1. Thereafter, the optimization is repeated in a moving horizon fashion.
F I G U R E 5 Training data for system identification (Manipulated Variable; sampling rate, ∆ t = 0.1)

| RESULTS AND DISCUSSION
The procedure for investigating the RNN-MPC approach as a practical control strategy is according to the following steps: (1) Gathering process operating data with perturbation in all the manipulated variables to generate a rich-enough data set. (2) Training RNN models with various parameters using training sets and validating them with test data. Select the one with best performance that can capture most of the behavior of the plant.

| System Identification Results
Using Eq.2 to simulate the true plant behavior, a profile of the manipulated variables per Fig. 5 was used to generate the necessary training data for the RNN model. Reactor temperature was stepped up (past the point of inflexion) and then back down for various choices of flow rates as can be seen in Fig. 5. The corresponding output profile for RNN can be found in Fig. 6. We varied the number of nodes between 250 to 2, 000 and number of layers to be between 1 and 3. Table   2 summarizes the performance of the various RNN parameters over previously unseen test (validation) data. System identification performance over N data points is quantified by Root Mean Square Error (RMSE).
Generally speaking, the prediction performance is found to improve with increasing number of nodes. This is because more expressive features are created when increasing number of nodes to better fit the nonlinear dynamic system behavior, thereby reducing prediction bias.
Furthermore, a deep, hierarchical model may be much more efficient in representing some functions than a shallow one [31]. We observed this when the number of layers were increased from 1 to 2. However, performance deteriorated when the number of layers were increased from 2 to 3. This is due to the fact that RNNs that are too deeply stacked may be difficult to learn, without extensive tuning potentially with heuristics and/ or specially developed algorithms. This is potentially due to the aforementioned gradient vanishing numerical problems that arise during back-propagation. This issue is attributable to the complex structure of multi-layered RNN.
As can be seen in Table 2, using 1, 000 nodes per RNN (LSTM) cell with 2 hidden RNN layers was found to yield the best prediction performance over previously unseen test data. Consequently, we opted to proceed in our system identification studies, and corresponding closed-loop control studies, by constraining the number of RNN layers to 2.
For completeness, we also proceeded to learn an RNN model with 2 layers and 2, 000 nodes. We noticed that over-fitting occurred, as reflected in Table3. The corresponding prediction performance for the best RNN on the training data (1, 000 nodes and 2 layers) is shown in Fig. 6 and that on the test data can be seen in Fig. 7. For the case of the training data, it can be seen that there is very good fit. With highly dynamic input data, the resulting highly dynamic output information can be captured well.
For the previous unseen test data, this well tuned RNN can show a good fit which captures all the general trend of the output information by given highly dynamic input data. Although the performance of the RNN for system identification of the test data is not perfect, it shows very promising result when implemented in MPC for closed-loop control purpose.
The closed-loop results for 2 layers are shown in the following discussions 3 .

| Closed Loop RNN-MPC Performance
With reference to Fig. 2, for both cases, the objective is to have the system converge to a target set-point that corresponds to an operating point where the ratio of R concentration against that of A is at its maximum, thereby optimizing yield and potential downstream separation costs. It is noted that Linear MPC, based on a single linear model, will not be able to perform well [29], thereby necessitating the selection of a non-linear MPC solution.
3 Code implementation is performed using the Python platform (ver 3.6.5) with Tensorflow (ver 1.7.0), equipped with a Keras (ver 2. The elements of Q y are selected to reflect that tracking C R is more important than tracking C A . The Q u penalty on excessive actuator movement ∆u is higher than that on the state variables so as to avoid over-aggressive controller actions, which may lead to system instability. For digital implementation, the sampling rate is ∆t = 0. In the following studies, we perform closed-loop controller performance benchmarking. This is done against a NMPC controller where the controller's internal model is the same as the actual, full plant model (as in Eq. 3). The optimizer / solver deployed for both the RNN-MPC and benchmark NMPC methods remain the same as the one mentioned in the Section 3.2.
In comparing controller performance, the index used is naturally, the sum of the stage-wise MPC cost added over the total number of discrete-time steps, N , across the entire simulation horizon: where Q y , Q u have already been defined in Eq.6 for the benchmark NMPC. The sequence of controller actions in Eq.11 are determined per the corresponding controllers. Specifically, J [RNN] refers to the corresponding total stage-wise cost for the RNN-MPC solution and J * refers to that for the benchmark NMPC.
For both scenarios, the system is allowed to run for 1.0 time units (equivalent to the prediction horizon) before the controller is switched on. This is because the RNN-MPC is required to store in its memory a history of p number of manipulated input variables. This suggests a certain robustness associated with this combination. Table 4 summarizes a quantification of RNN-MPC performance by way of the closed loop performance index, averaged over both scenarios, for ease of comparison.
In the case of 250 nodes (see Fig. 8), it can be seen that a significant offset occurs at steady-state for both scenarios even though initial transient performance is somewhat similar to that of the benchmark NMPC solution. 1, 000 nodes yielded the best closed-loop performance as evident in Fig. 9. This is consistent with the previous observations made during system identification. Steady-state tracking with no offset occurs for both scenarios. While the closed-loop trajectory of the RNN-MPC solution is close to that of the benchmark solution and has same total stage-wise cost as the benchmark NMPC solution. It is noted that in scenario 1, the transient performance is slightly more rapid than that of the benchmark. This is because the actuator movement, as informed by the SQP optimizer, is more aggressive (though still within the imposed constraints). In the case of 2, 000 nodes (see Appendix Fig. 11), closed loop performance worsens as a result of over-fitting. We summarize the aforementioned discussion in

| CONCLUSION & FUTURE RESEARCH
We considered an exemplary problem for continuous pharmaceutical manufacturing by studying a single, multi-input multi-output CSTR example (per [28], [27]) which experiences input multiplicity due to reversible kinetics (A ↔ R ↔ S ).
This set-point is close to a point of inflexion where the system gain changes in sign with respect to reactor temperature.
We show how RNNs can be learned and present associated closed-loop performance results for two scenarios that require the RNN-based NMPC to move from either side of the inflexion point towards the desired set-point. As a result of input multiplicity, it is noted that a single linear controller will not work well for the problem of concern. While favorable results of the RNN-MPC controller are obtained when compared against a NMPC benchmark that uses the true plant model for control in terms of closed-loop performance.
Future research involves extending the methodology to multiple CSTRs and to reaction kinetics of increasing complexity and immediate relevance to the pharmaceutical industry.

A C K N O W L E D G E M E N T S
We would like to acknowledge Associate Professor Saif A. Khan in the Department of Chemical and Biomolecular Engineering, National University of Singapore for his input related to reaction kinetics and multiple discussions. Also, our thanks to the Pharma Innovation Programme Singapore (PIPS), the source of motivation for this work.

C O N FL I C T O F I N T E R E S T
We declare no conflict of interest.

R E F E R E N C E S
[

| Long Short-Term Memory Cells
In the conventional RNN structure, back-propagation through time results in the gradient signal being multiplied numerous times viz-a-viz the weights corresponding to the various connections (edges). Depending on the eigenvalue of the associated weighting matrix, this leads to the either vanishing (eigen-value less than unity) or exploding (eigen-value greater than unity) gradients. This has the potential to severely impact the learning quality.
The machine-learning community has been using LSTM cells to replace the conventional RNN cell in order to mitigate this issue. These LSTM memory cell uses several gating functions that, in conjunction, serve to adjust / modulate the propagation of signals between cells (e.g., through 'forgetting' , 'input' , 'output' gates) to avoid said numerical issue.
The basic LSTM cell structure is as follows: C k = f k * C k −1 + i k * C k (14) where k is the time index, h k the hidden state variable, u k the manipulated / input variable. f k ∈ [0, 1], i k ∈ [0, 1] and o k ∈ [0, 1] are termed the 'forgetting' , 'input' and 'output' gates respectively. The input gate (Eq.16) controls the degree to which the state of the memory cell is affected by the candidate information (Eq.15); the output gate (Eq.18) controls how the state of the cell affects other neurons; the forget gate (Eq.17) modulates the cell's self-recurrent connection, allowing the cell to (partially) remember the previous state, similar to traditional RNNs.
These work to combine old (C t −1 ) and new candidate information (C t ), respectively, to be passed to subsequent LSTM cells. σ is an activation function that returns a value between 0 and 1. * refers to a point-wise multiplication.