Machine Learning-Based Model Predictive Control of Two-Time-Scale Systems

: In this study, we present a general form of nonlinear two-time-scale systems, where singular perturbation analysis is used to separate the dynamics of the slow and fast subsystems. Machine learning techniques are utilized to approximate the dynamics of both subsystems. Speciﬁcally, a recurrent neural network (RNN) and a feedforward neural network (FNN) are used to predict the slow and fast state vectors, respectively. Moreover, we investigate the generalization error bounds for these machine learning models approximating the dynamics of two-time-scale systems. Next, under the assumption that the fast states are asymptotically stable, our focus shifts toward designing a Lyapunov-based model predictive control (LMPC) scheme that exclusively employs the RNN to predict the dynamics of the slow states. Additionally, we derive sufﬁcient conditions to guarantee the closed-loop stability of the system under the sample-and-hold implementation of the controller. A nonlinear chemical process example is used to demonstrate the theory. In particular, two RNN models are constructed: one to model the full two-time-scale system and the other to predict solely the slow state vector. Both models are integrated within the LMPC scheme, and we compare their closed-loop performance while assessing the computational time required to execute the LMPC optimization problem.


Introduction
Various applications in the field of chemical engineering involve systems that exhibit different time scales.Instances of such systems include biochemical processes, catalytic reactors, and distillation columns.Furthermore, other scientific sectors, such as power electronics, communication networks, and biological systems, also feature systems with dynamics evolving in disparate time scales.Specifically, in the context of chemical engineering, researchers have utilized the method of singular perturbation to decouple two-time-scale systems into reduced-order subsystems, each associated with a distinct time scale (e.g., [1]).This approach simplifies the analysis of the ordinary differential equations governing the system, allowing for the design of a suitable well-conditioned control law that stabilizes the system.In the context of control system design, model predictive control (MPC) is widely recognized as one of the leading and most convenient approaches employed in stabilizing different types of nonlinear systems.MPC is essentially an optimization problem formulated with a well-defined objective function aimed at enhancing the performance of the system while meeting constraints associated with the system's physical structure and closed-loop stability.The fact that MPC accounts for multiple inputs, outputs, and constraints within its framework is what makes it a suitable choice for designing control systems to stabilize chemical processes.However, in singularly perturbed systems, the presence of distinct time scales can lead to challenges when employing MPC without accounting for the state evolution in the different time scales.Neglecting this aspect can result in degradation of the closed-loop performance, long time consumption, or, in more critical situations, instability and stiffness in the control system due to issues with the controller's effectiveness and the presence of ill-conditioning [2].
However, if the time-scale multiplicity is accounted for, MPC has proven to be an efficient control strategy when it comes to situations involving time-scale separation.For instance, in [3], the design of a composite control system was presented, which included two MPC designs: one for the slow subsystem and one for the fast subsystem.Using stability analysis and the theory of singular perturbations, the authors investigated and guaranteed the closed-loop stability for the full, nonlinear two-time-scale system under Lyapunov-based tracking MPC.To additionally consider process economics, a similar strategy was proposed in [4], which focused on the design of a composite controller involving two MPC problems.Specifically, for the stability of the fast subsystem (and full two-time-scale system as a result), a Lyapunov-based tracking MPC was used for the fast states based on the error between the fast states and the slow manifold using a quadratic cost function, while a Lyapunov-based economic MPC was used for the slow subsystem to optimize the economic considerations and achieve any other desirable closed-loop stability properties using a non-quadratic, economic cost function.The authors of [3,4] focused on first-principles-based controllers, and in [5], a two-time-scale system was separated into slow and fast subsystems using a singular perturbation strategy and then modeled using only data from the system via the method of sparse identification for nonlinear dynamics.Subsequently, an MPC was designed based on the reduced-order slow subsystem modeled using sparse identification, and closed-loop stability guarantees were derived.Furthermore, numerical simulations were used to demonstrate the reduced computational cost of the reduced-order sparse-identified model.
Due to the vast advancements in the chemical industry and the multitude of complex reactions occurring in real-life chemical process applications, engineers encounter challenges when constructing first-principles models for these chemical processes.As a result, in recent years, machine learning (ML) models have been utilized to approximate the dynamics of chemical processes.These models are highly beneficial when designed carefully, as they offer a reliable and efficient alternative to classical first-principles models and can then be incorporated into MPC frameworks.Many works have been conducted in this field.For instance, the authors of [6] offered essential insights and a fundamental theory regarding the integration of machine learning techniques into MPC schemes.Their work focused on designing a recurrent neural network (RNN) model to approximate the nominal nonlinear system and integrated the model within an MPC framework to stabilize the system.In [7], polynomial nonlinear autoregressive with exogenous inputs (NARX) models were used to build nonlinear MPC, with the relevant polynomial terms to retain selected via sparse regression.The proposed MPC was applied to a multi-input multi-output chemical reactor, and an algebraic modeling language was used to reduce computational time.
Another research direction termed "approximate MPC" involves the replacement of the MPC optimization problem with a closed-form expression that approximates the MPC control actions without having to solve an optimization problem.This is meant to address the computational challenges inherent in classical MPC at the expense of lower accuracy and generalizability.In [8], an approximate MPC was designed for heating, ventilation, and airconditioning (HVAC) systems due to the stringent computational resources available for building control systems.Specifically, the MPC optimization problem was imitated using a recurrent neural network with a structure of a nonlinear autoregressive network with exogenous inputs trained with data from 30 days of operation in a closed loop under MPC.While both MPC and approximate MPC greatly reduced the cooling consumption of the HVAC system when tested, approximate MPC solved for the control actions over 100 times faster throughout the testing period, with only a 12% degradation in performance compared to MPC.An alternate research direction to reduce the computational burden of MPC is the design of "explicit MPC", in which the optimal control actions are computed offline using multiparametric programming methods, allowing the online component of MPC to only have to search a table of linear gains and compute a single function evaluation.A summary of studies investigating explicit MPC is provided in [9].Machine learning has also been investigated in the context of explicit MPC.In [10], for example, an explicit control law was learned for discrete-time linear time-invariant systems using machine learning.Specifically, the key challenge addressed was that of high dimensionality, which has been attempted to be solved in the existing literature by identifying suboptimal polytope partitions of the state space and designing control laws based on such regions.Ref. [10] focused on tackling high dimensionality in a similar manner but by using reinforcement learning and additionally investigated constraint satisfaction and feasibility guarantees for explicit MPC.Specifically, by using deep neural networks with rectified linear units as the activation function and a modified policy gradient algorithm based on prior knowledge, the objectives were met and demonstrated via three numerical examples of varying levels of dimensionality and complexity.A potential limitation of the approximate and explicit MPC approaches is that they are practically applicable to only smaller-scale systems, even though they are applicable to systems with very fast sampling times [9].However, for general nonlinear systems and nonlinear MPC, due to the highly nonconvex optimization problem, methods such as the use of suboptimal polytopes [10] may not be readily extended to the nonlinear case.
While ML techniques have shown great success when incorporating them into MPC schemes, the practical application of machine learning-based MPC schemes to real-life processes poses significant challenges.The fact that the ML model is developed using a finite number of data samples makes it challenging to assess the accuracy of the model when considering generalized scenarios.Therefore, researchers have developed methods to quantify the error of the process model used in MPC and study its effects on closed-loop performance and stability.For example, ref. [11] proposed the use of Bayesian neural networks (BNN) to quantify the plant-model mismatch and subsequently create adaptive scenarios in real time for scenario-based MPC using the BNN.Based on the simulation results from a cold atmospheric plasma system, the proposed approach outperformed scenario-based MPC without adaptive scenarios and adaptive scenario-based MPC with Gaussian process regression.On more theoretical fronts, researchers have also studied the concept of generalization error bounds, which are crucial to study to ensure the construction of reliable models that are capable of performing accurate predictions on unseen or new data.Specifically, a low generalization error bound guarantees the effectiveness and high performance of the ML model and, consequently, the MPC scheme utilizing the designed ML model.To address this, ref. [12] derived a generalization error bound for a fully connected RNN (FCRNN) model, discussed the main factors that affect the bound, and integrated the FCRNN model into an MPC scheme to stabilize a nonlinear process.Moreover, the authors of [13] focused on deriving a generalization error bound for a feedforward neural network (FNN) that was used to construct a control Lyapunov-barrier function.The authors of [14] studied the generalization error bound for both a partially connected RNN (PCRNN) and an LSTM-RNN and subsequently carried out a comparison study between the generalization performance of an FCRNN and a PCRNN.Additionally, all the aforementioned works incorporated the designed machine learning models into Lyapunov-based MPC schemes to demonstrate the ability of the LMPCs to stabilize a chemical process.To the best of our knowledge, the application of generalization error bounds to neural networks modeling two-time-scale systems has not yet been investigated.
In light of the above considerations, in this article, we use the theory of statistical machine learning to construct the generalization error bounds for neural networks modeling two-time-scale systems.Moreover, the computational time for a neural network-based MPC is known to be an important practical consideration.However, the computational time for an RNN-based MPC with and without decomposition into lower-order subsystems has not yet been studied.Therefore, we introduce two LMPC frameworks, one designed based on an RNN model that predicts the slow state vector and the other designed based on an RNN model that models the full two-time-scale system, and compare the two LMPC schemes in terms of the closed-loop stability and computational time.The rest of this article is structured as follows.Section 2 introduces the essential preliminaries, as well as the general class of two-time-scale systems considered in this work.Section 3 discusses the generalization error bounds of neural networks modeling two-time-scale systems.In Section 4, we present a machine learning-based Lyapunov-based MPC using an RNN that predicts the slow state vector, followed by a closed-loop stability analysis and the corresponding results.In Section 5, a chemical process example is used to demonstrate the effectiveness of the designed controller.Section 6 summarizes the key findings obtained in this study.

Notations
The Euclidean norm of a vector is denoted by | • |.The transpose of the vector x is given by x T .Given a matrix B, its Frobenius norm is denoted as B F .Moreover, the infinity norm of the 1-norms of the columns of B is denoted as is strictly increasing and is zero only when evaluated at zero.
belongs to class KL if, for each constant value of t, the function β(•, t) is of class K, and for each constant value of s, the function β(s, •) is decreasing and approaches zero as s → ∞.The probability that the event A will occur is denoted as P(A).Additionally, the expected value of a random variable X is denoted as E[X].

Class of Systems
The following family of ordinary differential equations describes the general class of two-time-scale continuous-time nonlinear systems with k states considered in this work: where x ∈ R n is the slow state vector, z ∈ R p is the fast state vector, and n + p = k.u ∈ R q 1 is the bounded manipulated input vector.The input vector u is constrained by u ∈ U := {|u i | ≤ u 1 max,i , i = 1, ..., q 1 }.We assume the vector functions f 1 (x, z, u, ) and f 2 (x, z, ) to be sufficiently smooth vector functions in R n and R p , respectively.Moreover, we assume that the closed-loop stability region of the nonlinear system in Equation ( 1) is defined by the region Ω ρ F , where D is an open neighborhood around the origin.The speed ratio of the slow to the fast dynamics of the system is represented by the small positive parameter .In Equation (1b), we observe that the speed ratio pre-multiplies the derivative of the fast state vector z, which enables us to utilize a well-known strategy called "reduced-order modeling" via singular perturbations.In this approach, we will be able to decompose the system in Equation ( 1) into two different reduced-order subsystems.We follow the strategy thoroughly illustrated in [2].
The following equations are obtained by setting = 0 in Equation ( 1): where x and z are the slow state vector and the fast state vector associated with the system in Equation (1) under the case where = 0, respectively.Additionally, the following assumption is considered to be fundamental in the theory of singular perturbations.
Assumption 1. Equation (2b) has an isolated solution, which is determined by where z is a quasi-steady state for the fast state z, and the function f2 : R n → R p is continuously differentiable.
We substitute Equation (3) into Equation (2a) to obtain the reduced-order slow subsystem, Furthermore, to obtain the dynamics of the reduced-order fast subsystem, we introduce a fast timescale τ = t/ and a new coordinate z = z − z to re-write Equation (1b) with respect to the newly introduced variables z and τ.Then, the fast subsystem can be expressed as the derivative of z with respect to τ while setting = 0, Starting at t 0 , the initial conditions of the state vectors x and z are given by the vectors x 0 and z 0 , respectively.Based on Equations ( 1)-( 5), and by utilizing the basic theoretical concepts of two-time-scale systems in [2], the following equation will hold for all t ∈ [t p , T], where where z(t) reflects the slow transient of z, and O( ) is an error of order epsilon.A variable, z(t), is O( ), where is a positive constant if there exists a positive constant k (independent of ), such that |z(t)| ≤ k .Furthermore, if we were to consider the time interval where t ∈ [t 0 , T], Equation (6) would be slightly modified, such that the approximation of the state z is given by the following: where z(t) and z(t) are the slow and fast transients of z, respectively.Additionally, the approximation of state x by x is given by the following equation for all t ∈ [t 0 , T] The fast subsystem in Equation ( 5) needs to satisfy certain stability properties for the above closeness of the solution estimates to hold true, which are described by the following assumption.
If Assumption 2 holds true, then This implies that, at some time t p > t 0 , z will approach close to its quasi-steady state z.The local stability of the equilibrium of the fast subsystem holds if the following verifiable condition holds.

Assumption 3. All the eigenvalues of
∂z , computed for = 0, along x(t), z(t), exhibit real parts less than a constant negative value, i.e., Utilizing the aforementioned assumptions, we establish the well-known "Tikhonov's theorem" in the following theorem.
Theorem 1 (c.f.Theorem 3.1 in [2]).Consider that Assumptions 2 and 3 hold true.Then, for all t ∈ [t 0 , T], Equations ( 7) and ( 8) are valid.In addition, there exists a specific time instant t p ≥ t 0 , such that Equation ( 6) is valid for all t ∈ [t p , T].
Several sources such as [15] are useful for analyzing slightly varied formulations of Theorem 1 and reviewing its proof.At this point, we have investigated several essential concepts of two-time-scale systems.We will assume that the error between x and x is not greater than O( ).This assumption allows us to simplify the analysis by approximating x as x.Similarly, the same principle applies to z and z, with our focus on capturing the slow transient of the fast state dynamics z, that is, z.Therefore, based on this assumption, we express the reduced-order slow subsystem in Equation (4) as the following throughout this manuscript, ẋ = F(x, u) where f (•) and g(•) are sufficiently smooth vector functions of dimensions n × 1 and n × q 1 , respectively.Without loss of generality, we assume that the initial time t 0 = 0 and that f (0) = 0; hence, the steady state of the nonlinear system in Equation ( 11) is the origin.Finally, we will assume that the fast subsystem is globally asymptotically stable, which is necessary for establishing the stability of the closed-loop system under model predictive control using machine learning models in Section 4.
Assumption 4. The origin of the closed-loop fast subsystem described by Equation ( 5) exhibits global asymptotic stability uniformly in x.This implies that there exists a function β z of class KL, such that for any z(0

Stabilizability Assumption via Control Lyapunov Function
Regarding the dynamics of the slow subsystem described in Equation ( 11), we assume that there exists a locally Lipschitz feedback controller, Φ(x) ∈ U, which renders the origin of the slow subsystem in Equation (11) asymptotically stable, i.e., there exists a C 1 Lyapunov function V(x) that is continuously differentiable and meets the following set of inequalities: where a 1 , a 2 , a 3 , and a 4 are class K functions for all x ∈ R n ⊂ D. Additionally, given that the system F(x, u) has a Lipschitz property and the input u is constrained by the set U, then there exist positive constants M, L x , and L x , such that the subsequent inequalities hold for all x, x ∈ D, and u ∈ U: Moreover, the region Ω ρ := {x ∈ D|V(x) ≤ ρ}, ρ > 0, is defined as the stability region for the slow subsystem in Equation ( 11).Now we discuss the incorporation of machine learning models, particularly neural networks, in process systems engineering.The first step in developing any type of neural network model is generating a comprehensive data set that effectively captures the inputoutput relation.This data set is used to train the network and enables it to learn patterns present in the data and make accurate predictions.To generate a data set that captures the dynamics of the nonlinear system in Equation (1) within the region Ω ρ , we follow the data generation technique described in [6].The first step is to carry out various openloop simulations of the nonlinear system in Equation ( 1), covering a wide range of initial conditions (i.e., x 0 ∈ Ω ρ and z 0 ∈ Ω ρ ) and valid input signals under sample-and-hold implementation (i.e., the input is applied to the system as piecewise constant functions, u = u(t k ) ∈ U, ∀t ∈ [t k , t k+1 ), where t k+1 := t k + ∆, ∆ is the sampling period).To integrate the nonlinear system in Equation ( 1), the well-established forward Euler approach is used with a sufficiently small integration time step h c ∆.As a result, an extensive data set containing the dynamics of the system is generated, which is then utilized to train a neural network employing the Keras library.

Recurrent Neural Networks
Consider designing a recurrent neural network (RNN) model that predicts only the slow state vector x in Equation (1), or in other words, to approximate the dynamics of the slow subsystem in Equation (11).The network is trained using m data points (x i,t , y i,t ) of T-time length, where i = 1, ..., m and t = 1, ..., T. x i,t ∈ R d x and y i,t ∈ R d y are the RNN input and output, respectively.Additionally, d x and d y are the dimensions of the RNN input and output, respectively.It is worth mentioning that the notations used for the RNN inputs and outputs are represented in bold to establish a clear distinction between the RNN notations and the notations used to describe the nonlinear two-time-scale system in Equation (1).The RNN input x i,t comprises the current slow state measurement, along with the manipulated inputs applied through the time steps t = 1, ..., T, where the manipulated inputs are generated randomly within the set U. The RNN output y i,t comprises the slow state predicted through the time steps t = 1, ..., T. As a result, the RNN model is designed to make predictions of the slow states over one sampling period with T = ∆ h c time steps.The data set utilized to develop the RNN model is constructed by generating m independent data sequences obtained from an underlying distribution over R d x ×T × R d y ×T .
With the aim of simplification, we introduce a single-hidden-layer RNN model, given that h i ∈ R d h are the hidden states and can be evaluated as follows: where the element-wise nonlinear activation function is denoted by σ h .U ∈ R d h ×d h is the weight matrix associated with the hidden states, whereas W ∈ R d h ×d x is the weight matrix associated with the input vector.Furthermore, the following equation evaluates the output layer y i,t of the RNN model: where the element-wise activation function of the output layer is denoted by σ y , and V ∈ R d y ×d h is the weight matrix associated with the output layer.In this particular appli-cation, the input of the RNN model will be the current slow state measurement, x ∈ R n , as well as the manipulated input u ∈ R q 1 for the next sampling period.The output of the RNN model will be the predicted slow states x for at least one sampling period ahead.
The basic structure of the RNN is represented in Figure 1.Let y be the predicted value and y be the actual value.We can then define the loss function L(y, y) that computes the squared difference between the actual and predicted values.More precisely, we will consider the L 2 (i.e., mean squared error) loss function.Without loss of generality, we establish the following standard assumptions that are required for the study of the generalization error bounds and are reviewed for completeness: Assumption 5.The inputs of the RNN are bounded (i.e., x i,t ≤ B X for all i = 1, ..., m and t = 1, ..., T).Assumption 6.The Frobenius norms of the weight matrices are bounded (i.e., V Assumption 7. σ h is a positive homogeneous and 1-Lipschitz continuous nonlinear activation function (i.e., σ h (αz) = ασ h (z) holds ∀α ≥ 0 and z ∈ R).
Assumption 8.The data sets used for training, testing, and validation are drawn from the same distribution.

Remark 1.
Although the RNN model is developed to approximate the dynamics of the slow subsystem described by Equation ( 11), we emphasize the fact that the data used to train this RNN model are constructed using the nonlinear two-time-scale system in Equation ( 1).The reason is that in most practical scenarios, engineers are limited to working with data obtained from the original nonlinear two-time-scale system in Equation ( 1).Additionally, it is possible to design an RNN model that predicts the full two-time-scale dynamics of Equation (1) using full-state measurements.Remark 2. Assumption 5 takes into account the bounded nature of the RNN inputs.This aligns with the observation that the states x and the inputs u are restricted within certain limits, where x ∈ Ω ρ and u ∈ U. Assumption 6 requires that the weight matrices of the RNN are bounded.This requirement can be met while training the RNN since the search for the optimal RNN parameters is limited to a finite class of neural network hypotheses.Assumption 8 indicates that the RNN model constructed using data derived from industrial operations will be utilized in the exact same process under the condition that the data distribution remains unchanged.It is important to point out that in the context of assessing the generalization performance for machine learning models, Assumption 8 is regarded as essential.Several well-known activation functions can be used to construct the RNN model that satisfy Assumption 7; the rectified linear unit (ReLu) activation function is one such example.

Feedforward Neural Networks
Using the RNN model described in Section 2.4, we are able to predict the slow state vector x.However, it is important to devise a methodology to predict the values of the fast states.In line with the framework proposed in [5], we consider the design of a feedforward neural network (FNN) that predicts the fast states using the previously predicted slow states from the RNN model.More precisely, the FNN input will be the slow states x, and its output will be the fast states z.The FNN model is trained using m independent data points from an underlying distribution over R d F x × R d F y , where d F x and d F y are the dimensions of the FNN input and output, respectively.In this particular application, given that the FNN input is the slow state, x ∈ R n , and its output is the fast state z ∈ R p , we have d F x = n and d F y = p.The general form of an FNN model can be formulated as follows: where d is the total number of FNN layers, y F ∈ R d F y is the predicted output of the FNN, and x is the FNN's input.For each FNN layer l, where l = 1, . . ., d, the weight parameter matrix is denoted as Q l , and the activation function is represented as σ l .The depth of the network is represented by the number of layers d.However, the width of the network is the maximal number of neurons in a hidden layer and is represented by h max .Given that h l denotes the number of neurons in the l th layer, the width of the network can be expressed as h max = max l=1,...,d {h l }.The general FNN structure is shown in Figure 2. If yF is the predicted value and y F is the actual value, we can define the loss function L(y F , yF ) that computes the squared difference between the actual and the predicted values.More precisely, we will consider the L 2 (i.e., mean squared error) loss function.Similar to the discussion on RNNs, we establish the following assumptions for the developed FNN model: Assumption 9.The inputs of the FNN are bounded (i.e., x F i ≤ B X for all i = 1, ..., m).
Assumption 10.The norms of the weight matrices are bounded in the sense that the maximal 1-norm of the rows of the weight matrices in the hidden layers and output is bounded Assumption 11.The activation function σ l is 1-Lipschitz continuous and satisfies σ l (0) = 0, where l = 1, ..., d (e.g., tanh(•)).
Assumption 9 is an implication of the assumption that the RNN output (slow states) is bounded.Assumptions 9-11 follow the same reasoning as Assumptions 5-7, as explained in Remark 2. Building upon the previous assumptions, it should be noted that Assumption 8 remains applicable to the FNN model.

Generalization Error Bounds of Neural Networks Modeling Two-Time-Scale Systems
The primary goal of constructing any neural network model is to make accurate predictions.Therefore, analyzing the generalization error bounds for various types of neural networks modeling different types of nonlinear systems enables us to improve the construction and design of these network models.Typically, the assessment of neural networks is conducted using a finite set of training samples.Hence, it is essential to investigate the generalization error bound for the neural network model, which allows us to evaluate the performance of the model in accurately predicting outcomes for new, unseen data.In other words, considering new data from the same distribution, the generalization error bound quantifies the ability of a neural network model to make accurate predictions for unseen data that the neural network has not encountered before.
In this work, we consider an RNN that predicts the dynamics of the slow states.Then, an FNN is utilized to predict the fast states using the predicted slow states as the input for the FNN.The generalization error bounds of the neural networks modeling two-time-scale systems are investigated.Particularly, in this section, we discuss the generalization error bounds of both the RNN (which predicts the slow states' dynamics) and the FNN (which predicts the fast states).The generalization error bound is derived by utilizing the basic concepts of statistical machine learning theory.Hence, in the upcoming subsection, we outline the fundamental concepts and definitions used to derive the generalization error bounds.Additionally, it is worth mentioning that in the following subsection, the same principles that apply to x,y, and y also extend to x F ,y F , and yF , respectively.In the same vein, the principles that apply to d x and d y are also true for d F x and d F y , respectively.Let H be the hypothesis class of RNN functions h(•) that map the d x -dimensional input x ∈ R d x to the d y -dimensional output y ∈ R d y .Similarly, let H F be the hypothesis class of FNN functions h F (•) that map the d x -dimensional input x F ∈ R d x to the d y -dimensional output yF ∈ R d y .We also note that in the following subsection, the principles that apply to H and h(•) are also valid for H F and h F (•), respectively.It should be noted that the dimensions d x and d y differ based on whether the network is an RNN or an FNN.

Generalization Error Bound Preliminaries
Definition 1.Consider a function, h, which makes predictions of the output y corresponding to the input x, along with an underlying distribution D. The generalization error or the expected loss is formulated as follows: where ρ(x, y) represents the joint probability distribution for x and y, whereas X and Y represent the vector space for all possible inputs and outputs, respectively, and L denotes the loss function.
We introduce the following definition of the empirical error as an approximation of the expected loss due to the fact that the joint probability distribution ρ is unknown in many scenarios.
Definition 2. Considering a data set consisting of m data samples S = (s 1 , ..., s m ), with each s i = (x i , y i ), the empirical error or risk can be expressed as We note that the m data samples are gathered from the same data distribution.In this work, the loss function L(y, y) is chosen as the mean squared error (i.e., L 2 loss function).Furthermore, the generalization error bounds are derived using "Rademacher complexity", a widely recognized concept in the field of machine learning theory that is used to determine the complexity and richness of a class of functions.The Rademacher complexity is defined as follows: Definition 3. Given a data set S = {s 1 , ..., s m } with m samples, and a hypothesis class F of real-valued functions, the empirical Rademacher complexity of F can be defined as follows: where = ( 1 , ..., m ) T consists of the Rademacher random variables i .These variables are independent and identically distributed (i.i.d.), satisfying the condition that P( i = −1) Given that G t is the class of loss functions associated with the function class H and is defined as follows where x and h(x) are the model's input and output, respectively, whereas y denotes the actual value of the output, the upper bound of the generalization error can be computed using the Rademacher complexity R S (G t ), as shown in the following Lemma.
Lemma 1 (c.f.Theorem 3.3 in [16]).Given a data set S = (s i ), i = 1, ..., m, that consists of m i.i.d.data samples, where s i = (x i,t , y i,t ) T t=1 for an RNN and s i = (x i , y i ) for an FNN, with probability 1 − δ, the following inequality holds for all g t ∈ G t over the data samples S: For a comprehensive proof of Lemma 1, interested readers can refer to [12,16].By observing Equation (22), clearly, the upper bound of the generalization error relies on three terms.The first term is the empirical loss ( 1 m ∑ m i=1 g t (x i , y i )).The second term is the Rademacher complexity (2R S (G t )), whereas the third term depends on the data set size m, along with the confidence level δ.At this stage, the goal is to be able to effectively quantify the generalization error bound using predefined and measurable values.This can be achieved by imposing a further upper bound on the Rademacher complexity term.Having established the background and basic concepts of the generalization error bound, our next discussion focuses on examining the generalization error bounds for neural networks modeling two-time-scale systems.

RNN Generalization Error Bound
In this subsection, we investigate the generalization error bound for the RNN model that predicts the slow dynamics of the two-time-scale system defined in Equation (1).We start by introducing the following Lemma, which upper-bounds the Rademacher complexity in terms of the RNN parameters.Lemma 2. Let H k,t , k = 1, ..., d y represent the class of real-valued functions associated with the k th component of the RNN output at the t th time step, with the activation functions and weight matrices satisfying Assumptions 5-8.Given a data set S = (x i,t , y i,t ) T t=1 , i = 1, ..., m, consisting of m i.i.d.data samples, the following inequality holds for the Rademacher complexity: where , and B X is the upper bound for the RNN inputs.
For a comprehensive and detailed proof of Lemma 2, interested readers may refer to [12].By applying the results derived and proven in [12], the following Lemma specifically addresses the generalization error bound for the RNN model that predicts the slow dynamics of the two-time-scale system defined in Equation (1).Lemma 3. Consider the general class of the two-time-scale continuous-time nonlinear systems described in Equation ( 1) under the assumption that the slow and fast states in Equation ( 1) are stable and the perturbation parameter is sufficiently small.An RNN satisfying Assumptions 5-8 is constructed to predict the slow states of the system at the t th time step.Given the L r -Lipschitz loss function that belongs to the family of loss functions G t associated with the RNN function of class H t and a data set S = (x i,t , y i,t ) T t=1 , i = 1, ..., m, with m i.i.d.data samples, the following inequality holds true with a probability of at least 1 − δ over S for all t ∈ [t 0 , T]: where , and B X is the upper bound for the RNN inputs.
Equation (24) indicates that the generalization error bound of the RNN that predicts the slow state vector x depends on a number of factors, such as the number of training samples m, the time length t of the RNN's inputs, the upper bound on the input vector B X , and the complexity hypothesis class in terms of the weight matrices M, as well as the empirical loss ( 1 m ∑ m i=1 g t ( xi , x i )).We emphasize that the actual slow state is denoted as x, and the slow state predicted by the RNN model is x.Additionally, based on Equation ( 8), the generalization error bound in Equation ( 24) holds for all t ∈ [t 0 , T] under the assumption that the slow and fast states are stable, and the perturbation parameter is sufficiently small.Now we discuss the implementation of the RNN's generalization error bound for a specific loss function.A locally Lipschitz continuous loss function is used to optimize the RNN's weights and biases.To be more precise, we employ the MSE loss function of the form, where y and y are the predicted and actual values, respectively.Since the loss function L is locally Lipschitz continuous, it satisfies the following inequality, where L r is the local Lipschitz constant for the loss function L. Since the RNN model predicts the slow states of Equation ( 1), the loss function is computed over the actual and predicted slow states, x and x, respectively.Moreover, the expected loss of L is upperbounded by the following inequality, with a probability of at least 1 − δ: Given that the MSE loss function L computes the error between the RNN's output x and the RNN's predicted output x, the upper bound on | x − x| can be written as follows:

FNN Generalization Error Bound
In this subsection, we investigate the generalization error bound for the FNN model that predicts the fast states of the two-time-scale system defined in Equation (1).We start by introducing the following Lemma, which upper-bounds the Rademacher complexity in terms of the FNN parameters.
Lemma 4 (c.f.Theorem 2 in [17]).Given a class of scalar-valued functions H F k and a neural network with depth d, where Q 1 1,∞ ≤ B Q for all l = 1, ..., d and Assumptions 8-11 are satisfied, the following inequality holds for the Rademacher complexity: For a comprehensive and detailed proof of Lemma 4, interested readers may refer to [17].By applying the results derived and proven in [13], the following Lemma specifically addresses the generalization error bound for the FNN model that predicts the fast states using the slow states of the two-time-scale system defined in Equation (1).Lemma 5. Consider the general class of two-time-scale continuous-time nonlinear systems described in Equation (1) under the assumption that the slow and fast states are stable, and the perturbation parameter is sufficiently small.An FNN satisfying Assumptions 8-11 is constructed to predict the fast states using the slow states.Given the L r -Lipschitz loss functions associated with the vector-valued FNN hypothesis class H F and a data set S consisting of m i.i.d.data samples, the following inequality holds true, with a probability of at least 1 − δ over S for all t ∈ [t p , T], where t p is defined in Theorem 1: Equation ( 30) indicates that the generalization error bound of the FNN modeling the fast subsystem depends on several factors, such as the number of training samples m, the upper bound on the input vector B X , the complexity hypothesis class in terms of the upper bound of the weight matrices B Q , and the number of layers d, as well as the empirical loss ( 1 m ∑ m i=1 L( zi , z i )).We emphasize that the actual fast state is denoted as z, and the fast state predicted by the FNN model is z.Therefore, the generalization error bound of Equation ( 30) holds for all t ∈ [t p , T] under the assumption that the slow and fast states are stable, and the perturbation parameter is sufficiently small.Now we discuss the implementation of the FNN's generalization error bound specifically on the MSE loss function.A locally Lipschitz continuous loss function such as the MSE is used to optimize the FNN's weights and biases.The MSE loss function satisfies the inequality of Equation (26).Since the FNN model predicts the fast states of Equation (1), the loss function is computed over the actual and predicted fast states, z and z, respectively.Moreover, the expected loss of L is upper-bounded by the following inequality, with a probability of at least 1 − δ: Given that the MSE loss function L computes the error between the RNN's output z and the RNN's predicted output z, the upper bound on | z − z| can be written as follows:

Machine Learning-Based LMPC Using an RNN That Approximates the Slow Subsystem
Having investigated the generalization error bounds for neural networks modeling two-time-scale systems, the rest of this manuscript focuses on the design and performance of LMPC using RNN models with and without time-scale decomposition.Specifically, the objective is to design an RNN to predict only the slow state vector x, following the approach outlined in Section 2.4, and then incorporate the designed RNN model within an LMPC scheme to show that it is sufficient to stabilize the full two-time-scale system in Equation (1).In this section, we present a stability analysis for the proposed LMPC framework.For simplicity, in this section, we present the RNN model that predicts the slow state vector x as a continuous-time nonlinear system of the following form: where x ∈ R n is the RNN's state vector and u ∈ R q 1 is the manipulated input vector.The weight matrices are denoted as A and Θ.The diagonal coefficient matrix A has negative diagonal entries.Θ is defined as Θ = [θ 1 , ..., θ n ] ∈ R (q 1 +n)×n , with entries θ i = b i [w i1 , ..., w i(q 1 +n) ], i = 1, ..., n, given that w ij represents the weight associated with the connection between the ith neuron and the jth output, with i ranging from 1 to n and j ranging from 1 to (q 1 + n).The vector ẑ = [ ẑ1 , ..., ẑn , ẑn+1 , ..., ẑn+q 1 ] = [σ( x1 )...σ( xn ), u...u q 1 ] ∈ R n+q 1 consists of both the network states x and inputs u.Additionally, a nonlinear acti- vation function σ(•) is applied to the network states x when constructing the vector ẑ.It is worth mentioning that the weight matrices and activation functions satisfy Assumptions 5-8.For simplicity, we assume a single-hidden-layer RNN model represented by Equation (33), which does not explicitly include bias terms.Nevertheless, it should be emphasized that the results obtained in this section are not limited to single-hidden-layer RNN models, but can be generalized to include deep RNN models that consist of multiple hidden layers.

Lyapunov-Based Control Using an RNN Model
Taking into account the RNN model designed to predict the values of the slow states, we assume that there exists a stabilizing control law Φ nn (x) ∈ U (e.g., a P-controller, a Lyapunov-based controller) that can render the origin of the RNN model in Equation (33) asymptotically stable in an open neighborhood around the origin denoted as D. Hence, there exists a continuously differentiable Lyapunov function V : R n → R ≥0 that satisfies the following inequalities: where âi are class K functions for all x ∈ R n ⊂ D. The Lyapunov function V can be designed using a quadratic formulation, which satisfies the required criteria, or, alternately, using learning-based approaches, as conducted using a linear parameter-varying (LPV) framework in [18].In the simulation example presented in our work below, a quadratic Lyapunov function is used, and extensive open-loop and closed-loop simulations are conducted to verify that the asymptotic stability assumption is met.The Lyapunov level set that ensures the stability of the RNN model is defined as ≤ ρ}, where ρ > 0. This implies that the closed-loop stability region of the RNN model in Equation ( 33) is denoted as Ω ρ.Moreover, there exist positive constants M nn and L nn , such that the following inequalities are satisfied for all x, x ∈ Ω ρ, and u ∈ U: The feedback controller u = Φ nn (x) ∈ U can stabilize the dynamics of the slow subsystem in Equation ( 11) under a sufficiently small modeling error between the nominal slow subsystem and the RNN model.This result is shown in the following proposition.
Proposition 1.Consider the RNN model in Equation (33) that satisfies the stabilizability conditions of Equation ( 34) and is rendered asymptotically stable around the origin under the control law u = Φ nn (x) ∈ U for all x ∈ Ω ρ.Then, the origin of the slow subsystem in Equation ( 11) is asymptotically stable if there exists a positive real number ν m , where ν m < â3 (|x|)/ â4 (|x|), such that the modeling error between the slow subsystem in Equation ( 11) and the RNN model in Equation (33) (i.e., ν = |F(x, u) − F nn (x, u)|) is upper-bounded by ν m for all x ∈ Ω ρ.
Proof.We follow the strategy outlined in [5] to prove Proposition 1.The main objective is to show that, under the control law u = Φ nn (x) ∈ U, the slow subsystem in Equation ( 11) is asymptotically stable around the origin for all x ∈ Ω ρ if Φ nn (x) renders the RNN designed to approximate the slow subsystem asymptotically stable with a bounded modeling error between the nominal slow subsystem and the RNN model.In other words, we show that ˙V(x) ≤ 0 for the slow subsystem in Equation ( 11) under the controller u = Φ nn (x) ∈ U based on the RNN model.We utilize the inequalities in Equations (34b) and (34c) and compute ˙V(x) as follows: By assigning ν m < â3 (|x|) â4 (|x|) , we obtain ˙V(x) ≤ − ã3 (|x|) ≤ 0, where ã3 (|x|) = − â3 (|x|) + ν m â4 (|x|) > 0 since â3 and â4 are known functions chosen in such a way as to ensure that the above result holds.To show that if the general class K functions are chosen as â3 = a 3 |x| and â4 = a 4 |x|, where a 3 and a 4 are constants, then ν m = a 3 a 4 . Hence, we guarantee the closed-loop stability of the slow subsystem in Equation ( 11) around the origin under the control law Φ nn (x) ∈ U for all x ∈ Ω ρ.
The RNN model designed in Equation ( 33) is incorporated into an LMPC scheme, where the control actions computed by the LMPC are implemented in a sample-andhold fashion.Moreover, this sample-and-hold implementation of the LMPC control law u = Φ nn (x) ∈ U establishes certain properties.These properties are studied in the following two propositions.Specifically, the following proposition shows that the error between the state of the slow subsystem in Equation ( 11) and the state predicted by the RNN model in Equation ( 33) is bounded.Proposition 2. Consider the RNN model in Equation (33) and the slow subsystem in Equation ( 11), starting with the same initial condition x 0 = x0 ∈ Ω ρ.There exists a class K function f w (•) and a positive constant κ, such that the following inequalities are satisfied for all x, x ∈ Ω ρ: Considering that e(t) = x(t) − x(t) represents the error vector between the state of the slow subsystem in Equation ( 11) and the state of the RNN model in Equation ( 33), the bound for the time derivative of e(t) is given as follows: Using Equation (14b), for all x, x ∈ Ω ρ, the first term in Equation ( 38) can be bounded as follows: The term |F( x, u) − F nn ( x, u)| in Equation (39) denotes the modeling error, and it is upperbounded by ν m for all x ∈ Ω ρ.Hence, the term ė(t) can be further bounded by utilizing the bound of the modeling error and the bound of Equation (39), as follows: Taking into account the zero initial condition (i.e., e(0) = 0), we integrate the inequality in Equation ( 40) and obtain the following upper bound for the error vector for all x, x ∈ Ω ρ: Equation (37b) can be derived using the Taylor series expansion of V(x) around x for all x, x ∈ Ω ρ as follows: where κ is a positive real number.Furthermore, Equations (34a) and (34b) are used to further upper-bound V(x) as follows: The following proposition demonstrates that the closed-loop state of the slow subsystem described in Equation ( 11) is maintained within the stability region Ω ρ at all times.
Additionally, utilizing the Lyapunov-based controller u = Φ nn (x) ∈ U through sampleand-hold implementation ensures that the closed-loop state of the slow subsystem can be ultimately bounded within a small region around the origin Ω ρ min .
Proposition 3 (c.fProposition 3 in [5]).Consider the slow subsystem in Equation ( 11) under the controller Φ nn ( x) ∈ U that satisfies the conditions in Equation (34) and is implemented in a sample-and-hold fashion (i.e., Φ nn ( x(t k )), ∀t ∈ [t k , t k+1 ), where t k+1 := t k + ∆) to stabilize the RNN model in Equation (33).Then, there exist w > 0, ∆ > 0 and ρ > ρ min > ρ nn > ρ s that satisfy and such that, for any x(t k ) ∈ Ω ρ\Ω ρ s , there exists a class KL function β x and a class K function γ, such that the following inequality is satisfied: and the state x(t) of the nominal slow subsystem in Equation ( 11) is bounded in Ω ρ for all times and ultimately bounded in Ω ρ min .
Proof.Assuming that x(t k ) = x(t k ), the first part of establishing this proof involves showing that V( x) is decreasing under the control law u(t) = Φ nn (x(t k )) ∈ U for t ∈ [t k , t k+1 ), where x(t k ) and x(t k ) are the state of the slow subsystem in Equation (11) and the state of the RNN model in Equation (33), respectively.The time derivative of V( x) for all t ∈ [t k , t k+1 ) is calculated as follows: By utilizing the inequalities in Equations (34a) and (34b), we further bound ˙V( x(t)) as follows: By applying the Lipschitz inequalities in Equation (35), we proceed with bounding ˙V( x(t)) as follows: The second part of this proof is to demonstrate that the controller u = Φ nn (x) ∈ U, implemented in a sample-and-hold fashion, effectively bounds the states of the slow subsystem in Equation (11) in Ω ρ and can ultimately derive the states to a small region around the origin.This requires showing that V(x) for the slow subsystem in Equation ( 11) is decreasing under the control law The derivative of V(x(t)) with respect to time is computed as follows: Using the inequality in Equation (36), which holds for all x ∈ Ω ρ\Ω ρ s , the first term in Equation ( 51) can be further bounded as follows: where ã3 (•) was previously defined in the proof of Proposition 1.By using the Lipschitz condition stated in Equation ( 14), we derive the following upper bound for ˙V(x(t)): Therefore, if Equation (44b) holds true, the following inequality is satisfied for all x(t k ) ∈ Ω ρ\Ω ρ s and for all t ∈ [t k , t k+1 ): By integrating the above differential equation over the time interval t ∈ [t k , t k+1 ) between any two arbitrary points within the previous time interval, it can be shown that for all x(t k ) ∈ Ω ρ\Ω ρ S , the following results are obtained: Based on Equation (54), it can be observed that ˙V(x(t)) is negative for all x(t k ) ∈ Ω ρ\Ω ρ S .As a result, the state of the slow subsystem in Equation ( 11) remains continuously bounded within the region Ω ρ and can be ultimately driven toward the origin in each sampling period by implementing the control law u = Φ nn (x).In addition, if x(t k ) ∈ Ω ρ s , the state of the RNN model in Equation ( 33) is bounded within the region Ω ρ nn for one sampling period.This outcome has been demonstrated in the first part of the proof.Taking onto account the bounded modeling error between the state of the RNN, as described in Equation ( 33), and the state of the slow subsystem in Equation ( 11), a compact set Ω ρ min ⊃ Ω ρ nn exists and satisfies Equation (45b).The set Ω ρ min is introduced to ensure that the state of the slow subsystem in Equation ( 11) is bounded within the region Ω ρ min during one sampling period, provided that the state of the RNN represented by Equation ( 33) is bounded within the region Ω ρ nn .In the case where the state x(t) enters the set Ω ρ min \Ω ρ S , it has been shown that Equation ( 56) is satisfied.Therefore, under the control law u = Φ nn (x), the state x(t) will be driven toward the origin in the next sampling period, ultimately bounding the state of the slow subsystem within a small region around the origin Ω ρ min .Hence, considering the continuity property of the Lyapunov function V, it follows that there exists a class KL function β x and a class K function γ, such that if x 0 ∈ Ω ρ, then x(t) ∈ Ω ρ for all t ≤ t 0 :

Machine Learning-Based LMPC Formulation
In this subsection, we introduce the machine learning-based LMPC formulation that utilizes the RNN model designed in Equation (33) to predict future states over a certain future horizon.Furthermore, the LMPC is essentially an optimization problem consisting of an objective function and a number of constraints.This optimization problem is solved repeatedly to compute the optimal input trajectory.The following shows the formulation of the machine learning-based LMPC optimization problem: where the predicted slow state trajectory is denoted as x.The set of piecewise constant functions with period ∆ is denoted as S(∆), and the number of sampling periods in the prediction horizon is denoted by N. The optimal control action u * (t) is calculated by repeatedly solving the machine learning-based LMPC optimization problem over the entire prediction horizon t ∈ [t k , t k+N ).Then, the optimal control action u * (t k ) computed at the first sampling period is transmitted by the controller to be applied to the process.Subsequently, the resulting real-time state, x(t k ), is fed back to the machine learning-based LMPC optimization problem to compute the optimal input trajectory for the next sampling time.The optimization problem aims to minimize the time integral of L MPC ( x(t), u(t)) over the prediction horizon, which is represented in the cost function in Equation (58a).
As for the constraints of the optimization problem, the first constraint demonstrated in Equation ( 58b) is essentially the RNN model utilized to approximate the states of the slow subsystem's dynamics.Equation (58c) represents the input constraints that may be applied to the process over the entire prediction horizon.The initial condition needed to solve Equation (58b) is essentially the slow state measurement at t = t k , which is specified in the constraint in Equation (58d).In the case where x(t k ) ∈ Ω ρ\Ω ρ nn , the constraint Equation (58e) ensures that the closed-loop state converges toward the origin.Furthermore, if the state x(t k ) enters the region Ω ρ nn , it is necessary to guarantee that the states predicted by the RNN model remain bounded within the region Ω ρ nn throughout the entire prediction horizon.This is accomplished by employing Equation (58f).

Closed-Loop Stability
Assuming that certain conditions are met, the following theorem establishes the closedloop stability of a singularly perturbed system described by Equation ( 1) when the machine learning-based LMPC of Equation ( 58) is implemented.
Theorem 2. Consider the singularly perturbed system in Equation ( 1) in a closed loop, with the optimal control action u * calculated using the machine learning-based LMPC in Equation ( 58) based on the controller Φ nn (x) that meets the conditions of Equation (34).Additionally, assuming that Propositions 1-3 and Assumptions 1 and 4 hold true, then there exist class KL functions β x and β z , a pair of positive real numbers (δ, d), and * > 0, such that if max{|x(0), |z(0)|} ≤ δ and ∈ (0, * ], then, for all t ≥ 0, Proof.We follow the proof analysis in [5].Moreover, we consider that by substituting the optimal control action u * into Equation ( 1), the closed-loop system can be represented as follows: We set = 0 and obtain the following, where the assumption that the error between z and z is not greater than O( ) enables us to simplify the analysis by approximating z as z.By solving Equation (62b) for z, we obtain a unique, isolated root z = f (x).We substitute the root into Equation (62a) and obtain the following for the reduced-order slow subsystem, By observing the LMPC equation in Equation ( 58), in the case where x(t k ) ∈ Ω ρ\Ω ρ nn , Equation (58e) is activated, such that the Lyapunov function V under the calculated control law u is decreased.According to the finding in Proposition 3 and considering a sufficiently small modeling error, the state of the slow subsystem in Equation (61a) will gradually converge to the origin and enter the region Ω ρ nn within a finite number of sampling time steps.Equation (58f) is activated once the state x(t k ) enters the region Ω ρ nn , where it ensures that the predicted state is bounded within Ω ρ nn throughout the prediction horizon.Additionally, by ensuring that the modeling error and the sampling time are sufficiently small, Proposition 3 establishes that the actual state in Equation (61a) can be maintained within a slightly expanded set Ω ρ min , encompassing Ω ρ nn .Hence, LMPC guarantees that for any initial state x 0 ∈ Ω ρ, the state x(t) of the closed-loop slow subsystem described by Equation (61a) is maintained bounded within the region Ω ρ at all times and fulfills the bound of Equation ( 57) that was established in Proposition 3. Additionally, by introducing a fast time scale τ = t/ , a new coordinate z = z − f2 (x), and considering = 0, the closed-loop fast subsystem is given by: According to Assumption 4, global asymptotic stability is achieved for the origin of the closed-loop fast subsystem in Equation ( 64), satisfying Equation ( 12) for any z(0) ∈ R p .As a result, the closed-loop system in Equation ( 61) meets all the assumptions required for Theorem 1 in [19] to hold true.Hence, there exist class KL functions β x and β z , a pair of positive real numbers (δ, d), and * > 0, such that if max{|x(0), |z(0)|} ≤ δ and ∈ (0, * ], the closed-loop states of the slow and fast systems are bounded by Equations ( 59) and (60).

Remark 3.
It is possible to define the general class of two-time-scale systems with an additional input associated with the fast subsystem as follows: where u 2 ∈ R q 2 is the bounded manipulated input vector associated with the fast subsystem.The input vector u 2 is constrained by u 2 ∈ U 2 := {|u i | ≤ u 2 max,i , i = 1, ..., q 2 }.However, if there exists a stabilizing control law u 2 = h stable (x, z) that stabilizes the dynamics of the fast subsystem, then the general class of two-time-scale systems defined by Equation (65) can be reduced to the class of systems presented in Equation (1), which is the primary focus of this work.Furthermore, there are relevant works that have addressed the stability analysis of classes of systems defined in Equation (65), which share similar concepts but incorporate certain modifications.Interested readers may refer to [3] to further explore this subject.

Example of Application to a Chemical Process
In this section, we apply the proposed reduced-order machine learning strategy to a chemical process example.Specifically, we build two ML models-a reduced-order ML model to approximate the slow subsystem, and another ML model to capture the dynamics of the full two-time-scale system.Subsequently, each ML model is incorporated into an LMPC scheme, and the performance of the controllers is compared in terms of the closed-loop properties and computational efficiencies.
We consider a perfectly mixed, non-isothermal CSTR, in which an irreversible and endothermic reaction that transforms chemical A to product B (A → B) occurs.Figure 3 depicts the CSTR, where C A represents the concentration of chemical A, and T r represents the temperature of the reactor's contents.C A0 denotes the inlet molar concentration of chemical A, which is fed into the reactor at flow rate F and temperature T A0 .Assuming that the vessel maintains a constant holdup, V r denotes the volume of liquid present in the reactor.Due to the occurrence of an endothermic reaction within the reactor, a heating jacket with volume V j is used to provide the energy as required.T j0 is the inlet temperature of the heat-transfer fluid provided to the jacket with a flow rate F j .The reactor's contents and the heat-transfer fluid maintain constant densities, denoted as ρ m and ρ j , respectively.In addition, they both have constant heat capacities, denoted as c p,m and c p,j , respectively.The enthalpy of the reaction is ∆H r , the heat-transfer coefficient is U, and the heat-transfer contact area between the reactor and the jacket is A r .Given that k 0 is the pre-exponential constant, R is the ideal gas constant, and E is the activation energy of the reaction, the timevarying rate constant of the reaction is denoted by k and is given by the following equation: The first-principles equations of the CSTR are described by the following dynamic material and energy balance equations: where the values of the process parameters are listed in Table 1.Additionally, we consider the constant = V j V r to be the singular perturbation parameter.Hence, the system of Equation (67) can be written in the standard singularly perturbed form of Equation ( 1), which implies that the concentration of chemical A (C A ) and the temperature of the reactor (T r ) are the slow states, whereas the temperature of the heating jacket (T j ) can be considered as the fast state.The input of the system in Equation ( 67) is the feed concentration of chemical A (C A0 ).The control objective is to drive the system to its stable steady state (C As , T rs , T js ) = (2.54kmol m −3 , 274.4 K, 303.3 K), corresponding to the steady-state input value of C A0s = 3.75 kmol m −3 , while the input is permitted to vary within the range ∆C A0 = [−3.5,3.5] kmol m −3 .To shift the steady state of the system in Equation (67) to the origin, we express the states and the input in terms of deviation variables and rewrite the system in Equation (67).Specifically, the deviation variables are defined as follows: Therefore, to write the CSTR system of Equation (67) in the standard form of Equation (1), we define x T = [∆C A ∆T r ], z = ∆T j , and the input u = ∆C A0 .The explicit Euler method is utilized to simulate the CSTR system, i.e., Equation ( 67) is numerically integrated with a sufficiently small time step h c = 10 −4 h.The open-source software package IPOPT [20] is employed to solve the ML-based LMPC optimization problem in Equation (58), with a sampling time of ∆ = 0.03 h.
Table 1.Notations and parameter values of the CSTR.

Data Generation and Development of RNN Models
The first step in building the RNN models was to generate a comprehensive data set that captured the dynamics of the system in Equation (67).To implement the explicit Euler algorithm, we first needed to initialize the process model using a set of initial conditions (C A (0), T r (0), T j (0)), where C A (0) ∈ [0, 9] mol/m 3 , T r (0) ∈ [280, 370] K and T j (0) ∈ [300, 390] K.We conducted open-loop simulations by integrating the CSTR system defined by Equation (67) using a sufficiently small time step h c for 10 6 random initial conditions within the specified ranges.We applied random input signals C A0 ∈ [0.5, 7.5] mol/m 3 over a period of one sampling period ∆, which yielded a data set of 10 6 trajectories, each with a length equal to ∆.The data set was then divided into three sets-training, validation, and testing.The training of the models was performed using the open-source library Keras [21].The first RNN model was designed to predict the dynamics of the full two-time-scale system in Equation (67) (i.e, it predicts the dynamics of the slow and fast states).In other words, it utilizes the current state measurements x(t k ) and z(t k ), and the manipulated input for the subsequent sampling period u(t) ∈ [t k , t k+1 ) to predict the slow and fast state measurements of x(t) and z(t) ∀t ∈ [t k , t k+1 ].For simplicity, we denote the model that predicts the dynamics of the full two-time-scale system in Equation (67) as RNN F , where the subscript "F" denotes "full" and not "fast".This model was designed using a single layer of 20 long short-term memory recurrent neural network (LSTM-RNN) units, and the validation and testing errors achieved were 1.434 × 10 −6 and 1.432 × 10 −6 , respectively.In addition, the total number of learning parameters for training this model was 2063.We note that the network was designed in a step-wise manner.Initially, we started with a simple structure, consisting of a single-layer network with one LSTM-RNN unit but the validation error was high.Gradually, we increased the network's complexity by adding more LSTM-RNN units.This progression was continued until we achieved a satisfactory level of validation loss.In this particular case, the optimal outcome was achieved using a single-layer network with 20 LSTM-RNN units.Further increasing the number of LSTM-RNN units beyond 20 resulted in an increase in the validation loss.Therefore, RNN F was constructed using 20 LSTM-RNN units.On the other hand, the second RNN, denoted as RNN S , was designed to predict the dynamics of only the slow states in Equation (67).Specifically, this RNN utilizes the current slow state measurement x(t k ) and the manipulated input for the subsequent sampling period u(t) ∈ [t k , t k+1 ) to predict the slow state measurements of x(t) ∀t ∈ [t k , t k+1 ].This model was designed using a single layer of five LSTM-RNN units corresponding to a total of 192 learning parameters following the same step-wise manner strategy used to design RNN F , with a validation error of 2.67 × 10 −4 and a testing error of 2.68 × 10 −4 .The results of both models are summarized in Table 2.The low testing errors for both models indicate that the modeling error, as defined in Proposition 1, is sufficiently small over a wide range of open-loop trajectories.Due to having fewer learning parameters, the RNN S model is significantly less complex compared to the RNN F model, possibly leading to a lower computational cost and making it a more practical choice for control applications.Therefore, our aim is to investigate whether the LMPC scheme utilizing RNN S can sufficiently meet the desired control objective more efficiently compared to the LMPC scheme using RNN F , which is demonstrated via closed-loop simulations of the CSTR in Equation (67) under both LMPC designs.However, before proceeding, it is essential to define the Lyapunov and objective functions used in both LMPC schemes.For the RNN F -based LMPC, since the controller design was based on the dynamics of the full two-time-scale system in Equation (67), the Lyapunov function is defined as: where the matrix P F is given by , where Q 1F = 20 0 0 0.05 , Q 2F = 6, and Q 3F = 0.1.On the other hand, the RNN Sbased LMPC was designed based on the dynamics of the slow subsystem of the CSTR, specifically Equations (67a) and (67b) expressed in the standard form.Therefore, the Lyapunov function of the slow subsystem used in the RNN S -based LMPC is given by: where the matrix P is defined as The objective function is given by L , where Q 1 = 20 0 0 0.05 and Remark 4. In this particular application, we used long short-term memory recurrent neural network (LSTM-RNN) units to construct the models.LSTM-RNNs are a special type of RNN known for their ability to overcome the common problem of vanishing/exploding gradients in RNNs, resulting in generally better performance.For more information about LSTM-RNNs, interested readers may refer to [22].However, we can always utilize classical RNNs that are well-suited for this particular application by choosing appropriate structures and carefully tuning the hyperparameters to obtain the desired objective, as well as an acceptable level of performance.

Simulation Results
In this subsection, we conduct closed-loop simulations of the CSTR system described by Equation (67) under the two LMPC schemes, one with each RNN model.We first assess the ability of both LMPCs to achieve the control objective adequately compared to LMPCs that utilize the respective first-principles systems as their process models.Subsequently, we compare the RNN-based LMPCs to each other in terms of closed-loop performance and the computational time required for the simulations.
To establish the satisfactory performance of both RNN-based LMPCs, we compared their closed-loop performance to first-principles-based LMPCs, with the latter serving as the baseline for the best possible performance achievable under LMPC.Specifically, the LMPC in Equation (58) was considered in two scenarios, with the process model of Equation (58e) being different between each scenario: • Scenario 1: The LMPC utilizing RNN F as the process model was compared to an LMPC employing the first-principles model in Equation (67), denoted as FP F , as its process model.• Scenario 2: The LMPC utilizing RNN S as the process model was compared to an LMPC employing the first-principles slow subsystem in Equations ( 67a) and (67b), denoted as FP S , as its process model.In this case, for the first-principles-based LMPC, we note that the full CSTR system in Equation (67) was integrated, but only the slow states C A and T r from Equations (67a) and (67b) were used in calculating the LMPC cost function of Equation (58a) and the Lyapunov function V.
For both scenarios, we started the simulations from the initial condition IC main = (∆C A (0), ∆T r (0), ∆T j (0)) = (1,30,40).The LMPC prediction horizon was set to N = 3, and the remaining controller parameters were described in Section 5.1.The state and input trajectories under the LMPCs of scenarios 1 and 2 are shown in Figures 4 and 5, respectively.To quantitatively compare the performance of the different LMPCs, the time integral of the cost function of the LMPC, t f t=0 L(x(τ), u(τ)) dτ, was calculated over the entire simulation duration, t f = 3 h.For scenario 1, the cost function values were 3458 and 3485 for the FP F -based LMPC and the RNN F -based LMPC, respectively, whereas for scenario 2, the values of the cost function were 176 and 179 for the FP S -based LMPC and the RNN S -based LMPC, respectively.For both scenarios, we noticed that the value of the cost function when utilizing the corresponding RNN model closely aligned with that achieved when employing the respective first-principles model, demonstrating the reliable predictive performance of the designed RNN models and their ability to drive the system to the steady state when incorporated into an LMPC scheme.
Next, we demonstrated the computational efficiency of the RNN S -based LMPC due to its lower complexity while still achieving the desired controller performance.For IC main , the computational time for the RNN F -based LMPC shown in Figure 4 was 5578 s , whereas, for the RNN S -based LMPC shown in Figure 5, it was significantly lower at 2170 s, which was 61% lower.This substantial difference in computational time can make RNN S a more practical choice for real-time application of MPC, where computations must be completed within a sampling period to be sent to the actuator.Hence, to rigorously investigate the impact of integrating RNN F and RNN S into an LMPC framework in terms of stability and computational demand, we conducted closed-loop simulations from several different initial conditions in addition to IC main .
The initial conditions chosen for further investigation are outlined in Table 3.Ten different initial conditions (IC i where i = 1, ..., 10) within the operating region were selected, along with IC main .For each initial condition, we simulated the CSTR system described in Equation (67) under the RNN F -based LMPC, as well as the RNN S -based LMPC.The state and input profiles under each LMPC for four representative initial conditions are depicted in Figures 6-9, which indicate that both LMPC schemes successfully drove the process in Equation (67) to its steady state.The closed-loop behavior was similar for the remaining six initial conditions in terms of convergence to the steady state.Additionally, Table 3 provides the computational times required in terms of CPU time for the entire simulation duration of t f = 3 h when applying either LMPC scheme for the 11 different initial conditions.For all the tested initial conditions, the computational time required to execute the RNN S -based LMPC was less than that required for the RNN F -based LMPC.Among all the initial conditions in Table 3, the largest relative difference in computational time was observed for IC 3 , with a percentage difference of 92% between the two LMPC schemes.Hence, the incorporation of RNN S in the LMPC framework has been demonstrated to be more practical and computationally efficient compared to employing RNN F , without compromising stability and closed-loop performance.
The computational time of an MPC optimization problem can be influenced by several factors.As mentioned in [23], MPC computational times are generally affected by the horizon length N, the number of constraints in the MPC optimization problem, and the number of states and control actions of the system.The computational time is also directly related to the complexity of the process model.Hence, given the lower computational times across all 11 initial conditions tested, incorporating RNN S into an LMPC framework is a more practical approach compared to the use of RNN F .In the study of [24], it was observed that the CPU time of the IPOPT optimization problem increased as the complexity of the problem increased.Additionally, several works have discussed computational complexity evaluations of neural networks.For instance, the authors of [25] offered a comprehensive and systematic analysis for quantifying and comparing the computational complexity of neural network layers.The authors proposed and computed three computational metrics per layer for different types of neural networks and presented the exact mathematical expressions and derivations, providing a deep understanding of various networks' complexities.Upon analyzing the results derived in [25], it is evident that a neural network's complexity is proportional to its hyperparameters.For example, factors like the number of neurons, the number of hidden units, the input time sequence size, the number of output neurons, and other parameters associated with the structure, size, and design of the network, all of which affect the complexity and performance of a network, as well as the time required to solve the ML-based MPC optimization problem.As RNN S consisted of only 5 LSTM-RNN units while RNN F consisted of 20 LSTM-RNN units, the complexity of RNN F was much higher than that of RNN S , leading to a correspondingly complex MPC optimization problem.Hence, over the entire simulation duration of t f = 3 h, the total CPU time required to solve all 100 MPC optimization problems over the 100 sampling periods was significantly less for the RNN S -based LMPC than the RNN F -based LMPC.Remark 5.Although the state and input trajectories in Figure 6 have not completely settled, the process is stabilized under both the RNN F -based LMPC and the RNN S -based LMPC.However, the process requires simulation for a longer duration, exceeding 3 h, for the states and input to ultimately converge to their steady-state values and remain close to the origin.The reason for limiting the simulation time is to ensure fair and clear comparisons between the computational times for both controllers from various initial conditions, as listed in Table 3. Remark 6.In this particular application, RNN S was designed to utilize the current slow state measurement and the manipulated input for the next sampling period.As a result, the output of RNN S will be the predicted slow states x for only one sampling period ahead (1∆).Consequently, in this application, since the prediction horizon was chosen to be an integer multiple of the sampling time ∆ (i.e., N = 3 implies that t k+N = t k + 3∆ in Equation (58a)), based on the design of RNN S , RNN S is required to be invoked at least three times to be able to make predictions for the entire prediction horizon.Additionally, an alternative method involves designing an RNN model, denoted as RNN new , with the same goal of predicting the slow states.However, RNN new utilizes not only the current slow state measurements and the manipulated input for the next sampling period but also the current fast state measurements to predict the slow states one sampling period ∆ ahead, i.e., x(t k+1 ).To obtain predictions for the entire prediction horizon (N = 3), an FNN takes as its input the output of RNN new and predicts the fast state z at the next sampling period, i.e., the FNN predicts z(t k+1 ) from x(t k+1 ), following the design described in Section 2.5.Subsequently, for the next sampling period, the outputs of both RNN new and the FNN are given as inputs to RNN new , enabling it to predict the slow states x(t k+2 ) for the next sampling period in the prediction horizon.

Conclusions
In this work, we introduced a general class of nonlinear two-time-scale systems, where the two distinct time scales can be decoupled into slow subsystem and fast subsystem dynamics using singular perturbation analysis.Machine learning was employed to approximate the dynamics of both subsystems.In particular, an RNN was used to predict the slow state vector or, in other words, to approximate the dynamics of the slow subsystem, whereas an FNN was used to approximate the dynamics of the fast subsystem.
To draw inferences about the performance of the neural network models on unseen data, the generalization error bounds for both neural networks when modeling two-time-scale systems were derived.Subsequently, the RNN modeling of the slow states (RNN S ) was incorporated into an LMPC framework and sufficient conditions to guarantee closed-loop stability were derived under the sample-and-hold implementation of the LMPC.Finally, a chemical process example was used to demonstrate the efficiency of the LMPC scheme using an RNN to predict the slow states compared to first-principles-based LMPC schemes and an LMPC scheme using an RNN to model the full two-time-scale system.Through closed-loop simulations, while both RNN-based LMPCs were able to achieve the desired control objective of driving the system to the steady state, due to the significantly lower complexity of the RNN approximating only the slow subsystem, the LMPC based on RNN S required significantly less computational time in all simulations compared to the LMPC using the RNN modeling the full singularly perturbed system, which supported the use of RNN S in real-time MPC applications.

Figure 3 .
Figure 3.The continuous-stirred tank reactor with jacket.

3 )Figure 4 .
Figure 4. States and input trajectories of the CSTR under the Lyapunov-based MPC scheme using the first-principles model of the full process (FP F -based LMPC, blue line) and RNN F (RNN F -based LMPC, red dashed line).

3 )Figure 5 .
Figure 5. States and input trajectories of the CSTR under the Lyapunov-based MPC scheme using the first-principles model of the slow-subsystem (FP S -based LMPC, blue line) and RNN S (RNN S -based LMPC, red dashed line).

3 )Figure 6 .
Figure 6.Considering the initial condition IC 3 = (−3, 30, 5), (a) illustrates the time-varying profiles of the states and the input under the RNN F -based LMPC (solid line), whereas (b) shows the timevarying profiles of the states and the input under the RNN S -based LMPC (dashed line).

3 )Figure 7 .
Figure 7. Considering the initial condition IC 4 = (−2, −10, 100), (a) illustrates the time-varying profiles of the states and the input under the RNN F -based LMPC (solid line), whereas (b) shows the time-varying profiles of the states and the input under the RNN S -based LMPC (dashed line).

3 )Figure 8 .
Figure 8. Considering the initial condition IC 5 = (1, 90, 10), (a) illustrates the time-varying profiles of the states and the input under the RNN F -based LMPC (solid line), whereas (b) shows the timevarying profiles of the states and the input under the RNN S -based LMPC (dashed line).

3 )Figure 9 .
Figure 9. Considering the initial condition IC 10 = (−1, 10, 80), (a) illustrates the time-varying profiles of the states and the input under the RNN F -based LMPC (solid line), whereas (b) shows the time-varying profiles of the states and the input under the RNN S -based LMPC (dashed line).
49)Therefore, if Equation (44a) is satisfied, the following inequality holds for all x(t k ) ∈ Ω ρ\Ω ρ s and t ∈ [t k , t k+1 ):Upon integrating the aforementioned differential equation over the time interval t ∈ [t k , t k+1 ), it is derived that V( x(t k+1 )) ≤ V( x(t k )) − s ∆.Therefore, when Equation (44a) is satisfied, this leads to the fact that ˙V(x(t)) is negative for any x(t k ) ∈ Ω ρ\Ω ρ s .As a result, utilizing the control law u = Φ nn ( x) in a sample-and-hold fashion ensures that the closed-loop state of the RNN model in Equation (33) is bounded within the region Ω ρ and moves toward the origin.Nevertheless, it should be observed that Equation (50) may not hold true in cases where x(t k ) = x(t k ) ∈ Ω ρ s , which indicates that the state x(t k ) may leave the region Ω ρ s within one sampling period.Hence, we introduce the region Ω ρ nn in Equation (45a) to guarantee that the closed-loop state x(t k ) of the RNN model will be bounded in the region Ω ρ nn within one sampling period for all t ∈ [t k , t k+1 ), u ∈ U and x(t k ) ∈ Ω ρ s .Consider the case where x(t k+1 ) leaves the region Ω ρ s .Then, the controller u = Φ nn (x(t k+1 )) reactivates to derive the states into the region Ω ρ s , and Equation (50) will be satisfied again at t = t k+1 .Up to this point, it can be concluded that the state of the RNN system in Equation (33) is ultimately bounded in Ω ρ nn for all x 0 ∈ Ω ρ.
The objective function for the RNN F -based LMPC is given by L

Table 2 .
Specifications of the constructed recurrent neural network models.

Table 3 .
Computational times for the RNN F -based LMPC and the RNN S -based LMPC over the simulation duration of t f = 3 h starting from various initial conditions within the operating region.