Economic Model Predictive Control of Nonlinear Systems Using Online Learning of Neural Networks

Cheng Hu; Scarlett Chen; Zhe Wu

doi:10.3390/pr11020342

¹ Department of Chemical and Biomolecular Engineering, National University of Singapore, Singapore 117585, Singapore

² Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, CA 90095, USA

* Author to whom correspondence should be addressed.

Processes 2023, 11(2), 342; https://doi.org/10.3390/pr11020342

This article belongs to the Special Issue Machine Learning in Model Predictive Control and Optimal Control


Abstract

This work focuses on the development of a Lyapunov-based economic model predictive control (LEMPC) scheme that utilizes recurrent neural networks (RNNs) with an online update to optimize the economic benefits of switched non-linear systems subject to a prescribed switching schedule. We first develop an initial offline-learning RNN using historical operational data, and then update the RNNs with real-time data to improve model prediction accuracy. The generalized error bounds for RNNs updated online with independent and identically distributed (i.i.d.) and non-i.i.d. data samples are derived, respectively. Subsequently, by incorporating online updating RNNs within LEMPC, probabilistic closed-loop stability and economic optimality are achieved simultaneously for switched non-linear systems, accounting for the RNN generalized error bound. A chemical process example with scheduled mode transitions is used to demonstrate that the closed-loop economic performance under LEMPC can be improved using an online update of RNNs.

Keywords:

economic model predictive control; recurrent neural networks; online machine learning; generalized error; switched non-linear systems

1. Introduction

Economic model predictive control (EMPC), which addresses economic considerations within process control, has attracted considerable attention in the control community over recent decades. Model predictive control (MPC) is applied in a wide variety of applications due to its ability to handle hard constraints on system states and manipulated inputs. The key idea of MPC is to compute an optimal input sequence using state feedback at the current sampling instant and to apply only the first input of that sequence to the system. Typically, a quadratic cost function is used in tracking MPC schemes to penalize the deviation of predicted system states and manipulated inputs from their steady-state values over a finite prediction horizon, such that the system is driven to its desired steady-state by minimizing the quadratic cost function. Unlike the steady-state operation of conventional tracking MPC schemes, EMPC generally uses a non-quadratic objective function to operate in a dynamic fashion (off steady-state) by optimizing process economics. Many research works have addressed closed-loop stability, economic optimization considerations, and model uncertainty for non-linear systems under EMPC (e.g., [1,2,3,4,5,6]).

Real-life dynamical processes often involve mode transitions that arise for various reasons (e.g., actuator/sensor faults, feedback changes, and changes in environmental factors), which gives rise to the important research subject of switched systems. A switched system is a system with multiple switching modes whose active mode is determined by a switching signal. Switched systems have wide applications in engineering practice (e.g., mobile robots [7], electrical circuits [8], and flight control [9]). Control and optimization of switched systems have been extensively explored using methods based on Lyapunov stability theory [10,11], dwell-time [12,13], and linear matrix inequalities [14]. In [15], a Lyapunov-based MPC framework was presented to stabilize switched non-linear systems that execute mode transitions at the prescribed switching times. Following this direction, in [16], a Lyapunov-based EMPC method was proposed to address the stabilization and economic optimality of switched non-linear systems subject to a prescribed switching schedule.

An accurate process model is a key requirement for achieving the desired control performance under EMPC. To this end, the above EMPC schemes often assume that the process model with the desired prediction accuracy can be obtained using first-principles modeling approaches. However, capturing the non-linear dynamics of complex and large-scale systems based on first-principles modeling approaches can be cumbersome and inaccurate when the physico-chemical phenomena of the system are not well-understood. Machine learning (ML) algorithms have shown great success in a variety of application domains in recent years, e.g., the resolving-power domination number of probabilistic neural networks was investigated in [17], and Gaussian process models were used to capture the dynamics of non-linear processes with unknown dynamics in [18] and with time-varying dynamics in [19]. As a powerful black box modeling tool among various ML algorithms, RNNs have achieved astonishing results in ML-based control for non-linear systems since they can approximate non-linear dynamics based on time-series data [20,21,22]. In [23], an RNN model was constructed offline to predict future states for EMPC that optimizes economic benefits for non-linear systems while maintaining closed-loop stability. However, since ML models are generally trained offline to model non-linear systems under normal operation (i.e., without model uncertainty) using historical operational data, the resulting offline-trained ML models may not approximate well the real-time non-linear dynamics subject to model uncertainty. Therefore, the presence of model uncertainty could result in degradation of the control performance of real-world non-linear processes under ML-based EMPC with offline-trained ML models. In [24], online learning with event-triggered and error-triggered mechanisms was applied to update ML models based on real-time data to learn model uncertainty, thus improving the control performance of non-linear systems subject to model uncertainty under ML-based EMPC.

Many works have been developed to integrate online learning models with control design for non-linear processes (e.g., [25,26,27,28]). Although online learning models have shown their effectiveness in improving the prediction accuracy and control performance of non-linear processes, characterizing their generalization performance on the unseen testing set remains a critical challenge for real-time implementation of online ML-based controllers in practice. The generalized error bound is widely used to evaluate how well an ML model developed using the training set generalizes to the unseen testing set. The generalized error bound for online ML models has been developed in [29,30,31] by assuming that the online learner receives a data sequence generated in an i.i.d. manner. Certain efforts have been made in [32,33] to remove the i.i.d. assumption on training data points, for which the generalized error bound for online ML models updated with a set of non-i.i.d. data points has been derived. In our previous work, the generalized error bounds for RNNs updated online using i.i.d. and non-i.i.d. data points were established in [34,35], respectively, and the error bounds were utilized to derive closed-loop stability properties for switched non-linear systems without and with process disturbances under online updating RNN-based MPC. However, at this stage, it remains unclear how online learning RNNs can be integrated with EMPC to optimize economic benefits for switched non-linear systems while maintaining closed-loop stability.

To fill this research gap, this work aims to incorporate online learning RNNs into LEMPC to address closed-loop stability and economic optimality for switched non-linear systems operating under scheduled mode transitions. Specifically, the notation, class of switched non-linear systems, and the developments of RNNs are presented in Section 2. The generalized error bounds for RNNs updated online using i.i.d. and non-i.i.d. training data points are derived in Section 3. In Section 4, an LEMPC scheme that integrates online updating RNNs is proposed for switched non-linear systems involving process disturbances, under which probabilistic closed-loop stability is proved based on the RNN generalized error bound. In Section 5, a non-linear chemical process example with scheduled mode transitions is presented to demonstrate the efficacy of the proposed LEMPC scheme.

2. Preliminaries

2.1. Notation

The operators

{∥\cdot∥}_{F}

and

| \cdot |

are used to represent the Frobenius norm of a matrix and the Euclidean norm of a vector, respectively. The function

f (\cdot)

belongs to class

C^{1}

if

f (\cdot)

is continuously differentiable. We use the operator

“ \ ”

to represent set subtraction, i.e.,

M \setminus V : = \{x \in R^{n} ∣ x \in M, x \notin V\}

. The continuous function

f : [0, a) \to [0, \infty)

belongs to class

K

if it satisfies

f (0) = 0

and increases strictly in its domain.

E [X]

and

P (A)

are used to denote the expected value of a random variable X and the probability of an event A occurring, respectively. Let

a_{m}

and

b_{m}

be two sequences; we write

a_{m} = O (b_{m})

provided that

\limsup_{m \to \infty} |a_{m} / b_{m}| < \infty

.

2.2. Class of Switched Non-Linear Systems

In this work, a class of switched non-linear systems described by the following first-order ordinary differential equations (ODEs) is considered.

\begin{matrix} \dot{x} & = F_{σ (t)} (x, u_{σ (t)}, w_{σ (t)}) \\ & : = f_{σ (t)} (x) + g_{σ (t)} (x) u_{σ (t)} + h_{σ (t)} (x) w_{σ (t)} \end{matrix}

(1)

where

x \in R^{n}

,

u_{σ (t)} \in R^{n_{u}}

, and

w_{σ (t)} \in R^{n_{w}}

denote the vectors of system states, control inputs, and disturbances, respectively. The control input constraint is given by

u_{σ (t)} \in U_{σ (t)}

, where the set

U_{σ (t)}

is defined by the vector of minimum values

u_{σ (t)}^{m i n}

and the vector of maximum values

u_{σ (t)}^{m a x}

of the inputs (i.e.,

U_{σ (t)} : = \{u_{σ (t)}^{m i n} \leq u_{σ (t)} \leq u_{σ (t)}^{m a x}\}

). The disturbance vector is subject to the constraint

w_{σ (t)} \in W_{σ (t)} : = \{| w_{σ (t)} | \leq w_{m_{σ (t)}}, w_{m_{σ (t)}} \geq 0\}

. The switching function

σ (t) : [0, \infty) \to ψ

takes a value in

ψ : = {1, \dots, p}

. The number of switching modes is denoted by p. Throughout this manuscript, the notations

t_{k}^{o u t}

and

t_{k}^{i n}

are used to represent the time at which the k-th mode (i.e.,

k \in ψ

) of Equation (1) is switched out and in, respectively. Therefore, the state-space model of Equation (1) is denoted by

\dot{x} = F_{k} (x, u_{k}, w_{k})

with

σ (t) = k

when the system operates under mode k for

t \in [t_{k}^{i n}, t_{k}^{o u t})

. For all

k \in ψ

,

f_{k} (\cdot)

,

g_{k} (\cdot)

, and

h_{k} (\cdot)

are assumed to be sufficiently smooth functions of dimensions

n \times 1

,

n \times n_{u}

, and

n \times n_{w}

, respectively. Additionally, for all

k \in ψ

, we assume that

f_{k} (0) = 0

and the initial time

t_{0}

is zero, indicating that the origin is a steady-state of Equation (1) without disturbances (i.e., the nominal system). All states are assumed to be measurable at each sampling instant

t_{q} = t_{k}^{i n} + q Δ

, where

Δ

is the sampling period,

q = 0, 1, \dots, N_{k}

, and

N_{k}

is assumed to be a positive integer denoting the total number of sampling periods within

t \in [t_{k}^{i n}, t_{k}^{o u t})

.
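The sampled-data setting above can be illustrated with a minimal sketch. It integrates the nominal system (no disturbances) with the explicit Euler method, holds each input constant over one sampling period, and steps through a prescribed mode schedule. The two scalar modes, the schedule, and the names `simulate_mode`, `modes`, and `schedule` are hypothetical and chosen only for illustration; they are not the process studied in this work.

```python
from typing import Callable, Dict, List, Tuple

def simulate_mode(
    f: Callable[[float], float],
    g: Callable[[float], float],
    x0: float,
    inputs: List[float],
    delta: float,
    h_c: float,
) -> List[float]:
    """Integrate dx/dt = f(x) + g(x) u with explicit Euler (nominal case, w = 0).

    Each entry of `inputs` is held constant over one sampling period `delta`
    (sample-and-hold); `h_c` is the integration step.
    Returns the state at every sampling instant t_q, including the initial one.
    """
    steps = round(delta / h_c)
    x = x0
    states = [x]
    for u in inputs:
        for _ in range(steps):
            x = x + h_c * (f(x) + g(x) * u)
        states.append(x)
    return states

# Hypothetical scalar modes with f_k(0) = 0, so the origin is a steady-state.
modes: Dict[int, Tuple[Callable[[float], float], Callable[[float], float]]] = {
    1: (lambda x: -x, lambda x: 1.0),
    2: (lambda x: -2.0 * x, lambda x: 0.5),
}

# Prescribed switching schedule: (mode k, number of sampling periods N_k).
schedule = [(1, 5), (2, 5)]

x = 1.0
trajectory = [x]
for k, n_k in schedule:
    f_k, g_k = modes[k]
    segment = simulate_mode(f_k, g_k, x, [0.0] * n_k, delta=0.1, h_c=0.01)
    trajectory.extend(segment[1:])  # drop the duplicated initial state
    x = segment[-1]                 # state at t_k^out is the initial state of the next mode
```

With zero input, both hypothetical modes are stable, so the sampled state decays toward the origin across the mode switch.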

For each mode

k \in ψ

, a stabilizing controller

u_{k} = Φ_{k} (x) \in U_{k}

(e.g., the universal Sontag control law [36]) is assumed to exist in the sense that the origin of Equation (1) without disturbances is rendered exponentially stable. Following the construction method in [37], a level set of the Lyapunov function

V_{k} (x)

(denoted by

Ω_{ρ_{k}} : = \{x \in R^{n} ∣ V_{k} (x) \leq ρ_{k}\}

, where

0 < ρ_{k}

for

k \in ψ

) is used to represent the stability region of Equation (1) operating under mode k. Additionally, given the boundedness of

x, u_{k}, w_{k}

, the smoothness assumed for

f_{k} (\cdot)

,

g_{k} (\cdot)

, and

h_{k} (\cdot)

, and the continuous differentiability of

V_{k} (x)

, positive constants

L_{w_{k}}^{'}, L_{w_{k}}, L_{x_{k}}^{'}, L_{x_{k}}, M_{k}

are assumed to exist, such that the following inequalities hold for all

x^{'}, x \in Ω_{ρ_{k}}, w_{k} \in W_{k}

,

u_{k} \in U_{k}

,

k \in ψ

:

\begin{matrix} |F_{k} (x^{'}, u_{k}, 0) - F_{k} (x, u_{k}, w_{k})| \leq L_{w_{k}} | w_{k} | + L_{x_{k}} |x^{'} - x| \end{matrix}

(2a)

\begin{matrix} |\frac{\partial V_{k} (x^{'})}{\partial x} F_{k} (x^{'}, u_{k}, 0) - \frac{\partial V_{k} (x)}{\partial x} F_{k} (x, u_{k}, w_{k})| \leq L_{w_{k}}^{'} | w_{k} | + L_{x_{k}}^{'} |x^{'} - x| \end{matrix}

(2b)

\begin{matrix} | F_{k} (x, u_{k}, w_{k}) | \leq M_{k} \end{matrix}

(2c)

2.3. Recurrent Neural Networks (RNN)

As opposed to the architecture of a traditional feedforward neural network in which signals are transmitted in only one direction, information in an RNN travels in both directions (i.e., forward and backward) due to the inclusion of recurrent loops as shown in Figure 1. This enables the feedback of signals associated with previous inputs back into the network, and fosters a temporal dynamic behavior that corresponds to the numerical techniques (e.g., the explicit Euler method) for solving an ODE. Therefore, the architecture of the RNN is especially suitable for modeling non-linear dynamic systems governed by ODEs.

Figure 1. A schematic of a recurrent neural network and its unfolded structure.

In this work, we use a one-hidden-layer RNN described by the following form as a surrogate model for Equation (1):

\begin{matrix} h_{t, ℓ} & = σ_{h} (Q h_{t, ℓ - 1} + W x_{t, ℓ}) \\ y_{t, ℓ} & = σ_{y} (V h_{t, ℓ}) \end{matrix}

(3)

where

y_{t, ℓ} \in R^{d_{y}}

,

x_{t, ℓ} \in R^{d_{x}}

, and

h_{t, ℓ} \in R^{d_{h}}

,

t = 1, \dots, T

(T is the number of data sequences) and

ℓ = 1, \dots, L_{n n}

(

L_{n n}

is the time length), are the RNN outputs, the RNN inputs, and the hidden states, respectively. The weight matrices

V \in R^{d_{y} \times d_{h}}

,

Q \in R^{d_{h} \times d_{h}}

, and

W \in R^{d_{h} \times d_{x}}

are associated with the output layer, the hidden layer, and the input layer, respectively. The non-linear activation functions

σ_{y}

and

σ_{h}

are associated with the output and the hidden layers, respectively. In this work, we follow the method in [37] to generate historical data for developing the initial offline-learning RNN. Specifically, numerous open-loop simulations are carried out for Equation (1) without disturbances operating under mode k with various conditions

x \in Ω_{ρ_{k}}

and

u_{k} \in U_{k}

, where

u_{k}

is applied to the system of Equation (1) in a sample-and-hold fashion at each sampling step (i.e.,

u_{k} (t) = u_{k} (t_{q})

holds for

t \in

[t_{q}, t_{q + 1})

,

t_{q + 1} : = t_{q} + Δ

, where

Δ

is the sampling period). Subsequently, the initial RNN model is developed with open-loop simulation data to predict one sampling period forward using

x = [x (t_{q}), u_{k} (t_{q})]

as RNN inputs. The RNN outputs

y

are the predicted states within one sampling period (i.e.,

t \in [t_{q}, t_{q + 1})

) that contain all internal time steps of

L_{n n} = Δ / {\bar{h}}_{c}

, where

{\bar{h}}_{c}

denotes the integration time step with a sufficiently small value used for solving the system of Equation (1) with numerical methods (e.g., the explicit Euler method). Without loss of generality, RNNs are developed under the following assumptions [38]:

Assumption 1.

An upper bound exists for the RNN inputs in the sense that for all

t = 1, \dots, T

,

|x_{t, ℓ}| \leq B_{X}

, where

ℓ = 1, \dots, L_{n n}

.

Assumption 2.

There are upper bounds for the weight matrices in the sense that

{∥ V ∥}_{F} \leq B_{V, F}

,

{∥ Q ∥}_{F} \leq B_{Q, F}

, and

{∥ W ∥}_{F} \leq B_{W, F}

.

Assumption 3.

σ_{y}

is a positive-homogeneous and 1-Lipschitz continuous activation function in the sense that for all

β \geq 0

,

σ_{y} (β z) = β σ_{y} (z)

, where

z \in R

.
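The recurrence of Equation (3) and the quantities bounded in Assumptions 2 and 3 can be sketched in a few lines of plain Python. This is an illustrative sketch only: the choice of tanh for the hidden activation, ReLU for the output activation (which is positive-homogeneous and 1-Lipschitz), the zero initial hidden state, and the small weight matrices are all assumptions made here, not values taken from this work.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]

def matvec(A: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def frobenius(A: Matrix) -> float:
    """Frobenius norm, the quantity bounded in Assumption 2."""
    return math.sqrt(sum(a * a for row in A for a in row))

def relu(v: Vector) -> Vector:
    """ReLU: relu(beta * z) = beta * relu(z) for beta >= 0, and 1-Lipschitz."""
    return [max(0.0, a) for a in v]

def rnn_forward(Q: Matrix, W: Matrix, V: Matrix, xs: List[Vector]) -> List[Vector]:
    """Unroll Equation (3): h_l = tanh(Q h_{l-1} + W x_l), y_l = relu(V h_l)."""
    h = [0.0] * len(Q)  # assumed zero initial hidden state
    ys = []
    for x in xs:
        pre = [a + b for a, b in zip(matvec(Q, h), matvec(W, x))]
        h = [math.tanh(p) for p in pre]
        ys.append(relu(matvec(V, h)))
    return ys

# Hypothetical dimensions: d_x = 2, d_h = 2, d_y = 1, and L_nn = 3 internal steps.
Q = [[0.5, 0.0], [0.0, 0.5]]
W = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 1.0]]
xs = [[0.1, 0.2]] * 3
ys = rnn_forward(Q, W, V, xs)
```

The same input vector is repeated at every internal step here for simplicity; in practice the input sequence would be built from the measured state and the held control input.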

Let

h (\cdot) \in H

be the RNN model mapping the RNN input to the RNN output (

x \to y

), where

H

denotes the hypothesis class. In this work, we use the mean squared error (MSE) as the loss function

L (h (x), \bar{y})

for the development of RNNs, where

\bar{y} \in R^{d_{y}}

represents the true or labeled output vector. Since the training dataset for RNNs is generated for bounded states

x \in Ω_{ρ_{k}}

and inputs

u_{k} \in U_{k}

, there is an upper bound (denoted by

r_{ℓ}

) for the RNN output

y_{ℓ}

and the true output

{\bar{y}}_{ℓ}

, i.e.,

| {\bar{y}}_{ℓ} |, | y_{ℓ} | \leq r_{ℓ}

for

ℓ = 1, \dots, L_{n n}

, where

r_{ℓ} > 0

. Therefore, the MSE loss function meets the local Lipschitz property in the sense that for all

| y_{ℓ} |, | {\bar{y}}_{ℓ} | \leq r_{ℓ}

, the inequality

|L (y_{ℓ}^{'}, {\bar{y}}_{ℓ}) - L (y_{ℓ}, {\bar{y}}_{ℓ})| \leq L_{r} |y_{ℓ}^{'} - y_{ℓ}|

is satisfied, where

L_{r}

represents the local Lipschitz constant.
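As a worked illustration of this property, assume the loss is the unnormalized squared Euclidean error, L (y, \bar{y}) = |y - \bar{y}|^{2} (the normalization is an assumption made here; a factor such as 1 / d_{y} would simply scale the constant), and that y^{'}, y, and \bar{y} all have norm at most r_{ℓ}. Then one admissible local Lipschitz constant follows directly:

```latex
\begin{aligned}
|L (y^{'}, \bar{y}) - L (y, \bar{y})|
  &= \big| |y^{'} - \bar{y}|^{2} - |y - \bar{y}|^{2} \big| \\
  &= \big| (y^{'} - y)^{\top} (y^{'} + y - 2 \bar{y}) \big| \\
  &\leq |y^{'} - y| \, \big( |y^{'}| + |y| + 2 |\bar{y}| \big) \\
  &\leq 4 r_{ℓ} \, |y^{'} - y|
\end{aligned}
```

Under these assumptions, L_{r} = 4 r_{ℓ} is a valid (not necessarily tight) choice.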

3. Online Learning of RNNs

Offline-trained RNNs are constructed based on historical data gathered from Equation (1) without disturbances, and may not be capable of capturing the dynamics of Equation (1) in real-time operation involving disturbances. Therefore, online learning is applied to update ML models to approximate the non-linear dynamics of Equation (1) with disturbances using real-time data. To integrate online ML models with EMPC for switched non-linear systems, ML models need to be developed with the desired predictive capacity on unseen testing data, which is commonly measured by generalized error bounds. In this section, the generalized error bounds for RNN models updated online using i.i.d. and non-i.i.d. training data points are developed, respectively.

3.1. Generalized Error of RNNs Updated Online with i.i.d. Training Data

We first consider a special case of switched non-linear systems, where the system dynamics of Equation (1) do not vary over time, and, thus, Equation (1) can be simplified to the following state-space model:

\dot{x} = F (x, u) : = f (x) + g (x) u

(4)

It is assumed that there exist multiple steady-states

x_{s_{k}}

for the non-linear system of Equation (4) under the stabilizing controller

u = u_{s}

, where

k \in ψ = {1, 2, \dots, p}

and

p > 1

is a positive integer that represents the number of steady-states. When Equation (4) operates within the stability region around the steady-state

x_{s_{k}}

,

k \in ψ

,

\forall t \in [t_{k}^{i n}, t_{k}^{o u t})

, we say that the system operates in mode k. In this section, we assume that historical data are only available for a portion of the stability region around

x_{s_{k}}

. The initial RNN developed with the limited historical data around

x_{s_{k}}

may not be capable of approximating the system dynamics of Equation (4) when the system operates in the stability region around another steady-state (e.g.,

x_{s_{f}}

,

f \in ψ

). Therefore, it is essential to update the RNN models with improved prediction accuracy through online learning. In this case, online learning RNNs are developed using real-time process data that are drawn i.i.d. from the system of Equation (4).

We consider a sequence of data samples

(X_{1}, Y_{1})

, …,

(X_{T}, Y_{T})

drawn i.i.d. from the same distribution

D

(e.g., the system of Equation (1)), where the online update of ML models takes place sequentially by processing these i.i.d. samples. Specifically, given an initial hypothesis

h_{1} \in H

, the online learner

A

receives an instance

X_{t}

and makes a prediction

h_{t} (X_{t})

on the t-th round, where

t = 1, \dots, T

. Subsequently, the learner

A

receives the true output

Y_{t}

, incurs the loss

L (h_{t} (X_{t}), Y_{t})

(e.g., the MSE loss), and then updates the ML model from

h_{t}

to

h_{t + 1}

after processing

(X_{t}, Y_{t})

. Therefore, the learner

A

yields

h_{1}, h_{2}, \dots, h_{T + 1}

(i.e., a sequence of hypotheses) after T rounds. To simplify the notation, the loss

L (h_{t} (X_{t}), Y_{t})

is denoted by

L (h_{t}, Z_{t})

for any data sample

Z_{t} = (X_{t}, Y_{t})

, and we use the shorthand

Z_{n}^{m}

to represent

Z_{n}, Z_{n + 1}, \dots, Z_{m}

(i.e., a sequence of data samples). In general, the goal of the online learner

A

is to minimize the regret after the end of T rounds, which is defined as follows [32]:

\begin{matrix} {Reg}_{A} (T) = \sum_{t = 1}^{T} L (h_{t}, Z_{t}) - \sum_{t = 1}^{T} L (h^{⋆}, Z_{t}) \end{matrix}

(5)

where the first term denotes the cumulative loss of hypotheses

h_{1}, \dots, h_{T}

, and the second term represents the minimum cumulative loss that is achieved by the best model

h^{⋆}

in the hypothesis class

H

, where

h^{⋆}

is defined by:

h^{⋆} = \underset{h \in H}{\arg\min} \sum_{t = 1}^{T} L (h, Z_{t})

. Note that we can only obtain

h^{⋆}

in hindsight after the learner receives all the samples

Z_{1}^{T}

. Given a hypothesis h, its generalized error is defined as the expected loss at a new data point

(X, Y)

:

R (h) = E [L (h (X), Y)]

.
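The online protocol and the regret of Equation (5) can be sketched with a deliberately simple learner. The sketch below uses a scalar linear hypothesis h_t(x) = θ_t x updated by online gradient descent on the squared loss, which stands in for the RNN update and is an assumption made purely for illustration; the synthetic data, step size, and function names are likewise hypothetical.

```python
import random
from typing import List, Tuple

Sample = Tuple[float, float]

def squared_loss(pred: float, target: float) -> float:
    return (pred - target) ** 2

def online_gradient_descent(
    samples: List[Sample], theta1: float = 0.0, eta: float = 0.1
) -> Tuple[List[float], List[float]]:
    """Run T rounds; return hypotheses theta_1..theta_{T+1} and losses L(h_t, Z_t)."""
    thetas = [theta1]
    losses = []
    for x, y in samples:
        theta = thetas[-1]
        pred = theta * x                      # predict h_t(X_t)
        losses.append(squared_loss(pred, y))  # incur the loss once Y_t is revealed
        grad = 2.0 * (pred - y) * x
        thetas.append(theta - eta * grad)     # update h_t -> h_{t+1}
    return thetas, losses

def regret(samples: List[Sample], losses: List[float]) -> float:
    """Equation (5): cumulative online loss minus the loss of the best fixed model."""
    sxx = sum(x * x for x, _ in samples)
    sxy = sum(x * y for x, y in samples)
    theta_star = sxy / sxx  # least-squares minimizer, available only in hindsight
    best = sum(squared_loss(theta_star * x, y) for x, y in samples)
    return sum(losses) - best

rng = random.Random(0)
samples = []
for _ in range(200):
    x_t = rng.uniform(-1.0, 1.0)
    samples.append((x_t, 2.0 * x_t + rng.gauss(0.0, 0.05)))  # synthetic i.i.d. data

thetas, losses = online_gradient_descent(samples)
reg = regret(samples, losses)
average_regret = reg / len(samples)
```

In this toy setting most of the regret is accumulated in the first rounds, while the initial hypothesis is still far from the data-generating slope, so the average regret shrinks as T grows.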

The following lemma gives the generalized error bound for an ensemble hypothesis of ML models updated online using an i.i.d. training set.

Lemma 1

([34]). Given the training set

Z_{1}^{T}

drawn i.i.d. from the same distribution

D

, the learner

A

produces the hypotheses

h_{1}, \dots, h_{T}

by processing the samples

Z_{1}^{T}

sequentially with the loss function

L (\cdot, \cdot)

that is convex in its first argument and satisfies

0 \leq L (\cdot, \cdot) \leq M

. Let

Ω_{T} : = \{λ \in R^{T} ∣ λ_{t} \geq 0, \forall t = 1, \dots, T, \text{ and } \sum_{t = 1}^{T} λ_{t} = 1\}

be the unit simplex,

λ = (λ_{1}, \dots, λ_{T}) \in Ω_{T}

be a weight vector, and

h = \sum_{t = 1}^{T} λ_{t} h_{t}

be the ensemble hypothesis. Then, with a probability no less than

1 - δ

, the following inequalities are satisfied for any

δ > 0

:

\begin{matrix} R (h) \leq M | λ | \sqrt{2 \log \frac{1}{δ}} + \sum_{t = 1}^{T} λ_{t} L (h_{t}, Z_{t}) \end{matrix}

(6)

\begin{matrix} R (h) \leq M | λ | \sqrt{2 \log \frac{1}{δ}} + \sum_{t = 1}^{T} M | λ_{t} - \frac{1}{T} | + \sum_{t = 1}^{T} λ_{t} L (h^{⋆}, Z_{t}) + \frac{{Reg}_{A} (T)}{T} \end{matrix}

(7)

The generalized error bound for

h = \sum_{t = 1}^{T} λ_{t} h_{t}

given by the right-hand side (RHS) of Equation (6) depends on the cumulative loss incurred by the online algorithm after T rounds (the second term) and an error function (the first term) with respect to some parameters

λ

,

δ

, and M that represent the weight vector, the confidence level, and the upper bound for

L (\cdot, \cdot)

, respectively. Additionally, Equation (7) is derived using online-to-batch conversion to establish an important connection between the generalized error in the batch setting and the regret of an online learning algorithm. In detail, the generalized error

R (h)

is bounded by two error functions based on T,

λ

, M, and

δ

, the cumulative loss of

h^{⋆}

, and the average regret in T rounds. The average regret converges to zero if the online algorithm

A

achieves a sub-linear regret bound (e.g.,

{Reg}_{A} (T) = O (\sqrt{T})

), and the two error functions are known once the parameters of

δ

and

λ

are chosen. Finally, it is noted that the weight vector

λ

is a dominant factor that affects the calculation of the generalized error bound in Lemma 1. Therefore, to achieve a low generalized error, the weight vector

λ

for the hypotheses

h_{1}, \dots, h_{T}

can be optimized by solving the optimization problem as follows [34]:

\min_{λ \in Ω_{T}} \sum_{t = 1}^{T} λ_{t} L (h_{t}, Z_{t}) \quad \text{s.t.} \quad \sum_{t = 1}^{T} | λ_{t} - \frac{1}{T} | \leq α

(8)

where the objective function is the cumulative loss of hypotheses

h_{1}, \dots, h_{T}

,

\sum_{t = 1}^{T} | λ_{t} - \frac{1}{T} | \leq α

is an inequality constraint used for constraining the difference between the weight

λ_{t}

and

1 / T

, and

α \geq 0

denotes a hyperparameter predetermined by a validation procedure.
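To make the role of the weights concrete, the sketch below evaluates the right-hand side of Equation (6) and builds a feasible weight vector for Equation (8). The weight selection is a greedy mass-shifting rule written for this illustration: it moves up to α/2 of probability mass from the highest-loss hypotheses to the lowest-loss one. Because the objective is linear, this rule is expected to agree with a linear-programming solution, but it is not claimed to be the procedure used in [34]. The loss values, M, α, and δ are hypothetical.

```python
import math
from typing import List

def bound_eq6(lam: List[float], losses: List[float], M: float, delta: float) -> float:
    """RHS of Equation (6): M |lam| sqrt(2 log(1/delta)) + sum_t lam_t L(h_t, Z_t)."""
    norm = math.sqrt(sum(w * w for w in lam))
    confidence_term = M * norm * math.sqrt(2.0 * math.log(1.0 / delta))
    weighted_loss = sum(w * c for w, c in zip(lam, losses))
    return confidence_term + weighted_loss

def greedy_weights(losses: List[float], alpha: float) -> List[float]:
    """Feasible point of Equation (8) obtained by shifting at most alpha/2 of mass."""
    T = len(losses)
    lam = [1.0 / T] * T
    best = min(range(T), key=lambda t: losses[t])
    budget = alpha / 2.0  # moving mass m changes the L1 distance to uniform by 2m
    for t in sorted(range(T), key=lambda t: losses[t], reverse=True):
        if t == best or budget <= 0.0:
            continue
        moved = min(lam[t], budget)  # weights must stay non-negative
        lam[t] -= moved
        lam[best] += moved
        budget -= moved
    return lam

losses = [0.9, 0.5, 0.2, 0.1]  # hypothetical per-round losses L(h_t, Z_t)
alpha = 0.4
uniform = [1.0 / len(losses)] * len(losses)
lam = greedy_weights(losses, alpha)

weighted_uniform = sum(w * c for w, c in zip(uniform, losses))
weighted_greedy = sum(w * c for w, c in zip(lam, losses))
bound_uniform = bound_eq6(uniform, losses, M=1.0, delta=0.05)
bound_greedy = bound_eq6(lam, losses, M=1.0, delta=0.05)
```

Note the trade-off visible in Equation (6): concentrating weight lowers the weighted cumulative loss but increases |λ|, and hence the first term, which is one reason the deviation from uniform weights is limited by α.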

3.2. Generalized Error of RNNs Updated Online with Non-i.i.d. Training Set

We next consider the switched non-linear system of Equation (1) subject to bounded disturbances, where the system switches between different modes with time-varying dynamics. As a result, process data collected in real-time operation of the system of Equation (1) are non-i.i.d. samples. The notation and the training procedure of RNNs updated with non-i.i.d. training samples follow those in the i.i.d. case. The only difference is that

(X_{T + 1}, Y_{T + 1})

(i.e., the new data point) in the non-i.i.d. setting is conditioned on the past samples

Z_{1}^{T}

, and thus, the generalized error of the hypothesis

h \in H

is defined as follows [33]:

R_{T + 1} (h, Z_{1}^{T}) : = E [L (h (X_{T + 1}), Y_{T + 1}) ∣ Z_{1}^{T}]

(9)

The following lemma provides the generalized error bound for an ensemble hypothesis of ML models updated online using a non-i.i.d. training set.

Lemma 2

([33]). Given the non-i.i.d. training set

Z_{1}^{T}

, the learner

A

yields hypotheses

h_{1}, \dots, h_{T + 1}

by processing samples

Z_{1}^{T}

sequentially. Let the weight vector λ and the loss function

L (\cdot, \cdot)

be defined as in Lemma 1, and

h = \sum_{t = 1}^{T} λ_{t} h_{t + 1}

be the ensemble hypothesis. Then, with probability no less than

1 - δ

, the following inequalities are satisfied for any

δ > 0

:

R_{T + 1} (h, Z_{1}^{T}) \leq M | λ | \sqrt{2 \log \frac{1}{δ}} + \sum_{t = 1}^{T} λ_{t} L (h_{t + 1}, Z_{t + 1}) + disc (λ)

(10)

R_{T + 1} (h, Z_{1}^{T}) \leq M | λ | \sqrt{2 \log \frac{1}{δ}} + \sum_{t = 1}^{T} M | λ_{t} - \frac{1}{T} | + \sum_{t = 1}^{T} λ_{t} L (h^{⋆}, Z_{t + 1}) + \frac{{Reg}_{A} (T)}{T} + disc (λ)

(11)

In contrast to the generalized error bound for ML models updated online using an i.i.d. training set in Lemma 1, Lemma 2 contains the term

disc (λ)

in the non-i.i.d. case, which is used to quantify the divergence of the sample and target distributions, and is given by [33]:

disc (λ) = \sup_{h_{t} \in H} |\sum_{t = 1}^{T} λ_{t} (R_{t + 1} (h_{t + 1}, Z_{1}^{t}) - R_{T + 1} (h_{t + 1}, Z_{1}^{T}))|

(12)

Since the calculation of

disc (λ)

requires knowledge of the distribution of

Z_{T + 1}

and we do not have access to

Z_{T + 1}

at the end of the T-th round, the discrepancy

disc (λ)

needs to be estimated based on the given data samples. Theorem 2 in [39] and Lemma 7 in [33] show that the discrepancy

disc (λ)

can be bounded using sequential Rademacher complexity. Building on these results, the following lemma presents the generalized error bound for ML models updated online using a non-i.i.d. training set in terms of sequential Rademacher complexity.

Lemma 3

([35]). Given the non-i.i.d. training set

Z_{1}^{T}

, let

h = \sum_{t = 1}^{T} λ_{t} h_{t + 1}

be the ensemble hypothesis that is developed satisfying all the conditions in Lemma 2. Consider a family of loss functions

F

defined by

F : = \{(x, \bar{y}) \to L (h (x), \bar{y}), h \in H\}

. For any

δ > 0

, the following inequality is satisfied with probability no less than

1 - δ

:

\begin{matrix} R_{T + 1} (h, Z_{1}^{T}) \leq & 2 M | λ | \sqrt{2 \log \frac{1}{δ}} + \sum_{t = 1}^{T} λ_{t} L (h_{t + 1}, Z_{t + 1}) + | λ | + Λ + {\hat{disc}}_{H} (λ) \\ & + 6 \sqrt{\log T π} M R_{T}^{s e q} (F) \end{matrix}

(13)

where

{\hat{disc}}_{H} (λ) : = \sup_{\bar{h}, h_{t} \in H} |\sum_{t = 1}^{T} λ_{t} (L (h_{t + 1}, Z_{t + 1}) - L (h_{t + 1} (X_{T + 1}), \bar{h} (X_{T + 1})))|

denotes the empirical discrepancy,

Λ : = \inf_{{\bar{h}}^{⋆} \in H} E [(L_{r} |{\bar{h}}^{⋆} (X_{T + 1}) - Z_{T + 1}|) ∣ Z_{1}^{T}]

, and

R_{T}^{s e q} (F)

denotes the sequential Rademacher complexity of the function class

F

.

It is noted that

Λ

and

{\hat{disc}}_{H} (λ)

can be computed and optimized based on the given data samples, and

| λ |

can be obtained once the weight vector

λ

is chosen. As a result, to calculate the generalized error bound of Equation (13), it remains to characterize the upper bound on

R_{T}^{s e q} (F)

. The definition of sequential Rademacher complexity is given below.

Definition 1.

(Sequential Rademacher complexity [33]). Let

z

be a

Z

-valued, T-depth tree and

G

be a class of functions mapping

Z \to R

. The sequential Rademacher complexity of a function class

G

on a

Z

-valued tree

z

is given by:

R_{T}^{s e q} (G) = \sup_{z} E [\sup_{g \in G} \sum_{t = 1}^{T} λ_{t} ϵ_{t} g (z_{t} (ϵ))]

(14)

where

ϵ = (ϵ_{1}, \dots, ϵ_{T - 1})

represents a set of Rademacher random variables drawn i.i.d. from

\{- 1, 1\}

and

z_{t} (ϵ)

denotes

z_{t} (ϵ_{1}, ϵ_{2}, \dots, ϵ_{t - 1})

.

Lemma 3 characterizes the generalized error bound for a broad class of online ML models. In the remainder of this section, we develop the generalized error bound for the RNN model of Equation (3) updated online with a non-i.i.d. training set drawn from the non-linear system of Equation (1). We consider the RNN hypothesis class

H_{ℓ}

that maps the first ℓ-time-step inputs to the ℓ-th output,

ℓ = 1, \dots, L_{n n}

, and a family of loss functions

F_{ℓ}

associated with

H_{ℓ}

is defined by

F_{ℓ} : = \{(x, \bar{y}) \to L (h (x), \bar{y}), h \in H_{ℓ}\} .

Note that

H_{ℓ}

is a family of vector-valued functions since the RNN model of Equation (3) is developed to approximate the dynamics of the multiple-input and multiple-output non-linear system (i.e., Equation (1)). Following the results of Lemma 4 in [32], we have the following upper bound for

R_{T}^{s e q} (F_{ℓ})

:

R_{T}^{s e q} (F_{ℓ}) \leq 8 L_{r} (4 \sqrt{2} \log^{3 / 2} (e T^{2}) + 1) \sum_{j = 1}^{d_{y}} R_{T}^{s e q} (H_{j, ℓ})

(15)

where

H_{j, ℓ}

denotes a family of real-valued functions that corresponds to the j-th component of

H_{ℓ}

,

j = 1, \dots, d_{y}

, and

d_{y}

represents the RNN output dimension. Subsequently, we develop the upper bound on

R_{T}^{s e q} (H_{j, ℓ})

following the proof techniques in [38] that peel off the weight matrices (i.e., V, W, and Q) and the activation functions (i.e.,

σ_{y}

and

σ_{h}

) layer by layer.

Lemma 4

([35]). Consider a family of real-valued functions

H_{j, ℓ}

that corresponds to the j-th component of the RNN hypothesis class

H_{ℓ}

, with weight matrices and activation functions that satisfy Assumptions 1–3. The following inequality is satisfied for the RNNs developed with the non-i.i.d. training set

Z_{1}^{T}

:

R_{T}^{s e q} (H_{j, ℓ}) \leq (1 + \sqrt{2 (ℓ + 1) \log (2)}) Γ B_{X} | λ |

(16)

where

Γ = \frac{{(B_{Q, F})}^{ℓ} - 1}{B_{Q, F} - 1} B_{W, F} B_{V, F}

.
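A short numerical sketch shows how the bound of Equation (16) scales with the number of internal time steps ℓ and with the weight-matrix bounds. The function names and the numerical values are hypothetical; the case B_{Q,F} = 1 is handled here by the limit of the geometric sum, which is a convention adopted for this sketch rather than something stated in the lemma.

```python
import math
from typing import List

def gamma(b_q: float, b_w: float, b_v: float, ell: int) -> float:
    """Gamma = ((B_Q)^ell - 1) / (B_Q - 1) * B_W * B_V from Lemma 4."""
    if b_q == 1.0:
        geometric = float(ell)  # limit of the geometric sum as B_Q -> 1 (sketch convention)
    else:
        geometric = (b_q ** ell - 1.0) / (b_q - 1.0)
    return geometric * b_w * b_v

def rademacher_bound(
    b_q: float, b_w: float, b_v: float, b_x: float, ell: int, lam: List[float]
) -> float:
    """RHS of Equation (16): (1 + sqrt(2 (ell + 1) log 2)) * Gamma * B_X * |lam|."""
    lam_norm = math.sqrt(sum(w * w for w in lam))
    factor = 1.0 + math.sqrt(2.0 * (ell + 1) * math.log(2.0))
    return factor * gamma(b_q, b_w, b_v, ell) * b_x * lam_norm

# Hypothetical bounds on the weight matrices and inputs, with uniform weights (T = 4).
lam = [0.25] * 4
bounds = [rademacher_bound(2.0, 1.0, 1.0, 1.0, ell, lam) for ell in (1, 2, 3)]
```

When B_{Q,F} > 1, Γ grows geometrically in ℓ, so the bound loosens quickly for long internal sequences; keeping the Frobenius norm of Q small keeps the bound tighter.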

Based on Equations (13), (15) and (16), the following theorem develops the generalized error bound for RNN models updated online using the non-i.i.d. training set.

Theorem 1

([35]). Let

H_{ℓ}

be the RNN hypothesis class that maps the first ℓ-time-step inputs to the ℓ-th output,

ℓ = 1, \dots, L_{n n}

,

h_{1}, \dots, h_{T + 1}

be the hypotheses from

H_{ℓ}

that are developed using the non-i.i.d. training set

Z_{1}^{T}

and meet all the conditions in Lemmas 2–4, and

h = \sum_{t = 1}^{T} λ_{t} h_{t + 1}

be the ensemble hypothesis. For any

δ > 0

, the following inequality holds with probability no less than

1 - δ

:

\begin{matrix} R_{T + 1} (h, Z_{1}^{T}) \leq & 2 M | λ | \sqrt{2 \log \frac{1}{δ}} + \sum_{t = 1}^{T} λ_{t} L (h_{t + 1}, Z_{t + 1}) + | λ | + Λ + {\hat{disc}}_{H} (λ) \\ & + M L_{r} C_{T} d_{y} (1 + \sqrt{2 (ℓ + 1) \log (2)}) Γ B_{X} | λ | \end{matrix}

(17)

where

C_{T} = O (\sqrt{\log T π} (4 \sqrt{2} \log^{3 / 2} (e T^{2}) + 1))

.

The weight vector

λ

for the hypotheses

h_{1}, \dots, h_{T + 1}

developed with a non-i.i.d. training set can be optimized as follows [35]:

\begin{matrix} \min_{λ \in Ω_{T}} & {\hat{disc}}_{H} (λ) + \sum_{t = 1}^{T} λ_{t} L (h_{t + 1}, Z_{t + 1}) \\ s . t . & λ_{T} = 0, \sum_{t = 1}^{T} | λ_{t} - \frac{1}{T} | \leq α \end{matrix}

(18)

Remark 1.

Compared to the optimization problem of Equation (8) for the i.i.d. case, the objective function of Equation (18) accounts for the empirical discrepancy term

{\hat{disc}}_{H} (λ)

for the non-i.i.d. training set. Additionally, since the cost function of Equation (18) depends on the sample

Z_{T + 1}

that is unavailable after the T-th round, Equation (18) includes an additional equality constraint that lets

λ_{T} = 0

, and, therefore,

h_{T + 1}

(i.e., the last hypothesis) is discarded. This is consistent with the i.i.d. case where the ensemble hypothesis is developed using the hypotheses

h_{1}, \dots, h_{T}

without

h_{T + 1}

. It should be noted that the initial hypothesis

h_{1}

is also discarded for the ensemble hypothesis in the non-i.i.d. case, since

h_{1}

is trained offline using historical data and thus cannot predict the dynamics of Equation (1) well in the presence of disturbances. Therefore, the ensemble hypothesis h is derived using the hypotheses

h_{2}, \dots, h_{T}

, that is,

h = \sum_{t = 1}^{T - 1} λ_{t} h_{t + 1}

.
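The explicit constraints of Equation (18) and the ensemble construction described in Remark 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: hypotheses are modeled as plain callables with scalar output, the empirical discrepancy is supplied by the caller as a function `disc_hat`, membership of λ in Ω_{T} is not checked, and all function names are ours.

```python
def is_feasible_weighting(lam, alpha, tol=1e-12):
    """Check lambda_T = 0 and sum_t |lambda_t - 1/T| <= alpha from Eq. (18)."""
    T = len(lam)
    if abs(lam[-1]) > tol:
        return False
    return sum(abs(l - 1.0 / T) for l in lam) <= alpha + tol


def objective(lam, losses, disc_hat):
    """Objective of Eq. (18); losses[t-1] stands for L(h_{t+1}, Z_{t+1})."""
    # The last loss is multiplied by lambda_T = 0, so any placeholder value works there.
    return disc_hat(lam) + sum(l * loss for l, loss in zip(lam, losses))


def ensemble_predict(lam, hypotheses, x):
    """h(x) = sum_{t=1}^{T-1} lambda_t h_{t+1}(x), with hypotheses = [h_1, ..., h_{T+1}]."""
    T = len(lam)
    # Index t runs over 0..T-2, so h_1 (index 0) and h_{T+1} (index T) are never used.
    return sum(lam[t] * hypotheses[t + 1](x) for t in range(T - 1))
```

With this indexing, the offline-trained hypothesis and the last online hypothesis are both excluded from the ensemble, matching the discussion in Remark 1.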

4. RNN-Based LEMPC of Switched Non-Linear Systems

In this section, we develop a framework that integrates online learning RNN models with Lyapunov-based EMPC (RNN-LEMPC) for switched non-linear systems. Specifically, for each switching mode

k \in ψ

, the closed-loop state of Equation (1) is maintained in the prescribed stability region while an economic cost function is maximized to obtain optimal economic performance for the system under RNN-LEMPC. Additionally, due to the switching behavior of Equation (1), an appropriate mode transition constraint is included in the RNN-LEMPC formulation to guarantee the success of scheduled mode transitions. Note that this section only discusses RNNs updated online with a non-i.i.d. training set to model Equation (1) subject to process disturbances. The system of Equation (4), which switches between different steady-states without disturbances (the i.i.d. case), is a special case of Equation (1), and the stability results derived in this section can be readily adapted to it.

4.1. Lyapunov-Based Control Using RNN Models

To simplify the closed-loop stability analysis for the system of Equation (1) under RNN-LEMPC, we represent the RNN model of Equation (3) in the following continuous-time state-space form:

\dot{\hat{x}} = F_{{n n}_{k}} (\hat{x}, u_{k})

(19)

where

\hat{x} \in R^{n}

denotes the RNN state vector and

u_{k} \in R^{n_{u}}

represents the control input vector. For each mode

k \in ψ

, a stabilizing control law

u_{k} = Φ_{{n n}_{k}} (x) \in U_{k}

is assumed to exist in the sense that the origin of the RNN of Equation (19) is rendered exponentially stable. This stabilizability assumption indicates that there is a control Lyapunov function

{\hat{V}}_{k} (x)

belonging to class

C^{1}

such that the following inequalities are satisfied for all states x in

{\hat{D}}_{k}

:

\begin{matrix} {\hat{c}}_{1_{k}} {| x |}^{2} \leq {\hat{V}}_{k} (x) \leq {\hat{c}}_{2_{k}} {| x |}^{2}, \end{matrix}

(20a)

\begin{matrix} \frac{\partial {\hat{V}}_{k} (x)}{\partial x} F_{{n n}_{k}} (x, Φ_{{n n}_{k}} (x)) \leq - {\hat{c}}_{3_{k}} {| x |}^{2}, \end{matrix}

(20b)

\begin{matrix} |\frac{\partial {\hat{V}}_{k} (x)}{\partial x}| \leq {\hat{c}}_{4_{k}} | x |, \end{matrix}

(20c)

where

{\hat{D}}_{k}

denotes an open neighborhood around the origin, and

{\hat{c}}_{i_{k}}, i = 1, 2, 3, 4

,

k \in ψ

, are positive constants. Similarly to the construction procedure of the stability region for Equation (1) without disturbances, the stability region for the RNN model of Equation (19) operating under mode k with

u_{k} = Φ_{{n n}_{k}} (x) \in U_{k}

is characterized as a level set of

{\hat{V}}_{k} (x)

as follows:

Ω_{{\hat{ρ}}_{k}} : = \{x \in R^{n} ∣ {\hat{V}}_{k} (x) \leq {\hat{ρ}}_{k}\}

, where

{\hat{ρ}}_{k} > 0

for

k \in ψ

. Historical data are assumed to be available for Equation (1) without disturbances operating under each mode

k \in ψ

, and, thus, the initial RNN can be constructed offline using the corresponding historical data to approximate the nominal system dynamics for each mode, respectively. Subsequently,

Φ_{{n n}_{k}} (x)

and

Ω_{{\hat{ρ}}_{k}}

for the initial RNN model can be characterized accordingly. Note that although the online update of RNNs is carried out using real-time data in this work,

Ω_{{\hat{ρ}}_{k}}

and

Φ_{{n n}_{k}} (x)

will not be updated accordingly due to the excessive computational burden of real-time characterization of

Ω_{{\hat{ρ}}_{k}}

and

Φ_{{n n}_{k}} (x)

for online learning RNNs. Therefore,

Φ_{{n n}_{k}} (x)

and

Ω_{{\hat{ρ}}_{k}}

designed using the initial RNN remain unchanged at all times, and we will demonstrate that closed-loop stability for Equation (1) in terms of the boundedness of the state within the stability region is achieved in probability under LEMPC using online learning RNNs.

4.2. Lyapunov-Based EMPC Using RNN Models

Before we proceed to the closed-loop stability analysis for the system of Equation (1) under RNN-LEMPC, we need the following propositions that guarantee closed-loop stability of the system of Equation (1) under the controller

u_{k} = Φ_{{n n}_{k}} (x) \in U_{k}

. Specifically, Proposition 1 derives an upper bound for the state error between the RNN predicted state

\hat{x} (t)

of Equation (19) and the actual state

x (t)

of Equation (1) taking into account bounded disturbances and model mismatch.

Proposition 1

([40]). Consider the RNN model of Equation (19) and the system of Equation (1) operating in mode k with the same initial condition

{\hat{x}}_{0} = x_{0} \in Ω_{{\hat{ρ}}_{k}}

,

t_{k}^{o u t} = \infty

, and

| w_{k} | \leq w_{m_{k}}

. There exist a function

f_{k} (\cdot)

belonging to class

K

and a positive constant κ such that for all

x, \hat{x} \in Ω_{{\hat{ρ}}_{k}}

, the following inequalities hold with probability no less than

1 - δ

:

\begin{matrix} | x (t) - \hat{x} (t) | \leq f_{k} (t) : = \frac{E_{I} + L_{w_{k}} w_{m_{k}}}{L_{x_{k}}} (e^{L_{x_{k}} t} - 1) \end{matrix}

(21a)

\begin{matrix} {\hat{V}}_{k} (x) \leq κ | x - \hat{x} |^{2} + \frac{{\hat{c}}_{4_{k}} \sqrt{{\hat{ρ}}_{k}}}{\sqrt{{\hat{c}}_{1_{k}}}} | x - \hat{x} | + {\hat{V}}_{k} (\hat{x}) \end{matrix}

(21b)

where

E_{I}

denotes an upper bound for the model mismatch between the initial RNN model of Equation (19) and the system of Equation (1) without disturbances (i.e.,

| F_{{n n}_{k}} (x, u_{k}) - F_{k} (x, u_{k}, 0) | \leq E_{I}

). The formulation of

E_{I}

can be derived using the generalized error bound for offline-trained RNNs (see [38] for details).

Remark 2.

Since the initial RNN can capture the nominal system dynamics only, Equation (21a) is derived by taking

w_{k} = w_{m_{k}}

(i.e., the worst-case scenario) into consideration. However, in this work, RNNs are iteratively updated using real-time data to capture non-linear dynamics of Equation (1) subject to bounded disturbances, such that the modeling error between the online learning RNN models of Equation (19) and the system of Equation (1) is bounded by the modeling error bound

E_{O}

with probability no less than

1 - δ

, i.e.,

| F_{{n n}_{k}} (x, u_{k}) - F_{k} (x, u_{k}, w_{k}) | \leq E_{O}

. Based on the generalized error bound

E_{P}

for RNNs updated online with non-i.i.d. training set (i.e.,

E_{P}

is given by the RHS of Equation (17)), the finite difference method can be used to approximate the modeling error bound

E_{O}

. Note that the inequality

| x - \hat{x} | \leq \sqrt{E_{P}}

holds with probability no less than

1 - δ

when the MSE loss function is used, as in this work. Similarly to the derivation of Equation (21a), the following inequality holds with probability no less than

1 - δ

:

| x (t) - \hat{x} (t) | \leq f_{k} (t) : = \frac{E_{O}}{L_{x_{k}}} (e^{L_{x_{k}} t} - 1)

(22)

In contrast to Equation (21a), it is readily seen from Equation (22) that if the RNNs are updated well, such that the inequality

E_{O} \leq L_{w_{k}} w_{m_{k}} + E_{I}

is satisfied, the state error

| x (t) - \hat{x} (t) |

achieved by the online learning RNN is smaller than that of the initial RNN trained offline.
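The comparison between Equations (21a) and (22) can be checked numerically. The sketch below is a minimal illustration with our own function names; the numerical values in any call are placeholders chosen by the user, not values from the paper.

```python
import math


def state_error_bound(mismatch_rate, l_x, t):
    """f_k(t) = mismatch_rate / L_x * (exp(L_x * t) - 1), the common form of Eqs. (21a) and (22)."""
    # expm1 keeps precision when L_x * t is small (e.g., one sampling period).
    return mismatch_rate / l_x * math.expm1(l_x * t)


def initial_rnn_bound(e_i, l_w, w_m, l_x, t):
    """Eq. (21a): offline model mismatch E_I plus the worst-case disturbance term L_w * w_m."""
    return state_error_bound(e_i + l_w * w_m, l_x, t)


def online_rnn_bound(e_o, l_x, t):
    """Eq. (22): the single modeling error bound E_O of the online updated RNN."""
    return state_error_bound(e_o, l_x, t)
```

Both bounds share the same exponential factor, so their ratio equals E_O / (E_I + L_w w_m) at every time t; the online bound is tighter exactly when E_O is below the sum of the offline mismatch and the disturbance term.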

Proposition 2 below demonstrates that if the initial RNN is trained to model the nominal system well (i.e.,

| F_{{n n}_{k}} (x, u_{k}) - F_{k} (x, u_{k}, 0) |

is sufficiently small), the closed-loop state of Equation (1) can be driven towards the origin and bounded in the stability region

Ω_{{\hat{ρ}}_{k}}

at all times under

u_{k} = Φ_{{n n}_{k}} (x) \in U_{k}

applied to the system of Equation (1) in a sample-and-hold fashion.

Proposition 2

([40]). Consider Equation (1) operating in mode k with

t_{k}^{o u t} = \infty

,

| w_{k} | \leq w_{m_{k}}

, under

u_{k} = Φ_{{n n}_{k}} (x) \in U_{k}

that is applied in a sample-and-hold fashion and meets the conditions of Equation (20). If the modeling error between the initial RNN model and the system of Equation (1) without disturbances can be bounded by

| F_{{n n}_{k}} (x, u_{k}) - F_{k} (x, u_{k}, 0) | \leq E_{I} \leq γ_{k} | x |

, and there exist

0 < ρ_{s_{k}} < {\hat{ρ}}_{k}

,

Δ > 0

, and

ϵ_{k} > 0

,

k \in ψ

, such that the following inequality is satisfied:

L_{x_{k}}^{'} M_{k} Δ + L_{w_{k}}^{'} w_{m_{k}} - \frac{{\tilde{c}}_{3_{k}}}{{\hat{c}}_{2_{k}}} ρ_{s_{k}} \leq - ϵ_{k}

(23)

where

{\tilde{c}}_{3_{k}} = {\hat{c}}_{3_{k}} - {\hat{c}}_{4_{k}} γ_{k} > 0

for

γ_{k}

satisfying

0 < γ_{k} < {\hat{c}}_{3_{k}} / {\hat{c}}_{4_{k}}

,

k \in ψ

, then, with probability no less than

1 - δ

, the following inequality holds for

t \in [t_{q}, t_{q + 1})

and

\forall x (t_{q}) \in Ω_{{\hat{ρ}}_{k}} \setminus Ω_{ρ_{s_{k}}}

:

{\hat{V}}_{k} (x (t)) \leq {\hat{V}}_{k} (x (t_{q}))

(24)
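For a given set of constants, the sufficient condition of Equation (23) reduces to a scalar check. The following sketch, with hypothetical function and argument names, evaluates the left-hand side of Equation (23) and rejects values of γ_{k} outside the admissible interval stated in Proposition 2.

```python
def c3_tilde(c3_hat, c4_hat, gamma):
    """c~_3 = c^_3 - c^_4 * gamma, defined for 0 < gamma < c^_3 / c^_4."""
    if not 0.0 < gamma < c3_hat / c4_hat:
        raise ValueError("gamma must lie in (0, c3_hat / c4_hat)")
    return c3_hat - c4_hat * gamma


def satisfies_eq23(lx_prime, m_k, delta, lw_prime, w_m,
                   c3_hat, c4_hat, gamma, c2_hat, rho_s, eps):
    """True if L'_x M Delta + L'_w w_m - (c~_3 / c^_2) rho_s <= -eps."""
    lhs = (lx_prime * m_k * delta + lw_prime * w_m
           - c3_tilde(c3_hat, c4_hat, gamma) / c2_hat * rho_s)
    return lhs <= -eps
```

The check makes the design trade-off visible: a longer sampling period Δ or a larger disturbance bound w_{m_k} has to be compensated by a larger ρ_{s_k} or a smaller ϵ_{k}.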

The following proposition ensures that under

u_{k} = Φ_{{n n}_{k}} (x) \in U_{k}

, the closed-loop state can be driven to the stability region of mode f when the system of Equation (1) is switched to the subsequent mode f from the current mode k at the prescribed switching time.

Proposition 3

([34]). Consider Equation (1) operating in mode k for

t \in [t_{k}^{i n}, t_{k}^{o u t})

, with

| w_{k} | \leq w_{m_{k}}

, and under

u_{k} = Φ_{{n n}_{k}} (x) \in U_{k}

satisfying the conditions in Proposition 1 and Proposition 2. Given

t_{k}^{i n} \leq t < t_{k}^{o u t} = t_{f}^{i n}

and

x (t_{k}^{i n}) \in Ω_{{\hat{ρ}}_{k}}

for some

f, k \in ψ

, if there exist positive real numbers

{\hat{ρ}}_{k}

, Δ,

ϵ_{k}

, and

N_{k}

, such that

{\hat{c}}_{2_{f}} {(|x_{s_{k}} - x_{s_{f}}| + \sqrt{\frac{{\hat{ρ}}_{k} - ϵ_{k} N_{k} Δ}{{\hat{c}}_{1_{k}}}})}^{2} \leq {\hat{ρ}}_{f}

(25)

then

x (t_{f}^{i n}) \in Ω_{{\hat{ρ}}_{f}}

.
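The mode-transition condition of Equation (25) can likewise be evaluated directly. The sketch below assumes that |·| denotes the Euclidean norm and that the steady-states of the two modes are given as coordinate tuples; both the names and these conventions are ours.

```python
import math


def transition_lhs(c2_f, c1_k, xs_k, xs_f, rho_k, eps_k, n_k, delta):
    """Left-hand side of Eq. (25)."""
    remaining = rho_k - eps_k * n_k * delta
    if remaining < 0.0:
        raise ValueError("rho_k - eps_k * N_k * Delta must be non-negative")
    distance = math.dist(xs_k, xs_f)  # Euclidean distance between the two steady-states
    return c2_f * (distance + math.sqrt(remaining / c1_k)) ** 2


def transition_ok(c2_f, c1_k, xs_k, xs_f, rho_k, eps_k, n_k, delta, rho_f):
    """True if Eq. (25) holds, i.e., the state can be inside the mode-f region at t_f^in."""
    return transition_lhs(c2_f, c1_k, xs_k, xs_f, rho_k, eps_k, n_k, delta) <= rho_f
```

A larger number of sampling periods N_{k} spent in mode k shrinks the square-root term, which is why a later switching time makes the condition easier to satisfy.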

The RNN-LEMPC scheme that optimizes economic benefits while maintaining closed-loop stability for Equation (1) is represented by the optimization problem as follows:

\begin{matrix} J = & \max_{u_{k} \in S (Δ)} \int_{t_{q}}^{t_{k}^{o u t}} l_{e} (\tilde{x} (t), u_{k} (t)) d t \end{matrix}

(26a)

\begin{matrix} s . t . & \dot{\tilde{x}} (t) = F_{{n n}_{k}} (\tilde{x} (t), u_{k} (t)) \end{matrix}

(26b)

\begin{matrix} \tilde{x} (t_{q}) = x (t_{q}) \end{matrix}

(26c)

\begin{matrix} u_{k} (t) \in U_{k}, \forall t \in [t_{q}, t_{k}^{o u t}) \end{matrix}

(26d)

\begin{matrix} {\hat{V}}_{k} (\tilde{x} (t)) \leq {\hat{ρ}}_{e_{k}}, \forall t \in [t_{q}, t_{k}^{o u t}), if x (t_{q}) \in Ω_{{\hat{ρ}}_{e_{k}}} \end{matrix}

(26e)

\begin{matrix} {\dot{\hat{V}}}_{k} (x (t_{q}), u_{k}) \leq {\dot{\hat{V}}}_{k} (x (t_{q}), Φ_{{n n}_{k}} (x (t_{q}))), if x (t_{q}) \in Ω_{{\hat{ρ}}_{k}} \setminus Ω_{{\hat{ρ}}_{e_{k}}} \end{matrix}

(26f)

\begin{matrix} {\hat{V}}_{f} (\tilde{x} (t_{k}^{o u t})) + f_{e} (E_{P}) \leq {\hat{ρ}}_{f} \end{matrix}

(26g)

where

\tilde{x} (t)

and

S (Δ)

represent the predicted state trajectory and the class of piecewise constant functions with sampling period

Δ

, respectively, and

f_{e} (E_{P}) : = \frac{{\hat{c}}_{4_{k}} \sqrt{{\hat{ρ}}_{k}}}{\sqrt{{\hat{c}}_{1_{k}}}} \sqrt{E_{P}} + κ E_{P}

is used to evaluate the impact of

E_{P}

(i.e.,

| x - \hat{x} | \leq \sqrt{E_{P}}

) on the Lyapunov function value based on Equation (21b). At the current sampling time step

t_{q}

, the optimization problem of Equation (26) is solved by maximizing the economic cost function of Equation (26a) over a shrinking prediction horizon for

t \in [t_{q}

,

t_{k}^{o u t}

) and taking into account the constraints of Equations (26b)–(26g). In detail, Equation (26b) represents the prediction model that utilizes the initial RNN at the beginning and then is iteratively updated using real-time data. This prediction model is utilized to forecast future states

\tilde{x} (t)

for

t \in [t_{q}, t_{k}^{o u t})

given

\tilde{x} (t_{q})

obtained from the state measurement

x (t_{q})

in Equation (26c). The constraint of Equation (26d) is incorporated to bound the control inputs

u_{k} (t)

for

t \in [t_{q}, t_{k}^{o u t})

. Additionally, by designing

Ω_{{\hat{ρ}}_{e_{k}}}

as a subset of

Ω_{{\hat{ρ}}_{k}}

(i.e.,

{\hat{ρ}}_{e_{k}} < {\hat{ρ}}_{k}

), the constraints of Equations (26e) and (26f) are designed to ensure that the predicted state

\tilde{x} (t)

moves toward

Ω_{{\hat{ρ}}_{e_{k}}}

and remains inside

Ω_{{\hat{ρ}}_{k}}

at all times, and it will be demonstrated in Theorem 2 that the actual state

x (t)

of Equation (1) is maintained in

Ω_{{\hat{ρ}}_{k}}

. Finally, Equation (26g) is the mode transition constraint used to drive the state

x (t)

to

Ω_{{\hat{ρ}}_{f}}

at

t = t_{k}^{o u t}

.
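The switching logic between the constraints of Equations (26e) and (26f), together with the tightening term f_{e}(E_{P}) of Equation (26g), can be sketched as follows. The sketch assumes a two-state system with a quadratic Lyapunov function V(x) = x^T P x (the form used later in the CSTR example); the function names and the string labels are ours.

```python
import math


def lyapunov_value(p, x):
    """V(x) = x^T P x for a 2x2 matrix P given as nested tuples (assumed quadratic form)."""
    (p11, p12), (p21, p22) = p
    x1, x2 = x
    return p11 * x1 * x1 + (p12 + p21) * x1 * x2 + p22 * x2 * x2


def active_lyapunov_constraint(v_xq, rho_e, rho):
    """Return which Lyapunov constraint of Eq. (26) applies at the sampling time t_q."""
    if v_xq <= rho_e:
        return "26e"  # keep the predicted state inside the inner region
    if v_xq <= rho:
        return "26f"  # contractive constraint pushes the state back toward the inner region
    raise ValueError("x(t_q) lies outside the stability region")


def transition_margin(e_p, kappa, c1_hat, c4_hat, rho_hat):
    """f_e(E_P) = c^_4 sqrt(rho^) / sqrt(c^_1) * sqrt(E_P) + kappa * E_P, used in Eq. (26g)."""
    return c4_hat * math.sqrt(rho_hat) / math.sqrt(c1_hat) * math.sqrt(e_p) + kappa * e_p
```

For instance, with the matrix P and the region sizes reported in the CSTR example of Section 5, a deviation state of (0.55, 0) gives V = 320.65, which lies between 280 and 368 and therefore activates Equation (26f).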

It should be pointed out that when the prediction model (i.e., Equation (26b)) is updated online, all the terms in the RNN-LEMPC of Equation (26) associated with the Lyapunov function (i.e., Equation (26e), the LHS of Equation (26f), and Equation (26g)) use the latest RNN model except that the RHS of Equation (26f) utilizes the initial RNN at all times. Since the results of Propositions 1–3 are established under

u_{k} = Φ_{{n n}_{k}} (x) \in U_{k}

constructed using the initial RNN, the constraints of Equations (26e)–(26g) under

u_{k} = Φ_{{n n}_{k}} (x) \in U_{k}

may not hold due to the model inconsistency. Therefore, the feasibility of Equation (26) is not guaranteed once the prediction model is updated. To remedy this, the controller

Φ_{{n n}_{k}} (x)

is applied over the next sampling period as a backup controller to stabilize the system of Equation (1) whenever Equation (26) is infeasible at a sampling step. Finally, we develop the following theorem to guarantee that the closed-loop state of Equation (1) is maintained in

Ω_{{\hat{ρ}}_{k}}

at all times and is driven into

Ω_{{\hat{ρ}}_{f}}

at the switching moment under the RNN-LEMPC of Equation (26) with updating RNN models.

Theorem 2.

Consider the system of Equation (1) with

| w_{k} | \leq w_{m_{k}}

under the RNN-LEMPC of Equation (26) using

Φ_{{n n}_{k}} (x)

as the backup controller when Equation (26) is infeasible. Let

0 < ρ_{s_{k}} < {\hat{ρ}}_{e_{k}} < {\hat{ρ}}_{k}

,

Δ > 0

,

ϵ_{k} > 0

,

k \in ψ

, satisfy

{\hat{ρ}}_{e_{k}} \leq - κ {(f_{k} (Δ))}^{2} - \frac{{\hat{c}}_{4_{k}} \sqrt{{\hat{ρ}}_{k}}}{\sqrt{{\hat{c}}_{1_{k}}}} f_{k} (Δ) + {\hat{ρ}}_{k}

(27)

where

f_{k} (\cdot)

is defined in Equation (21a) for the initial RNN and Equation (22) for online updating RNNs, respectively. For some

k, f \in ψ

, if

x (t_{k}^{i n}) \in Ω_{{\hat{ρ}}_{k}}

and all the conditions in Propositions 1–3 are satisfied, and the online updating RNNs are developed, such that

| F_{{n n}_{k}} (x, u_{k}) - F_{k} (x, u_{k}, w_{k}) | \leq E_{O} \leq γ_{k} | x |

(i.e., the modeling error constraint) is met, then for each sampling step, with probability no less than

1 - δ

, the state

x (t)

of Equation (1) is bounded in

Ω_{{\hat{ρ}}_{k}}

for

t \in [t_{k}^{i n}, t_{k}^{o u t})

and is driven to

Ω_{{\hat{ρ}}_{f}}

at

t = t_{k}^{o u t} = t_{f}^{i n}

.

Proof.

The proof consists of two parts. We first consider the case where the optimization problem of Equation (26) is infeasible and the control law

u_{k} = Φ_{{n n}_{k}} (x) \in U_{k}

is applied. In this case, it is demonstrated in Propositions 1–3 that the controller

u_{k} = Φ_{{n n}_{k}} (x) \in U_{k}

is able to guarantee the boundedness of the state

x (t)

within

Ω_{{\hat{ρ}}_{k}}

and the success of the scheduled mode transitions for the non-linear system of Equation (1).

Subsequently, we prove that when there is a feasible solution

u_{k}^{⋆} (x (t_{q}))

(i.e., the optimal control action) for the RNN-LEMPC of Equation (26) with online updating RNN models, closed-loop stability for Equation (1) holds as well under

u_{k}^{⋆} (x (t_{q}))

. In detail, if

x (t_{q}) \in Ω_{{\hat{ρ}}_{e_{k}}}

, the predicted state

\tilde{x} (t)

stays in

Ω_{{\hat{ρ}}_{e_{k}}}

following the constraint of Equation (26e). Then, it follows from Proposition 1 that with probability no less than

1 - δ

, the actual state

x (t)

of Equation (1) for

t \in [t_{q}

,

t_{q + 1}

) can be bounded as follows:

\begin{matrix} {\hat{V}}_{k} (x) & \leq κ | x - \tilde{x} |^{2} + \frac{{\hat{c}}_{4_{k}} \sqrt{{\hat{ρ}}_{k}}}{\sqrt{{\hat{c}}_{1_{k}}}} | x - \tilde{x} | + {\hat{V}}_{k} (\tilde{x}) \\ & \leq κ {(f_{k} (Δ))}^{2} + \frac{{\hat{c}}_{4_{k}} \sqrt{{\hat{ρ}}_{k}}}{\sqrt{{\hat{c}}_{1_{k}}}} f_{k} (Δ) + {\hat{V}}_{k} (\tilde{x}) \end{matrix}

(28)

Since

{\hat{V}}_{k} (\tilde{x}) \leq {\hat{ρ}}_{e_{k}}

, we obtain

{\hat{V}}_{k} (x) \leq {\hat{ρ}}_{k}

if Equation (27) holds, indicating that

x (t) \in Ω_{{\hat{ρ}}_{k}}

for all

t \in [t_{q}

,

t_{q + 1}

) with probability no less than

1 - δ

. Following the proof technique in [40], if

x (t_{q}) \in Ω_{{\hat{ρ}}_{k}} \setminus Ω_{{\hat{ρ}}_{e_{k}}}

, the time derivative of

\hat{V} (x (t))

of Equation (1) for

t \in [t_{q}, t_{q + 1})

can be bounded as

\dot{\hat{V}} (x (t)) \leq L_{x_{k}}^{'} M_{k} Δ + \dot{\hat{V}} (x (t_{q}))

using Equation (2). Note that for any

x (t_{q}) \in Ω_{{\hat{ρ}}_{k}} \setminus Ω_{{\hat{ρ}}_{e_{k}}}

, Equation (26f) is activated such that we can further bound

\dot{\hat{V}} (x (t_{q}))

with probability no less than

1 - δ

as follows:

\begin{matrix} \dot{\hat{V}} (x (t_{q})) = & \frac{\partial {\hat{V}}_{k} (x (t_{q}))}{\partial x} (F_{k} (x (t_{q}), u_{k}^{⋆} (x (t_{q})), w_{k}) - F_{{n n}_{k}} (x (t_{q}), u_{k}^{⋆} (x (t_{q})))) \\ & + \frac{\partial {\hat{V}}_{k} (x (t_{q}))}{\partial x} F_{{n n}_{k}} (x (t_{q}), u_{k}^{⋆} (x (t_{q}))) \\ \leq & {\hat{c}}_{4_{k}} | x (t_{q}) | \cdot |F_{{n n}_{k}} (x (t_{q}), u_{k}^{⋆} (x (t_{q}))) - F_{k} (x (t_{q}), u_{k}^{⋆} (x (t_{q})), w_{k})| \\ & + \frac{\partial {\hat{V}}_{k} (x (t_{q}))}{\partial x} F_{{n n}_{k}} (x (t_{q}), Φ_{{n n}_{k}} (x (t_{q}))) \\ \leq & {\hat{c}}_{4_{k}} γ_{k} | x (t_{q}) |^{2} - {\hat{c}}_{3_{k}} {| x (t_{q}) |}^{2} \\ \leq & - \frac{{\tilde{c}}_{3_{k}}}{{\hat{c}}_{2_{k}}} {\hat{ρ}}_{e_{k}} \end{matrix}

(29)

where the first inequality of Equation (29) is obtained under the constraints of Equations (26f) and (20c). The second inequality of Equation (29) follows from Equation (20b) and the inequality

| F_{{n n}_{k}} (x, u_{k}) - F_{k} (x, u_{k}, w_{k}) | \leq E_{O} \leq γ_{k} | x |

for online updating RNNs. Using Equation (20a) for any state

x (t_{q}) \in Ω_{{\hat{ρ}}_{k}} \setminus Ω_{{\hat{ρ}}_{e_{k}}}

, it follows that the last inequality of Equation (29) holds. Therefore, with probability no less than

1 - δ

, the following inequality for

\dot{\hat{V}} (x (t))

holds:

\begin{matrix} \dot{\hat{V}} (x (t)) & \leq L_{x_{k}}^{'} M_{k} Δ - \frac{{\tilde{c}}_{3_{k}}}{{\hat{c}}_{2_{k}}} {\hat{ρ}}_{e_{k}} \\ & < L_{x_{k}}^{'} M_{k} Δ + L_{w_{k}}^{'} w_{m_{k}} - \frac{{\tilde{c}}_{3_{k}}}{{\hat{c}}_{2_{k}}} ρ_{s_{k}} \end{matrix}

(30)

Due to

0 < ρ_{s_{k}} < {\hat{ρ}}_{e_{k}}

and

0 < L_{w_{k}}^{'} w_{m_{k}}

, the second inequality of Equation (30) is derived. Therefore, it is demonstrated in Equation (30) that

\dot{\hat{V}} (x (t)) < 0

holds if the constraint of Equation (23) in Proposition 2 is met. This implies that for any

x (t_{q}) \in Ω_{{\hat{ρ}}_{k}} \setminus Ω_{{\hat{ρ}}_{e_{k}}}

, the value of

{\hat{V}}_{k} (x)

decreases for

t \in [t_{q}, t_{q + 1})

with probability no less than

1 - δ

under

u_{k}^{⋆} (x (t_{q}))

, and, thus, the closed-loop state of Equation (1) can enter into

Ω_{{\hat{ρ}}_{e_{k}}}

within a finite number of sampling steps with a certain probability. Additionally, using Equation (21b) and

| x - \hat{x} | \leq \sqrt{E_{P}}

, the value of

{\hat{V}}_{f} (x (t_{k}^{o u t}))

can be bounded with probability no less than

1 - δ

as follows:

\begin{matrix} {\hat{V}}_{f} (x (t_{k}^{o u t})) \leq κ E_{P} + \frac{{\hat{c}}_{4_{k}} \sqrt{{\hat{ρ}}_{k}}}{\sqrt{{\hat{c}}_{1_{k}}}} \sqrt{E_{P}} + {\hat{V}}_{f} (\tilde{x} (t_{k}^{o u t})) \end{matrix}

(31)

According to Equation (31), we have

{\hat{V}}_{f} (x (t_{k}^{o u t})) \leq {\hat{ρ}}_{f}

if Equation (26g) is met, which indicates that at

t = t_{k}^{o u t}

, the closed-loop state

x (t)

can be driven into

Ω_{{\hat{ρ}}_{f}}

in probability.

Therefore, closed-loop stability can be achieved for the system of Equation (1) in probability regardless of the feasibility of Equation (26). This completes the proof of Theorem 2. □
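The two cases treated in the proof (a feasible RNN-LEMPC solution versus the backup controller) correspond to a simple decision at each sampling step. The sketch below assumes a hypothetical solver interface `solve_lempc` that returns `None` when Equation (26) is infeasible, and a callable `backup_law` standing in for Φ_{nn_k}(x); neither interface comes from the paper or from a specific solver library.

```python
def control_step(x_q, solve_lempc, backup_law):
    """Select the input applied over [t_q, t_{q+1}) in sample-and-hold fashion.

    Returns (u, used_backup), where used_backup flags an infeasible LEMPC problem.
    """
    u_star = solve_lempc(x_q)  # assumed to return None if Eq. (26) has no feasible solution
    if u_star is None:
        # Fall back to the stabilizing law designed with the initial RNN.
        return backup_law(x_q), True
    return u_star, False
```

Logging the `used_backup` flag over a closed-loop run would give a direct measure of how often the model inconsistency discussed above makes Equation (26) infeasible.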

Remark 3.

Under the RNN-LEMPC of Equation (26), if the state measurement

x (t_{q})

at

t = t_{q}

is in the region

Ω_{{\hat{ρ}}_{e_{k}}}

, the economic cost function is maximized within

Ω_{{\hat{ρ}}_{e_{k}}}

; if

x (t_{q}) \in Ω_{{\hat{ρ}}_{k}} \setminus Ω_{{\hat{ρ}}_{e_{k}}}

, the predicted state

\tilde{x} (t)

is driven towards

Ω_{{\hat{ρ}}_{e_{k}}}

. Additionally, it has been proven in Theorem 2 that the actual state

x (t)

of Equation (1) is bounded in the stability region

Ω_{{\hat{ρ}}_{k}}

if

\tilde{x} (t)

is maintained in

Ω_{{\hat{ρ}}_{e_{k}}}

. Therefore, the region

Ω_{{\hat{ρ}}_{e_{k}}}

is a “safe” operating region in which the RNN-LEMPC of Equation (26) can maximize economic benefits while maintaining the boundedness of the state

x (t)

within

Ω_{{\hat{ρ}}_{k}}

. It is noted from Equation (27) that the relation between

Ω_{{\hat{ρ}}_{e_{k}}}

and

Ω_{{\hat{ρ}}_{k}}

is determined by

f_{k} (Δ)

, which is the upper bound on

| x (t) - \hat{x} (t) |

within one sampling period Δ. As discussed in Remark 2, online learning RNNs are capable of modeling Equation (1) involving disturbances while the initial RNN can capture the nominal system dynamics only. This implies that compared to the initial model, online learning RNNs may better approximate Equation (1) such that the state error

| x (t) - \hat{x} (t) |

is smaller, and, thus, a larger

{\hat{ρ}}_{e_{k}}

may be chosen for RNN-LEMPC with online updating RNNs. Therefore, while we use the controller

Φ_{{n n}_{k}} (x)

characterized using the initial offline-trained RNN as a backup controller to stabilize Equation (1) when RNN-LEMPC is infeasible, the online update of RNNs is performed to improve the closed-loop economic performance of Equation (1), which will be illustrated using a non-linear chemical process in the next section.
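The dependence of the admissible inner region on the one-step state error bound in Equation (27) can be made concrete with a short calculation. The sketch below returns the largest value of the inner level allowed by Equation (27) for a given f_{k}(Δ); the function name is ours and any numbers passed to it are placeholders.

```python
import math


def max_rho_e(rho_hat, kappa, c1_hat, c4_hat, f_delta):
    """Right-hand side of Eq. (27): rho^ - kappa f^2 - c^_4 sqrt(rho^) / sqrt(c^_1) * f."""
    return (rho_hat
            - kappa * f_delta ** 2
            - c4_hat * math.sqrt(rho_hat) / math.sqrt(c1_hat) * f_delta)
```

Because the right-hand side decreases monotonically in f_{k}(Δ) for non-negative arguments, any reduction of the state error bound obtained through online updates translates directly into a larger admissible inner region.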

5. Application to a Chemical Process Example

In this section, a chemical process example is used to illustrate the efficacy of the proposed LEMPC scheme using RNN models updated online. Specifically, a non-isothermal continuous stirred tank reactor (CSTR) is considered, in which a reactant A is transformed into a product B (

A \to B

) via a second-order, irreversible, and exothermic reaction. The CSTR is required to switch between two modes consisting of two available inlet streams with different inlet concentrations

C_{A 0_{σ}}

and inlet temperatures

T_{0_{σ}}

for the pure reactant A, where

σ \in ψ = \{1, 2\}

. Additionally, a heating jacket with the heat rate Q is furnished in the CSTR to supply or remove heat for the reactor. At mode

σ \in \{1, 2\}

, the CSTR dynamic model is described by the following ordinary differential equations:

\begin{matrix} \frac{d C_{A}}{d t} = - k_{0} e^{\frac{- E}{R T}} C_{A}^{2} + \frac{F}{V} (C_{A 0_{σ}} - C_{A}) \\ \frac{d T}{d t} = \frac{- Δ H}{ρ_{L} C_{p}} k_{0} e^{\frac{- E}{R T}} C_{A}^{2} + \frac{F}{V} (T_{0_{σ}} - T) + \frac{Q}{ρ_{L} C_{p} V} \end{matrix}

(32)

where

C_{A}

is the concentration of the reactant A and T is the reactor temperature. A detailed description of the chemical reaction and the process parameters in Equation (32) can be found in [24]. The process parameter values of the CSTR used in the closed-loop simulations are given in Table 1.

Table 1. Parameter values of the CSTR.

For each mode, a steady-state

(C_{A s_{σ}}, T_{s_{σ}})

is considered for the CSTR under

(C_{A 0 s_{σ}}, Q_{s})

(i.e., the steady-state input values). In this example, two manipulated inputs are the heat input rate Q and the inlet concentration

C_{A 0_{σ}}

, which are denoted by

Δ Q = Q - Q_{s}

and

Δ C_{A 0_{σ}} = C_{A 0_{σ}} - C_{A 0 s_{σ}}

in their deviation variable forms, respectively. The manipulated inputs are bounded by

| Δ Q | \leq 5 \times 10^{5}

kJ/h and

| Δ C_{A 0_{σ}} | \leq 3.5 kmol / m^{3}

for both modes. The input and state vectors in deviation form for the CSTR of Equation (32) are represented by

u^{T} = [Δ C_{A 0_{σ}} Δ Q]

and

x^{T} = [C_{A} - C_{A s_{σ}} T - T_{s_{σ}}]

, respectively, such that the equilibrium point of the CSTR for each mode is at the origin of the state-space. It is desired to operate the CSTR in

Ω_{{\hat{ρ}}_{σ}}

(i.e., the stability region) around

(C_{A s_{σ}}, T_{s_{σ}})

while maximizing the production rate of B given by:

l_{e} (x, u) = k_{0} e^{- E / R T} C_{A}^{2}

(33)

The explicit Euler method with a sufficiently small integration time step of

{\bar{h}}_{c} = 10^{- 4} h

is applied to numerically solve the CSTR dynamic model of Equation (32). Additionally, the non-linear optimization problem of Equation (26) is solved by PyIpopt [41] with a sampling period of

Δ = 10^{- 2} h

.
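A minimal sketch of how Equation (32) can be integrated with the explicit Euler method over one sampling period is given below. It works with the absolute inputs C_{A0} and Q rather than their deviation forms, holds the inputs constant over the sampling period, and takes all process parameters from a dictionary `p` whose keys are our own naming; the actual parameter values are those of Table 1 and are not reproduced here.

```python
import math


def reaction_rate(c_a, temp, p):
    """Second-order Arrhenius rate k0 * exp(-E / (R T)) * C_A^2, which is also Eq. (33)."""
    return p["k0"] * math.exp(-p["E"] / (p["R"] * temp)) * c_a ** 2


def cstr_rhs(c_a, temp, c_a0, q, p):
    """Right-hand side of Eq. (32) for one mode; p['T0'] is that mode's inlet temperature."""
    r = reaction_rate(c_a, temp, p)
    flow = p["F"] / p["V"]
    heat_cap = p["rho_L"] * p["Cp"]
    dca_dt = -r + flow * (c_a0 - c_a)
    dtemp_dt = -p["dH"] / heat_cap * r + flow * (p["T0"] - temp) + q / (heat_cap * p["V"])
    return dca_dt, dtemp_dt


def euler_sample_and_hold(c_a, temp, c_a0, q, p, sample_period=1e-2, h_c=1e-4):
    """Integrate Eq. (32) over one sampling period with constant inputs (explicit Euler)."""
    n_steps = round(sample_period / h_c)
    for _ in range(n_steps):
        dca_dt, dtemp_dt = cstr_rhs(c_a, temp, c_a0, q, p)
        c_a += h_c * dca_dt
        temp += h_c * dtemp_dt
    return c_a, temp
```

With the default arguments, each sampling period of 10^{-2} h is covered by 100 Euler steps of 10^{-4} h, matching the step sizes stated above.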

5.1. The CSTR Switched between Two Modes with Bounded Disturbances

We first consider the CSTR subject to the following disturbances. (1) An upstream disturbance causes the feed flow rate F to vary with time, subject to the constraint:

0 \leq F \leq 5.5 m^{3} / h

. (2) The catalyst deactivation is considered during process operation; this results in a gradual reduction in the pre-exponential factor

k_{0}

that is constrained by:

0 \leq k_{0} \leq 8.46 \times 10^{6} m^{3} / kmol h

. Additionally, the control Lyapunov functions for both modes are designed using the quadratic form of

V_{σ} (x) = x^{T} P_{σ} x

with

P_{σ} = [\begin{matrix} 1060 & 22 \\ 22 & 0.52 \end{matrix}]

for

σ \in \{1, 2\}

. As discussed in Section 2.3, we follow the development method of RNN models in [37] to construct two initial RNNs to model the nominal CSTR system (i.e., the values of F and

k_{0}

are fixed at the values given in Table 1 at all times) operating in two modes using historical data gathered from the entire operating region, respectively. Specifically, the RNN models are trained using Keras, where a hidden recurrent layer of 16 neurons is utilized for both initial RNNs, with

tanh

as the activation function, MSE as the loss function, and Adam as the optimizer. Based on the two initial RNNs, the stability region

Ω_{{\hat{ρ}}_{σ}}

and a subset

Ω_{{\hat{ρ}}_{e_{σ}}}

for the CSTR at mode

σ \in \{1, 2\}

can be characterized accordingly. In this example,

{\hat{ρ}}_{1}

and

{\hat{ρ}}_{e_{1}}

are chosen to be 368 and 280 for mode 1, and

{\hat{ρ}}_{2}

and

{\hat{ρ}}_{e_{2}}

are chosen to be 228 and 170 for mode 2, respectively. Since the CSTR is subject to disturbances during operation, we follow the update strategy in [34] to improve the RNN models online so that they capture the uncertain CSTR dynamics under bounded disturbances. Specifically, each online update of the RNNs uses the most recent real-time data collected over a fixed time interval (e.g., five sampling periods) together with the previous RNN model. The new RNN is used to predict the state evolution in LEMPC only if the modeling error constraint in Theorem 2 is met; otherwise, the new RNN model is discarded and the previous RNN model is retained as the prediction model in LEMPC.
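The accept-or-discard rule for a newly updated RNN can be sketched as follows. This is an illustration under our own assumptions rather than the procedure of [34]: models are callables returning a predicted state derivative, the measured derivatives in `samples` are assumed to have been estimated beforehand (for example by finite differences), the norm is Euclidean, and the modeling error constraint is checked only on the recent samples.

```python
import math


def meets_error_constraint(model, samples, gamma):
    """Check |F_nn(x, u) - F(x, u, w)| <= gamma * |x| on each (x, u, f_measured) sample."""
    for x, u, f_measured in samples:
        error = math.dist(model(x, u), f_measured)
        if error > gamma * math.hypot(*x):
            return False
    return True


def select_prediction_model(previous_model, candidate_model, samples, gamma):
    """Use the updated RNN only if it satisfies the modeling error constraint."""
    if meets_error_constraint(candidate_model, samples, gamma):
        return candidate_model
    return previous_model
```

Checking the constraint on a finite batch is only an empirical proxy for the condition in Theorem 2, which is stated for all states in the stability region.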

We carry out the closed-loop simulations for the CSTR subject to bounded disturbances and with scheduled mode transitions under RNN-LEMPC as follows. Specifically, the CSTR operates under mode 1 for

t \in [0, 0.25

h) and the mode transition takes place at

t = 0.25

h, after which the CSTR operates under mode 2 for

t \in [0.25

h,

\infty)

. The value of F changes to 2.5 m³/h and 5.5 m³/h, and

k_{0}

reduces to

0.8 k_{0}

,

0.6 k_{0}

, at

t = 0.05

h and

0.25

h, respectively. Starting from an initial condition

(C_{A}, T)

= (1.95 kmol/m³, 402 K), the simulation results for the uncertain CSTR system with process disturbances under the LEMPC with the initial offline-learning RNNs and the online updating RNNs are displayed in Figure 2, Figure 3 and Figure 4. In detail, it is observed from Figure 2a that under the LEMPC using the initial RNN at all times, the closed-loop state is bounded in

Ω_{{\hat{ρ}}_{1}}

and

Ω_{{\hat{ρ}}_{2}}

(i.e., the stability regions) for both modes, and is driven from the initial condition outside of

Ω_{{\hat{ρ}}_{2}}

into

Ω_{{\hat{ρ}}_{2}}

at the switching moment. However, the state trajectories under the initial RNNs show considerable oscillations near the boundaries of

Ω_{{\hat{ρ}}_{e_{1}}}

and

Ω_{{\hat{ρ}}_{e_{2}}}

for both modes, while those under the online updating RNNs stay smoothly at the boundaries of

Ω_{{\hat{ρ}}_{e_{1}}}

and

Ω_{{\hat{ρ}}_{e_{2}}}

with much smaller oscillations, as shown in Figure 2b. Additionally, Figure 3 shows the comparisons of the Lyapunov function value

\hat{V} (x)

under the LEMPC using the initial and online updating RNN models for both modes, respectively. It is shown in Figure 3 that both

{\hat{V}}_{1} (x)

and

{\hat{V}}_{2} (x)

under the initial RNNs show persistent oscillations around

{\hat{ρ}}_{e_{1}}

and

{\hat{ρ}}_{e_{2}}

, respectively, while those under the online updating RNNs show oscillations within finite sampling steps and ultimately converge to

{\hat{ρ}}_{e_{1}}

and

{\hat{ρ}}_{e_{2}}

after several rounds of online updates of RNNs, respectively. This implies that the contractive constraint of Equation (26f) under the LEMPC using the initial RNNs is activated frequently, since the Lyapunov function value

\hat{V} (x)

exceeds

{\hat{ρ}}_{e_{1}}

and

{\hat{ρ}}_{e_{2}}

for both modes frequently, while the contractive constraint remains inactive after finite sampling steps under the online updating RNNs. Figure 4 depicts the state profiles (i.e.,

C_{A}

and T) and the input profiles (i.e.,

C_{A 0}

and Q) in the original state space. Specifically, it is observed from Figure 4a that under the online updating RNNs, the LEMPC drives the states

C_{A}

and T to the optimal operating points that maximize the production rate of B for both modes. However, the state

C_{A}

exhibits sustained oscillations under the LEMPC using the initial RNNs. Similarly, it is shown in Figure 4b that the LEMPC using the online updating RNNs shows smoother manipulated input profiles (fewer oscillations) compared to that using the initial RNNs. The above simulation results demonstrate that the initial RNNs trained with historical data cannot predict well the uncertain CSTR system in the presence of process disturbances, which results in sustained oscillations in the state trajectories, the evolution of Lyapunov function value

\hat{V} (x)

, and the state and input profiles. These oscillations can be effectively mitigated after a more accurate RNN model that approximates the uncertain CSTR system dynamics is derived through an online update of RNNs.

Figure 2. Closed-loop state trajectories

(C_{A}, T)

for the uncertain CSTR system operating in mode 1 for

t \in [0, 0.25 h)

(red solid line) and switching to mode 2 at

t = 0.25 h

(blue solid line) under the RNN-LEMPC of Equation (26), (a) using the initial offline-learning RNNs at all times, and (b) using the online updating RNNs, for the initial condition

(C_{A}, T) = (1.95 {kmol / m}^{3}, 402 K)

(marked as red diamond).

Figure 3. Comparisons of

{\hat{V}}_{1} (x)

for mode 1 and

{\hat{V}}_{2} (x)

for mode 2 under the initial offline-learning and online updating RNN models.

Figure 4. (a) Closed-loop state (

C_{A}

and T) and (b) manipulated input (

C_{A_{0}}

and Q) profiles for the uncertain CSTR system operating in mode 1 for

t \in [0, 0.25 h)

and switching to mode 2 at

t = 0.25 h

under the RNN-LEMPC of Equation (26) using the initial offline-learning RNNs at all times (blue solid line), and using the online updating RNNs (red dashed line), for the initial condition

(C_{A}, T) = (1.95 {kmol / m}^{3}, 402 K)

.

Finally, in the event that the system dynamics of the CSTR remains unchanged for the remaining operation time (i.e., no further mode transition and no process disturbances after

t = 0.45 h

), the online update of RNNs is deactivated, the final RNN model is derived by solving Equation (18), and the CSTR operates in mode 2 under the LEMPC using the final RNN model after

t = 0.45 h

. Specifically, when the CSTR operates in mode 2, the RNNs are updated online four times, at

t = 0.3, 0.35, 0.4, 0.45 h

(the number of rounds

T = 4

). We use the hypothesis

h_{1}

and a sequence of hypotheses

h_{2}, \dots, h_{5}

to denote the initial RNN and the four online updating RNNs for mode 2, respectively. To simplify the calculation of the empirical discrepancy

{\hat{disc}}_{H} (λ)

of Equation (18), the hypothesis

\bar{h}

is considered to belong to a linear space (denoted by

\bar{H}

) of the hypotheses

h_{2}, \dots, h_{5}

in this example, that is

\bar{h} \in \bar{H} : = \{\sum_{i = 1}^{4} β_{i} h_{i + 1}, where \sum_{i = 1}^{4} β_{i} = 1 and β_{i} \geq 0 \forall i = 1, \dots, 4\}

. It is noted that each round of online learning represents five sampling periods in this example, and, thus, the RNN input for the next round

X_{T + 1}

consists of the system states

x (t)

and the control inputs

u (t)

for the current and the next four sampling steps, where

t = 0.45, 0.46, 0.47, 0.48, 0.49 h

. Therefore, the loss between the RNN outputs predicted by the hypotheses

h_{t + 1}

and

\bar{h}

on

X_{T + 1}

(i.e.,

L (h_{t + 1} (X_{T + 1}), \bar{h} (X_{T + 1}))

,

t = 1, \dots, 4

) can be obtained. The optimization problem of Equation (18) can be simplified to the following minimax optimization problem:

\begin{aligned} \min_{λ \in Ω_{T}} \max_{\bar{h} \in \bar{H}} \quad & \left| \sum_{t = 1}^{T} λ_{t} \left( L (h_{t + 1}, Z_{t + 1}) - L (h_{t + 1} (X_{T + 1}), \bar{h} (X_{T + 1})) \right) \right| + \sum_{t = 1}^{T} λ_{t} L (h_{t + 1}, Z_{t + 1}) \\ \text{s.t.} \quad & λ_{T} = 0, \quad \sum_{t = 1}^{T} \left| λ_{t} - \frac{1}{T} \right| \leq α \end{aligned}

(34)

It is noted that exhaustive searches for the hypothesis

\bar{h}

are performed in the linear space

\bar{H}

, thereby converting the minimax optimization problem of Equation (34) into the minimization problem of the maximum of a set of objective functions, which can be efficiently solved using the MATLAB routine fminimax. By setting the hyperparameter

α = 0.8

, the optimization problem of Equation (34) is solved to calculate the optimal weight vector

λ = (λ_{1}, \dots, λ_{4})

, yielding

λ_{1} = 0.1

,

λ_{2} = 0.25

,

λ_{3} = 0.65

, where the weight

λ_{4}

is assigned to be zero following the constraints in Equation (34). Subsequently, the final RNN model h is derived using the ensemble of the hypotheses

h_{2}, \dots, h_{4}

with the corresponding weights

λ_{1}, \dots, λ_{3}

, that is

h = \sum_{t = 1}^{3} λ_{t} h_{t + 1}

, and the hypothesis

h_{5}

is discarded due to its weight

λ_{4} = 0

. It should be pointed out that when the system dynamics of the CSTR change at some future operation time due to further mode transitions and/or process disturbances, the final model h needs to be updated online again using real-time data if it no longer predicts well.
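The reported weights can be checked against the constraints of Equation (34) with a few lines of arithmetic. The sketch below is only an illustration: the function names and the scalar stand-in predictors are hypothetical, and it assumes that the feasible set Ω_T consists of non-negative weights that sum to one. For T = 4 and α = 0.8, the reported weights (0.1, 0.25, 0.65, 0) give an absolute deviation from the uniform weight 1/T of 0.15 + 0 + 0.40 + 0.25 = 0.80, so the α constraint appears to hold with equality.

```python
from typing import Callable, Sequence


def is_feasible(lam: Sequence[float], alpha: float, tol: float = 1e-9) -> bool:
    """Check the constraints of Equation (34) for a weight vector lam.

    Assumption: the feasible set is the probability simplex
    (non-negative weights summing to one).
    """
    T = len(lam)
    on_simplex = all(w >= -tol for w in lam) and abs(sum(lam) - 1.0) <= tol
    last_is_zero = abs(lam[-1]) <= tol                  # lambda_T = 0
    l1_dev = sum(abs(w - 1.0 / T) for w in lam)         # sum_t |lambda_t - 1/T|
    return on_simplex and last_is_zero and l1_dev <= alpha + tol


def ensemble(lam: Sequence[float],
             hypotheses: Sequence[Callable[[float], float]]) -> Callable[[float], float]:
    """Return the weighted model h(x) = sum_t lam[t] * hypotheses[t](x)."""
    def h(x: float) -> float:
        return sum(w * h_t(x) for w, h_t in zip(lam, hypotheses))
    return h


# Weights reported in the text for T = 4 rounds and alpha = 0.8.
lam = [0.10, 0.25, 0.65, 0.0]
alpha = 0.8
l1_deviation = sum(abs(w - 1.0 / len(lam)) for w in lam)
feasible = is_feasible(lam, alpha)

# Hypothetical scalar stand-ins for the updated RNNs h_2, ..., h_5.
hypotheses = [lambda x, c=c: c * x for c in (1.0, 2.0, 3.0, 4.0)]
h_final = ensemble(lam, hypotheses)
```

Because the last weight is zero, the last stand-in predictor does not contribute to `h_final`, which mirrors the statement that the hypothesis h_5 is discarded.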

5.2. The CSTR Switched between Two Steady-States without Disturbances

We next consider a special switching case of the nominal CSTR system, as discussed in Section 3.1, where the CSTR operates in mode 1 defined in Section 5.1 at all times (i.e.,

σ \equiv 1

for Equation (32)) and does not involve disturbances. In this case, two steady-states

(C_{A s_{1}}, T_{s_{1}})

= (1.95 kmol/m

^{3}

, 402 K) and

(C_{A s_{2}}, T_{s_{2}})

= (1.22 kmol/m

^{3}

, 438 K) for the CSTR are considered under

(C_{A 0_{s}}, Q_{s})

= (4 kmol/m

^{3}

, 0 kJ/h). The CSTR is switched between two steady-states at the prescribed switching times while maximizing the economic costs of Equation (33) under RNN-LEMPC. The CSTR is said to operate in mode 1 or 2 when it operates in the stability region

Ω_{{\hat{ρ}}_{1}}

or

Ω_{{\hat{ρ}}_{2}}

around the steady-state

(C_{A s_{1}}, T_{s_{1}})

or

(C_{A s_{2}}, T_{s_{2}})

. For both modes, the control Lyapunov functions follow those in Section 5.1 with

P_{1} = P_{2} = [\begin{matrix} 1060 & 22 \\ 22 & 0.52 \end{matrix}]

. In this section, it is assumed that historical operational data are available only for the CSTR operating in a portion of the operating region (denoted by

Ω_{0}

) around the steady-state

(C_{A s_{1}}, T_{s_{1}})

, where

Ω_{0} : = \{1.5 {kmol / m}^{3} \leq C_{A} \leq 2.4 {kmol / m}^{3} and 360 K \leq T \leq 440 K\}

(marked as a rectangle in Figure 5). Based on this limited training dataset, an initial RNN is trained using the same method as in Section 5.1. The stability region

Ω_{{\hat{ρ}}_{1}}

and a subset

Ω_{{\hat{ρ}}_{e_{1}}}

for mode 1 follow those in Section 5.1 with

{\hat{ρ}}_{1} = 368

and

{\hat{ρ}}_{e_{1}} = 280

, and the stability region

Ω_{{\hat{ρ}}_{2}}

with

{\hat{ρ}}_{2} = 480

and a subset

Ω_{{\hat{ρ}}_{e_{2}}}

with

{\hat{ρ}}_{e_{2}} = 380

are chosen for the CSTR at mode 2 in this section. In this example, since the initial RNN is developed with the dataset in

Ω_{0}

around the steady-state

(C_{A s_{1}}, T_{s_{1}})

of mode 1, we will operate the CSTR in mode 1 under LEMPC with the initial RNN at all times. However, when the CSTR operates in mode 2, we will update the RNN models online since the initial RNN lacks the dataset around the steady-state

(C_{A s_{2}}, T_{s_{2}})

of mode 2. Specifically, starting from the initial RNN, the RNNs are updated online using the most recent real-time data (i.e., every five sampling periods for each round) and the previous RNN model.
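The following sketch illustrates the quantities used in this setup. It assumes that the Lyapunov function has the quadratic form V(x) = x^T P x with x taken as the deviation of (C_A, T) from the steady-state of the active mode (the definition from Section 5.1 is not repeated here), and all function names are hypothetical. It checks whether a state lies in the training region Ω_0 and whether the Lyapunov function value exceeds the threshold at which, as described above, the contractive constraint becomes active.

```python
# Steady-states and level-set values quoted in this section.
CA_S1, T_S1 = 1.95, 402.0      # mode 1 steady-state (kmol/m^3, K)
CA_S2, T_S2 = 1.22, 438.0      # mode 2 steady-state (kmol/m^3, K)
P = ((1060.0, 22.0), (22.0, 0.52))
RHO_E1, RHO_E2 = 280.0, 380.0


def in_omega0(ca: float, temp: float) -> bool:
    """True if (C_A, T) lies in the rectangle Omega_0 covered by historical data."""
    return 1.5 <= ca <= 2.4 and 360.0 <= temp <= 440.0


def lyapunov(ca: float, temp: float, ca_s: float, temp_s: float) -> float:
    """Quadratic form x^T P x in deviation variables (assumed form of V)."""
    dx1 = ca - ca_s
    dx2 = temp - temp_s
    return P[0][0] * dx1 * dx1 + 2.0 * P[0][1] * dx1 * dx2 + P[1][1] * dx2 * dx2


def contractive_constraint_active(v: float, rho_e: float) -> bool:
    """The contractive constraint is active once V exceeds rho_e."""
    return v > rho_e


# The mode 1 steady-state is inside Omega_0, the mode 2 steady-state is not.
mode1_covered = in_omega0(CA_S1, T_S1)
mode2_covered = in_omega0(CA_S2, T_S2)

# Example state slightly away from the mode 1 steady-state.
v_example = lyapunov(2.05, 403.0, CA_S1, T_S1)
active_example = contractive_constraint_active(v_example, RHO_E1)
```

Since C_{A s_{2}} = 1.22 kmol/m^3 lies below the lower bound of 1.5 kmol/m^3 for Ω_0, the mode 2 steady-state falls outside the region covered by the historical data, which is consistent with the need for online updates in mode 2.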

Figure 5. Closed-loop state trajectories

(C_{A}, T)

for the nominal CSTR system operating in mode 1 for

t \in [0, 0.1 h)

using the initial offline-learning RNN (red solid line), and switching to mode 2 at

t = 0.1 h

using the initial RNN (blue solid line) and online updating RNNs (pink dashed line) under the RNN-LEMPC of Equation (26) with the initial condition

(C_{A}, T) = (1.95 {kmol / m}^{3}, 402 K)

(marked as red diamond).

The simulation results for the nominal CSTR system switched between two steady-states under the RNN-LEMPC of Equation (26) are presented in Figure 5 and Figure 6. Specifically, the CSTR starts from an initial condition

(C_{A}, T) = (1.95 {kmol / m}^{3}, 402 K)

and operates in mode 1 for

t \in [0, 0.1 h)

under the LEMPC with the initial RNN constructed with the dataset in

Ω_{0}

. Subsequently, the CSTR operates under mode 2 for

t \in [0.1 h, \infty)

(i.e., the remaining operation time) following a switching schedule from mode 1 to mode 2 at

t = 0.1 h

under LEMPC using the initial RNN and the online updating RNNs. Figure 5 shows that under the initial RNN, the state trajectory closely follows the boundary of

Ω_{{\hat{ρ}}_{e_{1}}}

for mode 1. This is consistent with the result in Figure 6a, which shows that the Lyapunov function value

{\hat{V}}_{1} (x)

under the initial RNN converges to

{\hat{ρ}}_{e_{1}}

after

t = 0.04 h

. It should be mentioned that the initial RNN is constructed with the dataset gathered from a portion of the operating region around

(C_{A s_{1}}, T_{s_{1}})

, and, thus, the LEMPC with the initial RNN performs well for the CSTR operating in mode 1. Additionally, it is observed in Figure 5 that under the initial RNN, there exists a gap between the closed-loop state trajectory and the boundary of

Ω_{{\hat{ρ}}_{e_{2}}}

for mode 2, while the closed-loop state trajectory under the online updating RNNs ultimately operates near the boundary of

Ω_{{\hat{ρ}}_{e_{2}}}

after exhibiting oscillations for some sampling steps. It is more apparent in Figure 6b that the Lyapunov function value

{\hat{V}}_{2} (x)

under the initial RNN converges to a value of 340, while it converges to

{\hat{ρ}}_{e_{2}} = 380

after

t = 0.26 h

under the online updating RNNs. The total economic benefits

L_{E} = \int_{0.15}^{0.4} l_{e} (x, u) d t

within the operating period

t \in [0.15 h, 0.4 h)

are calculated (note that the first update of RNNs occurs at

t = 0.15 h

), which yields 4.75 and 4.84 for the LEMPC with the initial RNN and the online RNNs, respectively. Therefore, the accumulative economic benefits during

t \in [0.15 h, 0.4 h)

are improved by

1.9 %

via online learning. Finally, the RNNs are updated online at

t = 0.15, 0.2, 0.25, 0.3, 0.35, 0.4 h

six times (

T = 6

) to generate a sequence of hypotheses

h_{2}, \dots, h_{7}

. Based on the testing error

L (h_{t}, Z_{t})

for each hypothesis

h_{t}

, Equation (8) is solved with the hyperparameter

α = 0.6

to calculate the optimal weights

λ_{1}, \dots, λ_{6}

for the hypotheses

h_{1}, \dots, h_{6}

, respectively, yielding

λ_{1} = 0

,

λ_{2} = 0.0333

,

λ_{3} = 0.1666

,

λ_{4} = 0.1667

,

λ_{5} = 0.4667

, and

λ_{6} = 0.1667

. Subsequently, the final RNN model h is developed with

h = \sum_{t = 1}^{6} λ_{t} h_{t}

, and the CSTR operates in mode 2 under the LEMPC with this final RNN model for the remaining operation time. However, when there is a further mode transition for the CSTR, the online update of RNNs must be performed again if the final model does not predict well for the CSTR operating in the new mode.
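The reported 1.9% improvement follows directly from the two integrated values, since (4.84 - 4.75) / 4.75 ≈ 0.0189. The sketch below reproduces this arithmetic and shows one way the integral L_E could be approximated from sampled values of l_e. The trapezoidal rule, the sampling period of 0.01 h (inferred from five sampling periods per 0.05 h update round) and the constant stage values are assumptions made for illustration only, not the procedure used to obtain the reported numbers.

```python
from typing import Sequence


def trapezoid(values: Sequence[float], dt: float) -> float:
    """Trapezoidal approximation of an integral from equally spaced samples."""
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


def percent_improvement(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


# Values reported in the text for t in [0.15 h, 0.4 h).
L_E_INITIAL = 4.75
L_E_ONLINE = 4.84
gain = percent_improvement(L_E_INITIAL, L_E_ONLINE)   # about 1.89

# Hypothetical samples: a constant stage value of 19.0 at 26 sampling
# instants (0.15 h to 0.40 h with a 0.01 h sampling period).
dt = 0.01
samples = [19.0] * 26
L_E_example = trapezoid(samples, dt)
```

With the constant hypothetical samples, the trapezoidal sum reduces to 19.0 × 0.25 = 4.75, which matches the order of magnitude of the reported values and serves only as a sanity check on the integration routine.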

Figure 6. Comparisons of

{\hat{V}}_{1} (x)

for mode 1 using the initial offline-learning RNN (red dashed line), and

{\hat{V}}_{2} (x)

for mode 2 using the initial RNN (blue solid line) and online updating RNNs (pink dashed line).

6. Conclusions

This work proposed an LEMPC scheme using online updating RNNs that can optimize the economic benefits of switched non-linear systems. The generalized error bounds for RNN models updated online in i.i.d. and non-i.i.d. settings were derived, respectively. Subsequently, the LEMPC that incorporates online learning RNNs was developed to maintain the closed-loop state within the prescribed stability region and maximize the economic benefits for the uncertain system involving bounded disturbances. A Lyapunov-based constraint was incorporated into the LEMPC formulation to ensure the success of scheduled mode transitions. Closed-loop stability for the uncertain non-linear system subject to bounded disturbances under LEMPC was proved in a probabilistic manner accounting for the generalized error bound. The proposed LEMPC scheme was applied to a chemical process example to demonstrate that economic optimality and closed-loop stability can be improved under the LEMPC using online RNNs compared to those using the initial RNNs at all times.

Author Contributions

C.H. developed the main results, performed the simulation studies and prepared the initial draft of the paper. S.C. contributed to the simulation studies in this manuscript. Z.W. developed the idea of RNN generalized error, oversaw all aspects of the research and revised this manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National University of Singapore Start-up Grant, Grant/Award Number: R279-000-656-731 and MOE AcRF Tier 1 FRC Grant, Grant/Award Number: CHBE-22-5367.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

Angeli, D.; Amrit, R.; Rawlings, J.B. On average performance and stability of economic model predictive control. IEEE Trans. Autom. Control 2011, 57, 1615–1626.
Heidarinejad, M.; Liu, J.; Christofides, P.D. Economic model predictive control of nonlinear process systems using Lyapunov techniques. AIChE J. 2012, 58, 855–870.
Müller, M.A.; Angeli, D.; Allgöwer, F. Economic model predictive control with self-tuning terminal cost. Eur. J. Control 2013, 19, 408–416.
Ellis, M.; Durand, H.; Christofides, P.D. A tutorial review of economic model predictive control methods. J. Process Control 2014, 24, 1156–1178.
Dong, Z.; Angeli, D. Analysis of economic model predictive control with terminal penalty functions on generalized optimal regimes of operation. Int. J. Robust Nonlinear Control 2018, 28, 4790–4815.
Dong, Z.; Angeli, D. Homothetic tube-based robust economic MPC with integrated moving horizon estimation. IEEE Trans. Autom. Control 2020, 66, 64–75.
Lee, T.C.; Jiang, Z.P. Uniform asymptotic stability of nonlinear switched systems with an application to mobile robots. IEEE Trans. Autom. Control 2008, 53, 1235–1252.
Shen, H.; Xing, M.; Wu, Z.G.; Xu, S.; Cao, J. Multiobjective fault-tolerant control for fuzzy switched systems with persistent dwell time and its application in electric circuits. IEEE Trans. Fuzzy Syst. 2019, 28, 2335–2347.
Jin, Y.; Fu, J.; Zhang, Y.; Jing, Y. Reliable control of a class of switched cascade nonlinear systems with its application to flight control. Nonlinear Anal. Hybrid Syst. 2014, 11, 11–21.
Branicky, M.S. Multiple Lyapunov functions and other analysis tools for switched and hybrid systems. IEEE Trans. Autom. Control 1998, 43, 475–482.
Aleksandrov, A.Y.; Chen, Y.; Platonov, A.V.; Zhang, L. Stability analysis for a class of switched nonlinear systems. Automatica 2011, 47, 2286–2291.
Hespanha, J.P.; Morse, A.S. Stability of switched systems with average dwell-time. In Proceedings of the 38th IEEE Conference on Decision and Control, Phoenix, AZ, USA, 7–10 December 1999; Volume 3, pp. 2655–2660.
Xiang, W.; Xiao, J. Stabilization of switched continuous-time systems with all modes unstable via dwell time switching. Automatica 2014, 50, 940–945.
Nodozi, I.; Rahmani, M. LMI-based model predictive control for switched nonlinear systems. J. Process Control 2017, 59, 49–58.
Mhaskar, P.; El-Farra, N.H.; Christofides, P.D. Predictive control of switched nonlinear systems with scheduled mode transitions. IEEE Trans. Autom. Control 2005, 50, 1670–1680.
Heidarinejad, M.; Liu, J.; Christofides, P.D. Economic model predictive control of switched nonlinear systems. Syst. Control Lett. 2013, 62, 77–84.
Prabhu, S.; Deepa, S.; Arulperumjothi, M.; Susilowati, L.; Liu, J. Resolving-power domination number of probabilistic neural networks. J. Intell. Fuzzy Syst. 2022, 43, 6253–6263.
Zheng, Y.; Zhang, T.; Li, S.; Qi, C.; Zhang, Y.; Wang, Y. Data-Driven Distributed Model Predictive Control of Continuous Nonlinear Systems with Gaussian Process. Ind. Eng. Chem. Res. 2022, 61, 18187–18202.
Zhang, T.; Li, S.; Zheng, Y. Implementable Stability Guaranteed Lyapunov-Based Data-Driven Model Predictive Control with Evolving Gaussian Process. Ind. Eng. Chem. Res. 2022, 61, 14681–14690.
Pan, Y.; Wang, J. Model predictive control of unknown nonlinear dynamical systems based on recurrent neural networks. IEEE Trans. Ind. Electron. 2011, 59, 3089–3101.
Xu, J.; Li, C.; He, X.; Huang, T. Recurrent neural network for solving model predictive control problem in application of four-tank benchmark. Neurocomputing 2016, 190, 172–178.
Shahnazari, H.; Mhaskar, P.; House, J.M.; Salsbury, T.I. Modeling and fault diagnosis design for HVAC systems using recurrent neural networks. Comput. Chem. Eng. 2019, 126, 189–203.
Wu, Z.; Christofides, P.D. Economic machine-learning-based predictive control of nonlinear systems. Mathematics 2019, 7, 494.
Wu, Z.; Rincon, D.; Christofides, P.D. Real-time adaptive machine-learning-based predictive control of nonlinear processes. Ind. Eng. Chem. Res. 2019, 59, 2275–2290.
Wagener, N.; Cheng, C.A.; Sacks, J.; Boots, B. An online learning approach to model predictive control. arXiv 2019, arXiv:1902.08967.
Bieker, K.; Peitz, S.; Brunton, S.L.; Kutz, J.N.; Dellnitz, M. Deep model predictive control with online learning for complex physical systems. arXiv 2019, arXiv:1905.10094.
Ning, C.; You, F. Online learning based risk-averse stochastic MPC of constrained linear uncertain systems. Automatica 2021, 125, 109402.
Zheng, Y.; Zhao, T.; Wang, X.; Wu, Z. Online Learning-Based Predictive Control of Crystallization Processes under Batch-to-Batch Parametric Drift. AIChE J. 2022, 68, e17815.
Cesa-Bianchi, N.; Conconi, A.; Gentile, C. On the generalization ability of on-line learning algorithms. IEEE Trans. Inf. Theory 2004, 50, 2050–2057.
Cesa-Bianchi, N.; Gentile, C. Improved risk tail bounds for on-line algorithms. IEEE Trans. Inf. Theory 2008, 54, 386–390.
Kakade, S.M.; Tewari, A. On the generalization ability of online strongly convex programming algorithms. Adv. Neural Inf. Process. Syst. 2008, 21, 801–808.
Rakhlin, A.; Sridharan, K.; Tewari, A. Online learning via sequential complexities. J. Mach. Learn. Res. 2015, 16, 155–186.
Kuznetsov, V.; Mohri, M. Time series prediction and online learning. In Proceedings of the 29th Annual Conference on Learning Theory, New York, NY, USA, 23–26 June 2016; pp. 1190–1213.
Hu, C.; Cao, Y.; Wu, Z. Online Machine Learning Modeling and Predictive Control of Nonlinear Systems with Scheduled Mode Transitions. AIChE J. 2022, 69, e17882.
Hu, C.; Wu, Z. Model Predictive Control of Switched Nonlinear Systems Using Online Machine Learning. submitted.
Lin, Y.; Sontag, E.D. A universal formula for stabilization with bounded controls. Syst. Control Lett. 1991, 16, 393–397.
Wu, Z.; Tran, A.; Rincon, D.; Christofides, P.D. Machine learning-based predictive control of nonlinear processes. Part I: Theory. AIChE J. 2019, 65, e16729.
Wu, Z.; Rincon, D.; Gu, Q.; Christofides, P.D. Statistical Machine Learning in Model Predictive Control of Nonlinear Processes. Mathematics 2021, 9, 1912.
Kuznetsov, V.; Mohri, M. Discrepancy-based theory and algorithms for forecasting non-stationary time series. Ann. Math. Artif. Intell. 2020, 88, 367–399.
Wu, Z.; Alnajdi, A.; Gu, Q.; Christofides, P.D. Statistical machine-learning–based predictive control of uncertain nonlinear processes. AIChE J. 2022, 68, e17642.
Wächter, A.; Biegler, L.T. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Math. Program. 2006, 106, 25–57.

Figure 1. A schematic of a recurrent neural network and its unfolded structure.


Table 1. Parameter values of the CSTR.

$E = 5 \times 10^{4}$ kJ/kmol	$F = 5$ m$^{3}$/h
$R = 8.314$ kJ/(kmol K)	$T_{0_{1}} = 300$ K, $T_{0_{2}} = 290$ K
$V = 1$ m$^{3}$	$Q_{s} = 0.0$ kJ/h
$Δ H = - 1.15 \times 10^{4}$ kJ/kmol	$C_{A 0 s_{1}} = 4$ kmol/m$^{3}$, $C_{A 0 s_{2}} = 4.55$ kmol/m$^{3}$
$ρ_{L} = 1000$ kg/m$^{3}$	$T_{s_{1}} = 402$ K, $T_{s_{2}} = 475$ K
$C_{p} = 0.231$ kJ/(kg K)	$C_{A s_{1}} = 1.95$ kmol/m$^{3}$, $C_{A s_{2}} = 0.83$ kmol/m$^{3}$
$k_{0} = 8.46 \times 10^{6}$ m$^{3}$/(kmol h)


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
