Applied Sciences
  • Article
  • Open Access

28 November 2025

A Transformer Tube-Based Model Predictive Control Method Under Model Mismatches

1 The School of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
2 College of Aerospace and Civil Engineering, Harbin Engineering University, Harbin 150001, China
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Applications of Artificial Intelligence in Industrial Engineering

Abstract

In industrial processes, mismatches between models and actual systems often degrade the performance of Model Predictive Control (MPC) and can lead to instability or safety violations under dynamic operating conditions. To address this challenge, this paper introduces a hybrid control architecture, Trans-Tube-MPC, that combines Transformer-based temporal modeling with tube-based robust constraints to make the control system robust to model failures. A Transformer network trained on closed-loop operational data predicts and compensates for state deviations caused by disturbances, while adaptive tube constraints dynamically adjust the prediction boundaries to reduce the risk of overcorrection. The key innovation is a dynamically adjusted tube width that adapts to the prediction discrepancy between the Transformer model and the state-space model, allowing the control system to remain robust even when the model fails. Experimental studies show that Trans-Tube-MPC maintains control performance under significant model parameter deviations where conventional MPC would fail. The proposed method offers an effective solution to model mismatch and prediction error under model failure conditions and provides a new way to reconcile data-driven adaptability with control-system reliability.

1. Introduction

Model Predictive Control (MPC) has become a standard advanced control method in industry over the past few decades owing to its strong performance on multivariable systems with significant interactions [1]. A key strength of MPC is that it handles constraints on inputs, states, and outputs explicitly within its optimization framework, which leads to better control outcomes [2]. However, the effectiveness of an MPC system depends heavily on the accuracy of the plant model. A primary challenge is plant-model mismatch (PMM), which can stem from structural issues, such as unmodeled dynamics, or parametric issues, such as inaccurate model parameters [3].
Such mismatch is natural and unavoidable over the lifecycle of physical systems, which has prompted researchers to develop methods for monitoring, diagnosing, and correcting MPC performance. Implementing MPC begins with constructing a dynamic model that predicts future process outputs; based on these predictions, control actions are calculated to minimize a predefined cost function [4,5]. When the plant model does not match the actual plant, however, MPC algorithms struggle to maintain effective control.
Unmeasured disturbances can further widen the gap between model and plant and degrade MPC performance [6,7]. Although the feedback mechanism of MPC tolerates model mismatch to some extent, the conventional MPC formulation does not guarantee offset-free (unbiased) steady-state control in its presence. In this work, model–plant mismatch is evaluated quantitatively over a range of −60% to +60% variation in process parameters (gain, time constant, and time delay), reflecting realistic uncertainties in industrial settings.
In industrial MPC, researchers have developed various strategies to address offsets caused by plant-model mismatch or unmeasured disturbances [8,9,10]. Among these, the unbiased correction scheme of Dynamic Matrix Control (DMC) compensates for mismatch by adjusting the setpoint, assuming that disturbances are step-like and remain constant over the prediction horizon [11]. While this approach can eliminate steady-state offsets, its ability to handle input disturbances is limited. Other studies have proposed adaptive disturbance estimation methods, such as adaptive techniques with time-varying forgetting factors, in which disturbances are modeled as an integrated moving average process [12]. These methods have shown some applicability to nonlinear plant-model mismatch and disturbances, but they require a trade-off between computational cost and model accuracy to remain effective and practical in real industrial applications.
Another classic way to avoid steady-state offset is to introduce an integrator into the control loop, as in PI control; in MPC this can be implemented by using the control increment as the decision variable and augmenting the system's state vector [13]. However, this approach increases the system's dimensionality and computational complexity and requires additional analysis to ensure closed-loop stability of the augmented system. State observers, such as the Extended Kalman Filter, are also commonly used to eliminate steady-state offset, but their accuracy and dynamic response speed are critical to control performance [14,15].
Neural network models have opened new possibilities in control [16,17]. Researchers have focused in particular on how feedforward neural networks (FNN) and recurrent neural networks (RNN) affect control performance [18,19]. Studies have shown that when the prediction horizon exceeds the control horizon, nonlinear MPC (NMPC) algorithms based on FNN models often exhibit steady-state offset. This suggests that although FNNs can provide effective predictions in some scenarios, more expressive network structures may be required in broader control contexts.
Lu and Tsai designed a Generalized Predictive Control (GPC) strategy based on a recurrent fuzzy neural network [20,21]. The method derives the control law by minimizing a modified predictive performance criterion, and both simulation and experimental results show that it achieves satisfactory control under setpoint changes and load disturbances. Its advantage lies in handling uncertainty through fuzzy logic, which offers a new perspective for NMPC. Zhang et al. further developed an unbiased output-feedback NMPC method that combines a fuzzy model with an integral disturbance model [22]. They used an enhanced piecewise observer to estimate the system states and combined disturbances, achieving unbiased control even in the presence of unmeasured disturbances. A common limitation of these studies, however, is their reliance on specific predictive model structures, which restricts how well the methods generalize across model types. For instance, FNN-based MPC often exhibits steady-state offsets when prediction horizons exceed certain limits [18], while RNNs (e.g., LSTM) improve temporal modeling but may suffer from training instability under significant model mismatch [19].
This paper proposes a hybrid control architecture, Trans-Tube-MPC, that combines Transformer-based temporal modeling with tube-based robust constraints to make the control system robust to model failures. A Transformer network trained on closed-loop operational data predicts and compensates for state deviations caused by disturbances, while adaptive tube constraints dynamically adjust the prediction boundaries to reduce the risk of overcorrection. The core idea is to build a neural multi-step prediction model from closed-loop simulation data and integrate it with the MPC strategy to estimate plant-model mismatch accurately. State estimation is performed with a Kalman filter [23,24,25], and model adaptability is enhanced by exploiting the discrepancy between the Transformer and state-space predictions [26,27,28,29]. The proposed Trans-Tube-MPC is compared against three benchmark controllers: conventional MPC, Tube-MPC with fixed constraints, and Trans-MPC (which uses Transformer predictions without tube adjustment). The key innovations are:
  • Dynamic tube width adjustment: The method adaptively adjusts the tube width to cope with changes in model prediction errors, allowing the control system to remain robust even in the face of model failures. The adjustment of the tube width is based on the prediction discrepancy between the Transformer model and the state-space model, enabling the control system to effectively self-adjust when facing model mismatches.
  • Integration of a multi-step prediction model with the MPC strategy: The paper builds a neural network multi-step prediction model from closed-loop simulation datasets and integrates it with the MPC strategy. This allows state deviations caused by disturbances to be estimated and compensated accurately, improving both the estimation of plant-model mismatch and control performance.
  • Construction of a new loss function: A comprehensive control objective is defined that considers not only tracking errors and input variations but also tube constraint penalties, which keep control inputs within acceptable limits. The weight matrices and penalty coefficients let designers adjust the relative importance of the control objectives for specific applications. The weighting matrices Q and R and the tube penalty coefficient λ were tuned via iterative simulation under nominal conditions; the final values used in the experiments are Q = diag(0, 1), R = diag(2.5, 1), and λ = 0.1, selected to balance tracking performance and control effort.
The Trans-Tube-MPC framework proposed in this paper maintains control performance under significant model parameter deviations where conventional MPC fails. Comparative analysis shows marked improvements in disturbance rejection and transient response, especially under coupled multivariable interactions. The first-order plus time-delay (FOPDT) model is widely used in process control because it is simple and captures the dominant dynamics of many industrial processes, such as thermal and chemical systems; the approximation remains valid for processes with monotonic step responses and negligible higher-order dynamics. The strategy eliminates steady-state offset once the process reaches steady state, does so without increasing computational complexity, and requires only process outputs as measurements. Moreover, it is easy to implement, requiring only minor changes to the traditional MPC formulation, and it generalizes to various predictive models. Experimental results show that the new strategy achieves superior unbiased control performance, even under significant model parameter deviations (up to 60% mismatch in extreme conditions) where conventional MPC would completely fail.

2. Preliminary

2.1. Transfer Function

A typical industrial system comprises multiple interconnected subsystems, and its model can be represented as follows:
$$y(s) = G(s)\,u(s)$$
where $y(s)$ represents the system output, $G(s)$ denotes the transfer function of the system, and $u(s)$ represents the system input.
$$G(s) = \begin{bmatrix} g_{11}(s) & g_{12}(s) & \cdots & g_{1n}(s) \\ g_{21}(s) & g_{22}(s) & \cdots & g_{2n}(s) \\ \vdots & \vdots & \ddots & \vdots \\ g_{m1}(s) & g_{m2}(s) & \cdots & g_{mn}(s) \end{bmatrix}$$
For many industrial subsystems, a suitable approximation of the transfer function is the linear first-order plus time-delay model:
$$g(s) = \frac{K_p}{1 + T_p s}\,e^{-T_d s}$$
where $K_p$ is the gain coefficient, $T_p$ is the time constant, and $T_d$ is the time delay. Typically, an optimization algorithm is used for system identification to determine these three parameters.
Consider a process whose input is $u$, whose measurable output is $y$, and whose steady-state values are $u_0$ and $y_0$, respectively. Without loss of generality, the response of a linear first-order system with time delay can be expressed as the differential equation:
$$T_p \frac{dy}{dt} + (y - y_0) = K_p\big(u(t - T_d) - u_0\big)$$
Typically, the parameters $K_p$, $T_p$, and $T_d$ are determined using techniques such as least squares, which minimize the sum of squared differences between the predicted and measured values of $y$.
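The least-squares step can be illustrated with a short Python sketch. It assumes sampled data with period `Ts`, a dead time that is an integer number of samples, and a zero-order hold on the input, so the differential equation above discretizes exactly to $y(k+1) - y_0 = a\,(y(k) - y_0) + K_p(1-a)\,(u(k-d) - u_0)$ with $a = e^{-T_s/T_p}$ and $d = T_d/T_s$. The function names and the grid search over $d$ are illustrative choices, not the authors' implementation.

```python
import numpy as np


def simulate_fopdt(u, Kp, Tp, Td, Ts, y0=0.0, u0=0.0):
    """Exact ZOH discretization of Tp*dy/dt + (y - y0) = Kp*(u(t - Td) - u0)."""
    a = np.exp(-Ts / Tp)                  # discrete pole of the first-order lag
    d = int(round(Td / Ts))               # dead time in samples (assumed integer)
    y = np.full(len(u), y0, dtype=float)
    for k in range(len(u) - 1):
        u_del = u[k - d] if k >= d else u0   # input sits at u0 before the record starts
        y[k + 1] = y0 + a * (y[k] - y0) + Kp * (1.0 - a) * (u_del - u0)
    return y


def identify_fopdt(u, y, Ts, max_delay, y0=0.0, u0=0.0):
    """Least-squares fit of x(k+1) = a*x(k) + b*v(k-d), with a grid search over d."""
    x = np.asarray(y, dtype=float) - y0
    v = np.asarray(u, dtype=float) - u0
    best = None
    for d in range(max_delay + 1):
        k = np.arange(d, len(x) - 1)
        regressors = np.column_stack([x[k], v[k - d]])
        theta, *_ = np.linalg.lstsq(regressors, x[k + 1], rcond=None)
        sse = float(np.sum((regressors @ theta - x[k + 1]) ** 2))
        if best is None or sse < best[0]:
            best = (sse, theta, d)
    _, (a, b), d = best
    Kp = b / (1.0 - a)                    # static gain
    Tp = -Ts / np.log(a)                  # time constant from the discrete pole
    return Kp, Tp, d * Ts


# Usage: recover known parameters from a random binary excitation signal.
rng = np.random.default_rng(0)
u = rng.choice([-1.0, 1.0], size=400)
y = simulate_fopdt(u, Kp=2.0, Tp=30.0, Td=10.0, Ts=1.0)
Kp_hat, Tp_hat, Td_hat = identify_fopdt(u, y, Ts=1.0, max_delay=20)
```

With noise-free data the fit recovers the true parameters; with real plant data, the delay grid and the excitation signal matter much more.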
Once these parameters are determined, the transfer function of the entire system is formulated. Based on this transfer function, the state-space equation of the system can be established:
$$x(k+1) = A\,x(k) + B\,u(k), \qquad y(k) = C\,x(k)$$
where k represents the discrete time series, x denotes the system state variables, y signifies the predicted system outputs, and u stands for the control input variables. Additionally, A, B, and C are the coefficient matrices with corresponding dimensions.
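One common way to obtain such a state-space form from a discretized first-order plus time-delay model is to store the delayed inputs in a shift register, so that the state has dimension $1 + d$. The sketch below assumes a zero-order-hold discretization with $a = e^{-T_s/T_p}$, $b = K_p(1-a)$ and a delay of $d = T_d/T_s$ samples, and it works in deviation variables. `fopdt_to_ss` and `step_response` are hypothetical helper names.

```python
import numpy as np


def fopdt_to_ss(Kp, Tp, Td, Ts):
    """State x = [y(k), u(k-1), ..., u(k-d)] for a discretized FOPDT model (deviation form)."""
    a = np.exp(-Ts / Tp)
    b = Kp * (1.0 - a)
    d = int(round(Td / Ts))
    n = 1 + d
    A = np.zeros((n, n))
    B = np.zeros((n, 1))
    C = np.zeros((1, n))
    A[0, 0] = a
    C[0, 0] = 1.0
    if d == 0:
        B[0, 0] = b                       # no delay: the input drives the output state directly
    else:
        A[0, d] = b                       # output state is driven by the oldest stored input u(k-d)
        B[1, 0] = 1.0                     # newest input u(k) enters the shift register
        for i in range(2, d + 1):
            A[i, i - 1] = 1.0             # slot i at k+1 takes the value of slot i-1 at k
    return A, B, C


def step_response(A, B, C, n_steps):
    """Output of x(k+1) = A x(k) + B u(k), y(k) = C x(k) for a unit step from rest."""
    x = np.zeros(A.shape[0])
    ys = []
    for _ in range(n_steps):
        ys.append(float(C[0] @ x))
        x = A @ x + B[:, 0] * 1.0
    return np.array(ys)


A, B, C = fopdt_to_ss(Kp=2.0, Tp=30.0, Td=5.0, Ts=1.0)
y_step = step_response(A, B, C, 60)   # zero for 5 samples, then Kp * (1 - a**(k - 5))
```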

2.2. Problem Description

State Space Models (SSM) are commonly used to describe the dynamic behavior of systems. However, traditional state space models may have limitations in prediction accuracy when dealing with highly nonlinear and complex system dynamics. To address this issue, we propose a method that combines state space models with neural network prediction models to improve prediction accuracy. In this method, neural networks such as LSTM or Transformer are used to capture the nonlinear dynamics and long-term dependencies of the system, while state space models describe the basic dynamic characteristics of the system. The high-accuracy predictions provided by the neural network models can complement the deficiencies of traditional state space models, thereby enhancing overall prediction accuracy.
The controller design is based on the Model Predictive Control (MPC) framework, which calculates the optimal control variables by optimizing a loss function. This composite loss function consists of three parts: the prediction error of the state space model, the prediction error of the neural network model, and the variation in the control inputs. By minimizing the loss function, we can obtain the optimal control inputs that enable stable control of the controlled variables and maintain the system’s expected performance in complex and dynamic environments. The approach, combining traditional state space models with neural network prediction models, provides an effective solution for the precise control of complex systems, significantly enhancing system stability and control performance.

3. Methodology

To improve control stability and performance, this paper combines traditional state-space models with neural network prediction models. Figure 1 illustrates the overall structure of the method, which comprises the prediction model, the multi-step Transformer model, the optimization-based controller, and other components. The diagram shows how these components of the MPC system interact, as described below.
Figure 1. Model predictive control with neural network.
Schematic of the proposed Trans-Tube-MPC structure. Notation: Np: prediction horizon; Nc: control horizon; ys: setpoint; ym: measured output; u: manipulated input; y: controlled output (e.g., temperature in °C, CO concentration in %). The input on the left side of the diagram is the future reference trajectory $y_p(k), y_p(k+1), \ldots, y_p(k+N_p-1)$, i.e., the desired system outputs over the next $N_p$ time steps.
The predictive model combines the state-space model and the multi-step Transformer to predict the future outputs $y_p(k), y_p(k+1), \ldots, y_p(k+N_p-1)$ from the current and future inputs $u(k), u(k+1), \ldots, u(k+N_c-1)$.
The cost function is the objective used for optimization, typically aimed at minimizing tracking errors and control energy. The constraints capture the physical and operational limits that the system must satisfy. The solver solves the optimization problem to find the optimal control input sequence $u(k), u(k+1), \ldots, u(k+N_c-1)$.
The controlled output y(k) is the actual output of the system, fed back to the predictive controller. The manipulated input u(k) is the control signal calculated by the controller and applied to the actual system (Plant). The plant is the actual physical system that receives the manipulated input u(k) and generates the controlled output y(k).
The working principle of the entire process is as follows: The predictive controller receives the reference trajectory and uses the predictive model to calculate future outputs. Based on the predicted outputs and reference values, the solver optimizes the cost function and considers constraints to calculate the optimal control input sequence. The current control input u(k) is applied to the plant. The plant generates the controlled output y(k) and feeds it back to the predictive controller. The predictive controller repeats these steps at each sampling period to achieve the goal of tracking the reference trajectory in the future.

3.1. State Space Model

MPC is a control algorithm rooted in predictive modeling. This modeling approach relies on historical data related to the control object to forecast process outputs at a future time. The predictive model has a broad range of applications and is not constrained by specific structural forms; rather, it focuses on functional roles that enable accurate predictions of future control object outputs. These predictive models can be derived from state equations, transfer functions, and other traditional mathematical control models. For linear stable systems, predictive models can be established through finite step response or finite impulse response methodologies.
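Because the loss function penalizes changes in the input, we first rewrite the state equations so that the input increments replace the inputs. Let:
$$\Delta u(k) = u(k) - u(k-1), \qquad \Delta x_m(k) = x_m(k) - x_m(k-1)$$
$$\Delta x_m(k+1) = A_m\,\Delta x_m(k) + B_m\,\Delta u(k), \qquad x(k) = \begin{bmatrix}\Delta x_m(k)^T & y(k)^T\end{bmatrix}^T$$
Then:
$$\Delta y(k+1) = C_m A_m\,\Delta x_m(k) + C_m B_m\,\Delta u(k), \qquad y(k+1) = y(k) + C_m A_m\,\Delta x_m(k) + C_m B_m\,\Delta u(k)$$
Written in compact form:
$$\begin{bmatrix}\Delta x_m(k+1) \\ y(k+1)\end{bmatrix} = \begin{bmatrix}A_m & o_m^T \\ C_m A_m & 1\end{bmatrix}\begin{bmatrix}\Delta x_m(k) \\ y(k)\end{bmatrix} + \begin{bmatrix}B_m \\ C_m B_m\end{bmatrix}\Delta u(k), \qquad y(k) = \begin{bmatrix}o_m & 1\end{bmatrix}\begin{bmatrix}\Delta x_m(k) \\ y(k)\end{bmatrix}$$
where $o_m$ is a zero row vector with the same dimension as $\Delta x_m(k)$. The augmented state-space model is therefore:
$$x(k+1) = A\,x(k) + B\,\Delta u(k), \qquad y(k) = C\,x(k)$$
where
$$A = \begin{bmatrix}A_m & o_m^T \\ C_m A_m & 1\end{bmatrix}, \quad B = \begin{bmatrix}B_m \\ C_m B_m\end{bmatrix}, \quad C = \begin{bmatrix}o_m & 1\end{bmatrix}, \quad x(k) = \begin{bmatrix}\Delta x_m(k) \\ y(k)\end{bmatrix}$$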
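The augmentation can be checked numerically. The sketch below builds $A$, $B$, and $C$ from $(A_m, B_m, C_m)$ and confirms that the augmented model, driven only by input increments, reproduces the output of the original plant. It generalizes the scalar entries $1$ and $o_m$ to an identity matrix and zero blocks so that it also covers several outputs. The example plant matrices are arbitrary.

```python
import numpy as np


def augment_velocity_form(Am, Bm, Cm):
    """A = [[Am, 0], [Cm Am, I]], B = [[Bm], [Cm Bm]], C = [0, I] for state [dx_m; y]."""
    n = Am.shape[0]
    q = Cm.shape[0]
    A = np.block([[Am, np.zeros((n, q))], [Cm @ Am, np.eye(q)]])
    B = np.vstack([Bm, Cm @ Bm])
    C = np.hstack([np.zeros((q, n)), np.eye(q)])
    return A, B, C


# Usage: the augmented model driven by du reproduces the original plant output.
Am = np.array([[0.9, 0.1], [0.0, 0.8]])
Bm = np.array([[0.0], [1.0]])
Cm = np.array([[1.0, 0.0]])
A, B, C = augment_velocity_form(Am, Bm, Cm)

u = np.sin(0.3 * np.arange(30))          # arbitrary input sequence, with u(-1) = 0
xm = np.zeros(2)                          # original plant state x_m(0), plant at rest
x_aug = np.zeros(3)                       # [dx_m(0); y(0)] = 0 for a plant at rest
u_prev = 0.0
y_plant, y_aug = [], []
for k in range(len(u)):
    y_plant.append((Cm @ xm).item())
    y_aug.append((C @ x_aug).item())
    du = u[k] - u_prev                    # input increment du(k)
    xm = Am @ xm + Bm[:, 0] * u[k]
    x_aug = A @ x_aug + B[:, 0] * du
    u_prev = u[k]
```

Because the last block of the state is $y(k)$ itself, any constant output offset is carried in the state. This is what gives the increment-based formulation its integral action.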

3.2. Multi-Step Neural Network

The Transformer model consists of an encoder and a decoder, each of which includes multiple layers of self-attention mechanisms and feedforward neural networks. The encoder is used to encode the input sequence into context vectors, while the decoder uses these context vectors to generate the target sequence.
Assume the input sequence is $X = (x_1, x_2, \ldots, x_T)$ and the goal is to predict $Y = (y_{T+1}, y_{T+2}, \ldots, y_{T+K})$, where $T$ is the length of the known history and $K$ is the prediction horizon. For each prediction step $k$, the decoder output $\hat{Y}_{T+k}$ is based on the prediction from the previous step, $\hat{Y}_{T+k-1}$. In other words, the decoder input is either the prediction or the true target value from the previous time step. The core of the Transformer is the multi-head attention mechanism. It processes all positions of the input sequence simultaneously, and each head focuses on different parts of the input. The Transformer model used in this work consists of 6 encoder and decoder layers, 8 attention heads, a hidden dimension of 512, and a feedforward dimension of 2048. It was trained using the Adam optimizer with L2 regularization, a batch size of 32, and a learning rate of $1 \times 10^{-4}$.
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^O, \qquad \mathrm{head}_i = \mathrm{Attention}\big(QW_i^Q, KW_i^K, VW_i^V\big)$$
where $Q$, $K$, and $V$ are the queries, keys, and values, which are linear transformations of the input. $W_i^Q$, $W_i^K$, $W_i^V$, and $W^O$ are learnable projection matrices, $d_k$ is the key dimension, and $h$ is the number of heads.
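Here is a minimal NumPy sketch of the two formulas, with randomly initialized (untrained) projection matrices. The sizes are small illustrative values rather than the 8-head, 512-dimensional configuration used in this work, and the sketch includes no masking or batching.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)    # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V


def multi_head(Q, K, V, Wq, Wk, Wv, Wo):
    """Wq, Wk, Wv: lists of h (d_model, d_k) projections; Wo: (h * d_k, d_model)."""
    heads = [attention(Q @ wq, K @ wk, V @ wv) for wq, wk, wv in zip(Wq, Wk, Wv)]
    return np.concatenate(heads, axis=-1) @ Wo


# Usage: self-attention (Q = K = V = X) over a sequence of T embedded time steps.
rng = np.random.default_rng(0)
T, d_model, h = 10, 16, 4
d_k = d_model // h
X = rng.standard_normal((T, d_model))
Wq = [rng.standard_normal((d_model, d_k)) for _ in range(h)]
Wk = [rng.standard_normal((d_model, d_k)) for _ in range(h)]
Wv = [rng.standard_normal((d_model, d_k)) for _ in range(h)]
Wo = rng.standard_normal((h * d_k, d_model))
out = multi_head(X, X, X, Wq, Wk, Wv, Wo)   # shape (T, d_model)
```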
Because the Transformer has no inherent notion of sequence order, positional encoding is used to incorporate temporal information into the sequence. A common method combines sine and cosine functions:
$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)$$
where $pos$ is the position in the sequence, $i$ is the dimension index, and $d_{\text{model}}$ is the model dimension. In multi-step prediction, the output at each time step is determined by the output from the previous time step and the context information.
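The positional encoding can be computed directly from the two formulas. This sketch assumes an even model dimension $d_{\text{model}}$ and uses the 512-dimensional setting reported above only as an example.

```python
import numpy as np


def positional_encoding(max_len, d_model):
    """Sinusoidal encoding; rows are positions, columns are model dimensions (d_model even)."""
    pos = np.arange(max_len)[:, None]                  # positions 0 .. max_len-1
    two_i = np.arange(0, d_model, 2)[None, :]          # even dimension indices 2i
    angle = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle)                        # PE(pos, 2i)
    pe[:, 1::2] = np.cos(angle)                        # PE(pos, 2i+1)
    return pe


pe = positional_encoding(max_len=50, d_model=512)      # added to the (T, 512) input embeddings
```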
The direct method predicts multiple future time steps at once, as shown in Figure 2. Each time step is predicted independently. For the input sequence $y = \{y_1, y_2, \ldots, y_T\}$, the Transformer encoder maps the sequence to hidden representations $Z$. Each future time step is then predicted through a different decoder head or decoding step.
Figure 2. Transformer Direct Method.
The input sequence $y = \{y_1, y_2, \ldots, y_T\}$ is passed through the Transformer encoder to generate the hidden representations $Z$:
$$Z = \mathrm{TransformerEncoder}(y)$$
For each future time step $T + j$, $j = 1, 2, \ldots, H$, a separate decoder head or a separate linear output layer $W_{\text{out}}^{(j)}$ of the same decoder generates the predicted value $\hat{y}_{T+j}$:
$$\hat{y}_{T+1} = W_{\text{out}}^{(1)}\,\mathrm{TransformerDecoder}(Z), \quad \hat{y}_{T+2} = W_{\text{out}}^{(2)}\,\mathrm{TransformerDecoder}(Z), \quad \ldots, \quad \hat{y}_{T+H} = W_{\text{out}}^{(H)}\,\mathrm{TransformerDecoder}(Z)$$
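The direct strategy can be sketched as follows. To keep the example self-contained, it makes several simplifications:

- The encoder is reduced to an input embedding, sinusoidal positional encoding, and one single-head self-attention layer.
- The decoder is omitted, and each head $W_{\text{out}}^{(j)}$ is applied to the final hidden state.
- All weights are random and untrained.

The input/output split (three inputs, one output) follows the MV1, MV2, CV1 → CV2 arrangement described in the experiments. The function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def encode(seq, W_emb, Wq, Wk, Wv):
    """Toy single-layer self-attention encoder (stand-in for TransformerEncoder)."""
    T = seq.shape[0]
    X = seq @ W_emb                                   # (T, d) input embedding
    d = X.shape[1]
    angle = np.arange(T)[:, None] / np.power(10000.0, np.arange(0, d, 2)[None, :] / d)
    X[:, 0::2] += np.sin(angle)                       # sinusoidal positional encoding
    X[:, 1::2] += np.cos(angle)
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    s = Q @ K.T / np.sqrt(d)
    w = np.exp(s - s.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                 # row-wise softmax
    return w @ V                                      # (T, d) hidden representations Z


def direct_forecast(Z, W_out):
    """Direct method: head j maps the final hidden state to y_hat(T+j); no prediction is fed back."""
    z_last = Z[-1]
    return np.stack([z_last @ W_j for W_j in W_out])  # (H, n_out)


T, n_in, n_out, d, H = 24, 3, 1, 16, 5
seq = rng.standard_normal((T, n_in))                  # synthetic MV1, MV2, CV1 history
W_emb = rng.standard_normal((n_in, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
W_out = [rng.standard_normal((d, n_out)) for _ in range(H)]
Z = encode(seq, W_emb, Wq, Wk, Wv)
Y_hat = direct_forecast(Z, W_out)                     # rows: y_hat(T+1) ... y_hat(T+H)
```

Because each horizon step has its own head and no prediction is fed back as input, errors do not compound from one step to the next. The cost is that $H$ output heads must be trained.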

3.3. Trans-Tube-MPC

Trans-Tube-MPC combines the linear state-space model with the multi-step predictive model to optimize the control inputs so that the system output closely follows the reference value. The "tube" refers to a guaranteed, adaptively adjusted bounded region around the nominal trajectory that provides robustness against disturbances and model errors. Let $N_c$ denote the control horizon and $N_p$ the prediction horizon. The linear state-space model and the multi-step predictive model are:
$$x(k+1) = A\,x(k) + B\,\Delta u(k), \qquad y(k) = C\,x(k) + D\,\Delta u(k), \qquad \hat{y}(k) = f\big(u_{k_i-N_c+1:k},\; y_{k_i-N_c+1:k}\big)$$
The future states and outputs of the system within the prediction horizon are predicted as:
$$x(k_i+N_p \mid k_i) = A^{N_p}x(k_i) + A^{N_p-1}B\,\Delta u(k_i) + A^{N_p-2}B\,\Delta u(k_i+1) + \cdots + A^{N_p-N_c}B\,\Delta u(k_i+N_c-1)$$
$$y(k_i+N_p \mid k_i) = CA^{N_p}x(k_i) + CA^{N_p-1}B\,\Delta u(k_i) + CA^{N_p-2}B\,\Delta u(k_i+1) + \cdots + CA^{N_p-N_c}B\,\Delta u(k_i+N_c-1)$$
These expressions give the $N_p$-step-ahead prediction (the feedthrough term $D\,\Delta u$ is neglected). The same form, with $N_p$ replaced by $j$, holds for each step $j = 1, 2, \ldots, N_p$, and only the first $\min(j, N_c)$ control moves contribute. Let
$$\Delta U = \begin{bmatrix}\Delta u(k_i)^T & \Delta u(k_i+1)^T & \cdots & \Delta u(k_i+N_c-1)^T\end{bmatrix}^T$$
$$Y = \begin{bmatrix}y(k_i+1 \mid k_i)^T & y(k_i+2 \mid k_i)^T & \cdots & y(k_i+N_p \mid k_i)^T\end{bmatrix}^T, \qquad \hat{Y} = \begin{bmatrix}\hat{y}(k_i+1 \mid k_i)^T & \hat{y}(k_i+2 \mid k_i)^T & \cdots & \hat{y}(k_i+N_p \mid k_i)^T\end{bmatrix}^T$$
Then
$$Y = F\,x(k_i) + \Phi\,\Delta U$$
where
$$F = \begin{bmatrix} CA \\ CA^2 \\ CA^3 \\ \vdots \\ CA^{N_p}\end{bmatrix}, \qquad \Phi = \begin{bmatrix} CB & 0 & \cdots & 0 \\ CAB & CB & \cdots & 0 \\ CA^2B & CAB & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ CA^{N_p-1}B & CA^{N_p-2}B & \cdots & CA^{N_p-N_c}B \end{bmatrix}$$
Here $x$ and $u$ denote the system state and control input, respectively, $y(k)$ is the prediction of the state-space model, and $\hat{y}(k)$ is the prediction of the multi-step model.
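The stacked matrices $F$ and $\Phi$ can be assembled programmatically and checked against a step-by-step simulation. In this sketch the system matrices are arbitrary examples, $D = 0$ as in the prediction equations above, and control moves beyond the control horizon are zero.

```python
import numpy as np


def prediction_matrices(A, B, C, Np, Nc):
    """Return F (Np*q x n) and Phi (Np*q x Nc*m) such that Y = F x(k_i) + Phi dU."""
    q, m = C.shape[0], B.shape[1]
    F = np.vstack([C @ np.linalg.matrix_power(A, j) for j in range(1, Np + 1)])
    Phi = np.zeros((Np * q, Nc * m))
    for j in range(1, Np + 1):                  # prediction step j
        for i in range(min(j, Nc)):             # control move du(k_i + i)
            Phi[(j - 1) * q:j * q, i * m:(i + 1) * m] = (
                C @ np.linalg.matrix_power(A, j - 1 - i) @ B
            )
    return F, Phi


# Usage: compare Y = F x + Phi dU with a direct simulation of x(k+1) = A x(k) + B du(k).
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1, 0.0], [0.0, 0.8, 0.1], [0.0, 0.0, 0.7]])
B = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
C = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
Np, Nc = 8, 3
F, Phi = prediction_matrices(A, B, C, Np, Nc)
x0 = rng.standard_normal(3)
dU = rng.standard_normal(Nc * 2)

x, Y_sim = x0.copy(), []
for j in range(Np):
    du = dU[2 * j:2 * j + 2] if j < Nc else np.zeros(2)   # no moves beyond Nc
    x = A @ x + B @ du
    Y_sim.append(C @ x)
Y_sim = np.concatenate(Y_sim)
```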
To enhance robustness against model mismatch, we introduce a dynamic tube constraint governed by the prediction discrepancy between the Transformer and the state-space model. Define the tube width at step $k$ as:
$$\Delta(k) = \alpha\,\Delta(k-1) + (1-\alpha)\,\big\|\hat{y}(k) - y(k)\big\|_2$$
where the smoothing factor $\alpha$ determines how quickly the constraint adapts; a larger $\alpha$ gives more weight to past discrepancies. This width tightens the admissible control input range: $u_{\min} + \Delta(k) \le u(k) \le u_{\max} - \Delta(k)$.
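The tube update is an exponential moving average of the prediction residual, followed by a symmetric tightening of the input bounds. The sketch below follows the two formulas above. The handling of a tube wider than half the input range (collapsing the interval to its midpoint) is an assumption of this sketch and is not specified in the paper.

```python
import numpy as np


def update_tube_width(width_prev, y_trans, y_ssm, alpha=0.8):
    """Delta(k) = alpha * Delta(k-1) + (1 - alpha) * ||y_hat(k) - y(k)||_2."""
    residual = np.linalg.norm(np.asarray(y_trans, dtype=float) - np.asarray(y_ssm, dtype=float))
    return alpha * width_prev + (1.0 - alpha) * residual


def tightened_bounds(u_min, u_max, width):
    """u_min + Delta <= u <= u_max - Delta, collapsed to the midpoint if the interval empties."""
    lo = np.asarray(u_min, dtype=float) + width
    hi = np.asarray(u_max, dtype=float) - width
    mid = 0.5 * (lo + hi)
    # Assumption (not from the paper): never return lo > hi.
    return np.minimum(lo, mid), np.maximum(hi, mid)


# Usage: a constant residual of 1.0 drives the width toward 1 - alpha**k.
width = 0.0
for k in range(3):
    width = update_tube_width(width, y_trans=[1.0, 0.0], y_ssm=[0.0, 0.0])
u_lo, u_hi = tightened_bounds(u_min=0.0, u_max=10.0, width=width)
```

With $\alpha = 0.8$, the width after three steps of a unit residual is $1 - 0.8^3 = 0.488$. This shows how the smoothing factor delays the tube's response to a sudden discrepancy.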
The control objective is to minimize tracking errors, input variations, and tube constraint penalties:
$$J = \sum_{i=1}^{N_p}\Big(\big\|y_{\mathrm{ref}} - y(i)\big\|_Q^2 + \big\|\Delta u(i)\big\|_R^2\Big) + \lambda \sum_{j=1}^{N_c} \Delta(j)^2$$
where $Q \succeq 0$ and $R \succ 0$ are weight matrices, $\lambda \ge 0$ is the tube penalty coefficient, and $\Delta u(i) = 0$ for $i > N_c$. The smoothing factor $\alpha$ was set to 0.8 to balance historical and current residuals. The tube penalty coefficient $\lambda$ was tuned in the range 0.05 to 0.2, and 0.1 was selected for the final experiments.
Based on the control sequence definition, substituting $Y = F\,x(k_i) + \Phi\,\Delta U$ into Equation (18) gives the standard QP form:
$$J = \tfrac{1}{2}\,\Delta U^T H\,\Delta U + f^T \Delta U + C$$
where
$\Delta U$ is the sequence of control input increments, $H$ is the Hessian matrix (the coefficient matrix of the quadratic terms), $f$ is the gradient term (the coefficient vector of the linear terms), and $C$ is a constant term independent of the control input.
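The Hessian matrix $H$ is defined as:
$$H = 2\left(\Phi^T \bar{Q}\,\Phi + \bar{R}\right)$$
where $\bar{Q} = \mathrm{diag}(Q, \ldots, Q)$ ($N_p$ blocks) and $\bar{R} = \mathrm{diag}(R, \ldots, R)$ ($N_c$ blocks). The gradient term $f$ is defined as:
$$f = -2\,\Phi^T \bar{Q}\,\big(Y_{\mathrm{ref}} - F\,x(k_i)\big)$$
The constant term $C$ is defined as:
$$C = \big(Y_{\mathrm{ref}} - F\,x(k_i)\big)^T \bar{Q}\,\big(Y_{\mathrm{ref}} - F\,x(k_i)\big) + \lambda \sum_{j=1}^{N_c} \Delta(j)^2$$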
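Since $C$ is independent of $\Delta U$, it does not affect the optimal solution. Setting the gradient $H\,\Delta U + f$ to zero gives the optimal (unconstrained) control sequence:
$$\Delta U^* = -H^{-1} f$$
Only the first control move is applied:
$$u(k) = u(k-1) + \Delta U^*[0]$$
where $\Delta U^*[0]$ denotes the first block of $\Delta U^*$.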
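The QP terms can be checked against the cost written out term by term, and the minimizer then follows from a single linear solve. The sketch uses the weights reported in the paper (Q = diag(0, 1), R = diag(2.5, 1), λ = 0.1, $N_p = 50$, $N_c = 10$). The system matrices, reference, initial state, and tube widths are hypothetical values chosen only for illustration. `np.linalg.solve` is used instead of forming $H^{-1}$ explicitly.

```python
import numpy as np


def prediction_matrices(A, B, C, Np, Nc):
    """Y = F x + Phi dU for x(k+1) = A x(k) + B du(k), y = C x (the stacked form above)."""
    q, m = C.shape[0], B.shape[1]
    F = np.vstack([C @ np.linalg.matrix_power(A, j) for j in range(1, Np + 1)])
    Phi = np.zeros((Np * q, Nc * m))
    for j in range(1, Np + 1):
        for i in range(min(j, Nc)):
            Phi[(j - 1) * q:j * q, i * m:(i + 1) * m] = C @ np.linalg.matrix_power(A, j - 1 - i) @ B
    return F, Phi


def qp_terms(F, Phi, x, Y_ref, Q, R, Np, Nc, lam, widths):
    """H, f and the constant C of J = 0.5 dU' H dU + f' dU + C."""
    Q_bar = np.kron(np.eye(Np), Q)             # diag(Q, ..., Q)
    R_bar = np.kron(np.eye(Nc), R)             # diag(R, ..., R)
    E = Y_ref - F @ x                          # tracking error of the free response
    H = 2.0 * (Phi.T @ Q_bar @ Phi + R_bar)
    f = -2.0 * Phi.T @ Q_bar @ E
    const = E @ Q_bar @ E + lam * np.sum(np.square(widths))
    return H, f, const


def cost_direct(Y, Y_ref, dU, Q, R, lam, widths):
    """J summed term by term: tracking error, input increments, tube penalty."""
    e = (Y_ref - Y).reshape(-1, Q.shape[0])
    du = dU.reshape(-1, R.shape[0])
    return (np.einsum("ij,jk,ik->", e, Q, e) + np.einsum("ij,jk,ik->", du, R, du)
            + lam * np.sum(np.square(widths)))


A = np.array([[0.9, 0.1, 0.0], [0.0, 0.8, 0.1], [0.0, 0.0, 0.7]])
B = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
C = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
Np, Nc, lam = 50, 10, 0.1
Q, R = np.diag([0.0, 1.0]), np.diag([2.5, 1.0])
x = np.array([0.2, -0.1, 0.3])
Y_ref = np.tile([1.0, 0.5], Np)
widths = np.full(Nc, 0.05)                     # hypothetical tube widths

F, Phi = prediction_matrices(A, B, C, Np, Nc)
H, f, const = qp_terms(F, Phi, x, Y_ref, Q, R, Np, Nc, lam, widths)
dU_opt = np.linalg.solve(H, -f)                # dU* = -H^{-1} f
u_prev = np.zeros(2)
u_k = u_prev + dU_opt[:2]                      # receding horizon: apply the first move only
```

As the text notes, the tube penalty only shifts $J$ by a constant. In this unconstrained form, the tube therefore influences the applied input only through the tightened input bounds.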
The complete algorithm flow is shown in Algorithm 1.
Algorithm 1. Trans-Tube-MPC
Input: State-space model (A, B, C) with prediction matrices F and Φ, Transformer T_θ, prediction horizon Np, control horizon Nc, weights Q̄ and R̄
Output: Control input u_k applied at each control cycle

Initialize:
      - Estimate initial state x_hat_0 via Kalman filter
      - Set initial tube width Δ = 0, smoothing factor α = 0.8, penalty coefficient λ = 0.1

for each control cycle k = 0, 1, 2, … do
      # Feedforward prediction of y_{k+1}, input held at u_prev (Δu = 0)
      y_ssm = C @ A @ x_hat_k
      y_trans = T_θ(encode(u_{k-T:k}, y_{k-T:k}))

      # Tube constraint update
      residual = np.linalg.norm(y_trans - y_ssm, ord=2)
      Δ = α * Δ + (1 - α) * residual
      u_min_adj = u_min_nominal + Δ
      u_max_adj = u_max_nominal - Δ

      # Construct QP problem
      H = 2 * (Φ.T @ Q̄ @ Φ + R̄)
      f = -2 * Φ.T @ Q̄ @ (Y_ref - F @ x_hat_k)
      C_tube = λ * np.sum(Δ**2)      # constant term; does not change ΔU*

      # Solve QP and apply the first move within the tube bounds
      ΔU* = np.linalg.solve(H, -f)
      u_k = clip(u_prev + ΔU*[0], u_min_adj, u_max_adj)

      # State update with the applied increment
      x_hat_{k+1} = kalman_update(A @ x_hat_k + B @ (u_k - u_prev), y_{k+1})
      u_prev = u_k
end for
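The loop in Algorithm 1 can be exercised end to end on a toy problem. This runnable sketch makes several simplifying assumptions:

- The plant is a first-order SISO system written in velocity form.
- The Transformer $T_\theta$ is replaced by a hypothetical oracle that evaluates the true plant. This only illustrates how the residual drives the tube width, not how a trained network behaves.
- The Kalman filter is replaced by direct measurement of the augmented state. For this model, the state $[\,y(k) - y(k-1),\ y(k)\,]$ is available from two consecutive outputs.
- The tube bounds are enforced by clipping the first move.

All numerical values are illustrative.

```python
import numpy as np


def prediction_matrices(A, B, C, Np, Nc):
    q, m = C.shape[0], B.shape[1]
    F = np.vstack([C @ np.linalg.matrix_power(A, j) for j in range(1, Np + 1)])
    Phi = np.zeros((Np * q, Nc * m))
    for j in range(1, Np + 1):
        for i in range(min(j, Nc)):
            Phi[(j - 1) * q:j * q, i * m:(i + 1) * m] = C @ np.linalg.matrix_power(A, j - 1 - i) @ B
    return F, Phi


def run_trans_tube_mpc(a_true, b_true, a_nom, b_nom, r=1.0, steps=200, Np=20, Nc=5,
                       q_w=1.0, r_w=1.0, alpha=0.8, u_min=-5.0, u_max=5.0):
    # Nominal model y(k+1) = a_nom*y(k) + b_nom*u(k) in velocity form, x = [dy(k), y(k)]
    A = np.array([[a_nom, 0.0], [a_nom, 1.0]])
    B = np.array([[b_nom], [b_nom]])
    C = np.array([[0.0, 1.0]])
    F, Phi = prediction_matrices(A, B, C, Np, Nc)
    Q_bar, R_bar = q_w * np.eye(Np), r_w * np.eye(Nc)
    H = 2.0 * (Phi.T @ Q_bar @ Phi + R_bar)
    y = y_prev = u_prev = width = 0.0
    ys, widths = [], []
    for _ in range(steps):
        x = np.array([y - y_prev, y])                  # augmented state, measured directly here
        # One-step predictions of y(k+1) with the input held at u_prev (du = 0)
        y_ssm = float(C[0] @ (A @ x))
        y_trans = a_true * y + b_true * u_prev         # hypothetical oracle standing in for T_theta
        width = alpha * width + (1.0 - alpha) * abs(y_trans - y_ssm)
        lo, hi = u_min + width, u_max - width          # tube-tightened input bounds
        E = r - F @ x                                  # free-response tracking error
        dU = np.linalg.solve(H, 2.0 * Phi.T @ Q_bar @ E)   # equals -H^{-1} f
        u = float(np.clip(u_prev + dU[0], lo, hi))         # first move, kept inside the tube
        y_prev, y = y, a_true * y + b_true * u             # true (mismatched) plant step
        u_prev = u
        ys.append(y)
        widths.append(width)
    return np.array(ys), np.array(widths)


# Usage: the true plant gain is 20% lower than the model's (b_true = 0.08 vs. b_nom = 0.10).
ys, widths = run_trans_tube_mpc(a_true=0.9, b_true=0.08, a_nom=0.9, b_nom=0.10)
```

Because the decision variables are input increments, the loop settles at the reference despite the gain mismatch. The tube width also shrinks back toward zero once the two one-step predictions agree at steady state.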

4. Experiment

This paper focuses on the temperature control system of a C5 preheater in cement production. It evaluates four control algorithms (conventional MPC, Tube-MPC, Trans-MPC, and Trans-Tube-MPC) under model mismatch scenarios. The system transfer function model is given in Table 1. The input variable MV is the coal feed to the calciner, and the output variables CV = [CV1, CV2] are the C5 temperature and the CO concentration, respectively. In Table 1, Tp and Td are given in seconds, Kp is the process gain, CV1 is in °C, and CV2 is in % CO.
Table 1. Transfer function.
The controller's prediction horizon is set to 50 and the control horizon to 10. The weighting matrices are Q = [0 0; 0 1], R = [2.5 0; 0 1], and S = [0 0; 0 1]. The remaining parameters are set as shown in Table 2:
Table 2. Parameters setting.
Four algorithms are compared; their configurations are detailed in Table 3.
Table 3. Algorithm Configurations.
  • MPC: Baseline with two-term quadratic cost: tracking error + input variation
  • Tube-MPC: Fixed tube boundary handles bounded uncertainties
  • Trans-MPC: Embeds a 6-layer Transformer (hidden = 512, feedforward = 2048, 8 attention heads)
  • Trans-Tube-MPC: Dual-mode compensation, combining Transformer-based multi-step disturbance prediction with tube width adaptation via prediction residuals
In actual cement production, the C5 temperature model is influenced by production conditions and equipment, and frequent changes in operating conditions give rise to a variety of process models. Based on these operating conditions, 13 levels of model mismatch (−60% to +60% in steps of 10%) are defined, with the parameter variation rules shown in Table 4.
Table 4. Model-plant mismatch ratio.

Calculation Rules

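$K_p$ and $T_d$ scale proportionally with the mismatch ratio, while $T_p$ scales inversely, according to the following equations:
$$K_{\text{actual}} = K_{\text{nominal}} \times \left(1 + \frac{\text{mismatch ratio}}{100}\right)$$
$$T_{\text{actual}} = \frac{T_{\text{nominal}}}{1 + \dfrac{\text{mismatch ratio}}{100}}$$
$$T_{d,\text{actual}} = T_{d,\text{nominal}} \times \left(1 + \frac{\text{mismatch ratio}}{100}\right)$$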
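The three scaling rules can be applied in a single helper. $K_p$ and $T_d$ are multiplied by $(1 + r/100)$ and $T_p$ is divided by it, so at −60% the time constant becomes 2.5 times its nominal value. The nominal parameters below are hypothetical placeholders, since the Table 1 values are not reproduced here.

```python
def apply_mismatch(Kp, Tp, Td, ratio_pct):
    """Scale nominal FOPDT parameters by a mismatch ratio given in percent."""
    s = 1.0 + ratio_pct / 100.0
    return Kp * s, Tp / s, Td * s            # Kp and Td proportional, Tp inverse


ratios = list(range(-60, 61, 10))           # the 13 mismatch levels, -60% ... +60%
# Hypothetical nominal parameters, for illustration only.
scenarios = {r: apply_mismatch(Kp=2.0, Tp=100.0, Td=20.0, ratio_pct=r) for r in ratios}
```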
In this paper, the initial Transformer model was trained offline on a dataset of approximately 10,000 samples. The data comprise MV1, MV2, CV1, and CV2, where MV1, MV2, and CV1 are used as inputs and CV2 as the output. Two-thirds of the dataset was used for training and the remaining one-third for validation. The parameters were optimized using the Adam optimizer with L2 regularization. In each mismatch scenario, comparative tests were conducted with the four MPC controllers, and the results are shown in Table 5. All results are based on 5 independent runs, and the standard deviations of the RMSE were within ±0.3 across all mismatch scenarios.
Table 5. Control results under different cases.
Table 5 shows how the Peak Value, Overshoot, and RMSE of the four methods vary with the mismatch ratio. To quantify the impact of different degrees of model mismatch on control effectiveness, the following metrics are used to evaluate control performance [30]. Overshoot measures how far the controlled variable exceeds its target; the smaller the overshoot, the better the system's disturbance rejection. RMSE measures the overall deviation of the controlled variable from its target over the $N$ samples of a run.
$$\mathrm{Overshoot} = \frac{CV_{\max} - CV_{\text{target}}}{CV_{\text{target}}} \times 100\%$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\big(CV(k) - CV_{\text{target}}\big)^2}$$
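Both metrics are short functions of the controlled-variable trace. The sketch follows the formulas above, including normalizing overshoot by the target value rather than by the step size. The CV1 trace is synthetic, but its peak of 872.629 against the target of 870 gives about 0.30%, consistent with the Trans-Tube-MPC overshoot reported for Ratio = −60%.

```python
import numpy as np


def overshoot_pct(cv, target):
    """Peak overshoot relative to the target value, in percent."""
    return (np.max(cv) - target) / target * 100.0


def rmse(cv, target):
    """Root-mean-square deviation of the controlled variable from its target."""
    err = np.asarray(cv, dtype=float) - target
    return float(np.sqrt(np.mean(err ** 2)))


cv1 = np.array([860.0, 868.5, 872.629, 871.2, 870.3, 870.0])   # synthetic CV1 trace (°C)
os_cv1 = overshoot_pct(cv1, 870.0)
rmse_cv1 = rmse(cv1, 870.0)
```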
Under negative Ratio conditions, Trans-Tube-MPC generally exhibits lower Peak Values. For example, at Ratio = −60%, Trans-Tube-MPC achieves 872.629 compared to Tube-MPC’s 878.903, suggesting its enhanced effectiveness in handling large negative disturbances to stabilize the system faster. Concurrently, its Overshoot remains lower (0.30% vs. 1.02% for Tube-MPC). However, Trans-Tube-MPC shows slightly higher RMSE (6.040 vs. 4.987), indicating a potential trade-off between peak suppression and overall error accumulation. Under positive Ratio conditions (e.g., 60%), Trans-Tube-MPC demonstrates superior performance with a Peak Value of 879.246 versus Tube-MPC’s 893.329, along with significantly reduced Overshoot (1.06% vs. 2.68%) and lower RMSE (6.346 vs. 12.463). This confirms Trans-Tube-MPC’s comprehensive advantages in large positive disturbance scenarios.
At Ratio = −50%, Trans-MPC achieves no overshoot versus MPC’s 0.08%, demonstrating its capability to eliminate overshoot in specific cases. However, this comes with increased RMSE (8.867 vs. 6.227), requiring application-specific tolerance evaluation. As the absolute Ratio increases (both positive and negative), Tube-MPC and MPC exhibit significant RMSE growth, while Trans-Tube-MPC and Trans-MPC maintain gentler RMSE increments. Particularly under positive Ratios, Trans-Tube-MPC’s RMSE increase is markedly smaller than Tube-MPC’s, evidencing enhanced robustness through Transformer integration.
At Ratio = 0%, all methods show minimal overshoot, and their RMSE values are at their lowest. Notably, in this mismatch-free condition Trans-Tube-MPC has a slightly higher RMSE (4.389) than Tube-MPC (3.838), which may reflect the additional corrections applied by its feedback mechanism when there is no mismatch to compensate. In practical applications, however, where disturbance handling matters more, Trans-Tube-MPC demonstrates superior performance.
For a clearer comparison, several model-failure cases are shown in Figure 3, Figure 4 and Figure 5. As shown in Figure 3, the four control strategies perform markedly differently under −60% model-plant mismatch. In Figure 3a (MPC), the system remains relatively stable, but CV1 still fluctuates, indicating that the mismatch degrades performance. In Figure 3b (Tube-MPC), CV1 approaches the target value of 870 but does not fully reach it because of the difference between the model and the plant; the poor control of CV2 and the large MV adjustments indicate that this setting adapts poorly to the mismatch. In Figure 3c (Trans-MPC), the control response is better and CV1 stays close to the target value of 870, although slightly above it; the larger MV fluctuations indicate more aggressive adjustment under the mismatch and some loss of control accuracy. In Figure 3d (Trans-Tube-MPC), the response is smoother than with the other settings, with CV1 close to the target value of 870 and CV2 more stable; the MV adjustments still fluctuate, but the system as a whole is more stable under the −60% mismatch.
Figure 3. Responses of the four controllers under −60% model-plant mismatch. (a) MPC. (b) Tube-MPC. (c) Trans-MPC. (d) Trans-Tube-MPC.
Figure 4. Responses of the four controllers under −30% model-plant mismatch. (a) MPC. (b) Tube-MPC. (c) Trans-MPC. (d) Trans-Tube-MPC.
Figure 5. Responses of the four controllers under +30% model-plant mismatch. (a) MPC. (b) Tube-MPC. (c) Trans-MPC. (d) Trans-Tube-MPC.
Under the −60% mismatch, Tube-MPC and Trans-Tube-MPC bring CV1 to the target value of 870, but with fluctuations that reduce control accuracy. MPC keeps CV1 close to the target but with larger overshoot and higher RMSE, giving poorer control accuracy. Trans-MPC and Trans-Tube-MPC respond more smoothly, with CV1 close to the target value of 870, relatively small control error, and smoother MV adjustment.
As shown in Figure 4, the four control strategies also differ significantly under the −30% model-plant mismatch. In Figure 4a (MPC), CV1 stays close to the target with good stability, but CV2 fluctuates more and the MV adjustments are periodic, showing some effect of the mismatch. In Figure 4b (Tube-MPC), CV1 approaches but does not quite reach the target, CV2 is poorly controlled, and the MV adjustments are large, indicating poor adaptability. In Figure 4c (Trans-MPC), CV1 is close to but slightly above the target, CV2 and MV fluctuate more, and control accuracy decreases. In Figure 4d (Trans-Tube-MPC), CV1 is close to the target, CV2 is stable, and MV is adjusted smoothly, giving the best control performance.
In summary, under −30% mismatch, Tube-MPC does not fully reach the CV1 target of 870, controls CV2 poorly, and makes larger MV adjustments, showing poorer adaptability. Trans-MPC keeps CV1 closer to 870, but its larger MV fluctuations reduce control accuracy. Trans-Tube-MPC performs most smoothly, with CV1 close to 870, a more stable CV2, and smoother MV adjustment, and thus achieves the best control at −30% mismatch, holding the target value while keeping the other variables stable.
As shown in Figure 5, under +30% model-plant mismatch, Trans-Tube-MPC again performs best, keeping both CV1 and CV2 close to their target values with minimal fluctuation and tracking the setpoints precisely without significant deviation. Trans-MPC also performs well and controls CV2 with fewer and smaller fluctuations than the remaining methods, but it does not match Trans-Tube-MPC in overall control accuracy and stability.
Tube-MPC and MPC perform at similar levels: CV1 stays relatively close to the target, but CV2 fluctuates more, indicating a less stable control system. Their MV adjustments are also more frequent and larger, suggesting that these methods react more and regulate less precisely under the mismatch. Overall, Trans-Tube-MPC gives the most stable and precise response under +30% mismatch, making it the most robust of the four configurations for handling such mismatches in practice.
In summary, the experiments compare the control methods under model-plant mismatches of up to ±60%. The performance of traditional predictive control methods deteriorates significantly once the model error exceeds 30%. At 60% mismatch, the control error of the traditional methods reaches 11.9, while Trans-Tube-MPC keeps it between 6.3 and 7.5, demonstrating a stronger ability to handle abnormal operating conditions.
Specifically, Trans-Tube-MPC performs well in three respects. First, under a severe 60% mismatch, it improves control accuracy by 1.6% to 3.7% compared with standard Tube-MPC (e.g., the peak value drops from 893.3 to 879.2), reduces overshoot by 61%, and cuts the RMSE by nearly half. Second, under negative disturbances (e.g., −50% mismatch), it eliminates overshoot, whereas traditional methods still overshoot by 0.08%. Third, under common moderate mismatches (±20%), its error is 21% to 37% lower than that of the Transformer method alone.
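The relative reductions quoted above can be checked against the Ratio = 60% values given earlier for Tube-MPC and Trans-Tube-MPC. The snippet is plain arithmetic on those rounded values; the helper name `relative_reduction` is illustrative.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage reduction of `improved` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - improved) / baseline * 100.0


# Ratio = 60% values quoted in the text: (Tube-MPC, Trans-Tube-MPC).
metrics = {
    "Peak Value": (893.329, 879.246),
    "Overshoot (%)": (2.68, 1.06),
    "RMSE": (12.463, 6.346),
}

for name, (tube, trans_tube) in metrics.items():
    print(f"{name}: {relative_reduction(tube, trans_tube):.1f}% reduction")
```

With these inputs it prints a 1.6% reduction for the peak value (the lower end of the quoted range), 60.4% for overshoot, and 49.1% for RMSE, consistent with "nearly half".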

5. Conclusions

This paper proposes the Trans-Tube-MPC method, which combines the temporal modeling capability of the Transformer with tube-based robust constraints to mitigate the control performance degradation caused by model-plant mismatch in industrial processes. The core innovation is a dynamic compensation loop: the Transformer module uses multi-step attention to predict the cumulative impact of disturbances on the system states and corrects the predictive model's outputs online, while the tube constraints adjust the control boundaries according to real-time errors to suppress the overcompensation tendency of the deep learning model. Experimental results show that under an extreme 60% model mismatch, the proposed method reduces RMSE by 58% compared with conventional MPC while maintaining operational safety. Under typical ±20% mismatch, the hybrid architecture reduces control errors by 21–37% compared with standalone Transformer-based control. The method does, however, require substantial offline training data and computational resources for Transformer inference. Future work will validate its generalization across full-process cement production, with emphasis on long-term stability under complex coupled operating conditions, and will explore lightweight neural architectures, online adaptation strategies, and extended validation in multivariable, large-scale industrial processes.
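The compensation loop described above can be illustrated with a scalar, one-step sketch. Every element here is an assumption made for illustration, not the authors' implementation: a nominal model x_{k+1} = a x_k + b u_k with hypothetical a and b; the latest nominal prediction error `d_hat` standing in for the trained Transformer's disturbance prediction; a tube half-width w = w0 + κ|d_hat| that grows with the gap between the corrected and nominal predictions (the article states that the width adapts to this discrepancy, but the linear law and the values in the hypothetical `TubeParams` are our own); and a one-step deadbeat solve in place of the constrained horizon optimization.

```python
from dataclasses import dataclass


@dataclass
class TubeParams:
    w0: float = 0.5      # baseline tube half-width (hypothetical value)
    kappa: float = 0.5   # gain from model discrepancy to width (hypothetical)
    w_max: float = 10.0  # cap on the half-width (hypothetical)


def tube_width(discrepancy: float, p: TubeParams) -> float:
    """Half-width grows linearly with |corrected - nominal|, capped at w_max."""
    return min(p.w0 + p.kappa * abs(discrepancy), p.w_max)


def control_step(x: float, d_hat: float, target: float, a: float, b: float,
                 x_max: float, u_min: float, u_max: float,
                 p: TubeParams) -> tuple[float, float]:
    """One-step stand-in for the MPC solve (assumes b > 0).

    Nominal prediction:   a*x + b*u
    Corrected prediction: a*x + b*u + d_hat, so the discrepancy is d_hat.
    The corrected prediction is kept at or below the tightened bound
    x_max - w; the input bounds take priority if the two conflict.
    """
    w = tube_width(d_hat, p)
    u_track = (target - a * x - d_hat) / b     # input that hits the target
    u_tight = (x_max - w - a * x - d_hat) / b  # largest input the tube allows
    u = max(min(u_track, u_tight, u_max), u_min)
    return u, w


def simulate(ratio: float, steps: int = 40) -> list[float]:
    """Closed loop on a plant whose pole is mismatched by `ratio` (0.3 = +30%)."""
    a, b = 0.8, 0.5                # nominal model (hypothetical numbers)
    a_true = a * (1.0 + ratio)     # plant parameter with relative mismatch
    p = TubeParams()
    x, target, x_max = 0.0, 10.0, 20.0
    d_hat, traj = 0.0, [x]
    for _ in range(steps):
        u, _ = control_step(x, d_hat, target, a, b, x_max, -50.0, 50.0, p)
        x_next = a_true * x + b * u
        # Stand-in for the Transformer (hypothetical): reuse the latest
        # nominal prediction error as the next disturbance estimate.
        d_hat = x_next - (a * x + b * u)
        x = x_next
        traj.append(x)
    return traj


if __name__ == "__main__":
    for ratio in (-0.6, -0.3, 0.0, 0.3, 0.6):
        print(f"Ratio {ratio:+.0%}: final CV = {simulate(ratio)[-1]:.3f}")
```

In this sketch a larger discrepancy widens the tube and lowers the admissible bound `x_max - w`, so the input becomes more conservative exactly when the learned correction is large. A faithful implementation would replace the one-step solve with a constrained MPC horizon and the lagged error with the Transformer's multi-step prediction.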

Author Contributions

J.C.: Writing – review and editing, Writing – original draft, Conceptualization, Methodology, Software. H.P.: Data curation, Writing – original draft preparation, Investigation, Supervision. Z.X.: Writing, Software. F.Y.: Data curation, Writing, Software. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Key Research and Development Program of Heilongjiang Province (2024ZXDXA09).

Data Availability Statement

Data is unavailable due to privacy or ethical restrictions.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

1. Schwenzer, M.; Ay, M.; Bergs, T.; Abel, D. Review on model predictive control: An engineering perspective. Int. J. Adv. Manuf. Technol. 2021, 117, 1327–1349.
2. Hu, J.; Shan, Y.; Guerrero, J.M.; Ioinovici, A.; Chan, K.W.; Rodriguez, J. Model predictive control of microgrids–An overview. Renew. Sustain. Energy Rev. 2021, 136, 110422.
3. Darby, M.L.; Nikolaou, M. MPC: Current practice and challenges. Control Eng. Pract. 2012, 20, 328–342.
4. Schäfer, J.; Cinar, A. Multivariable MPC system performance assessment, monitoring, and diagnosis. J. Process Control 2004, 14, 113–129.
5. Sun, Z.; Qin, S.J.; Singhal, A.; Megan, L. Performance monitoring of model-predictive controllers via model residual assessment. J. Process Control 2013, 23, 473–482.
6. Kheradmandi, M.; Mhaskar, P. Model predictive control with closed-loop re-identification. Comput. Chem. Eng. 2018, 109, 249–260.
7. Drgoňa, J.; Arroyo, J.; Figueroa, I.C.; Blum, D.; Arendt, K.; Kim, D.; Ollé, E.P.; Oravec, J.; Wetter, M.; Vrabie, D.L.; et al. All you need to know about model predictive control for buildings. Annu. Rev. Control 2020, 50, 190–232.
8. Chen, Y.; Ierapetritou, M. A framework of hybrid model development with identification of plant-model mismatch. AIChE J. 2020, 66, e16996.
9. Xu, X.; Simkoff, J.M.; Baldea, M.; Chiang, L.H.; Castillo, I.; Bindlish, R.; Ashcraft, B. Data-driven plant-model mismatch estimation for dynamic matrix control systems. Int. J. Robust Nonlinear Control 2020, 30, 7103–7129.
10. Paulson, J.A.; Santos, T.L.M.; Mesbah, A. Mixed stochastic-deterministic tube MPC for offset-free tracking in the presence of plant-model mismatch. J. Process Control 2019, 83, 102–120.
11. Huusom, J.K.; Poulsen, N.K.; Jørgensen, S.B.; Jørgensen, J.B. ARX-model based model predictive control with offset-free tracking. Comput. Aided Chem. Eng. 2010, 28, 601–606.
12. Huang, R.; Biegler, L.T.; Patwardhan, S.C. Fast offset-free nonlinear model predictive control based on moving horizon estimation. Ind. Eng. Chem. Res. 2010, 49, 7882–7890.
13. Son, S.H.; Narasingam, A.; Kwon, J.S.I. Development of offset-free Koopman Lyapunov-based model predictive control and mathematical analysis for zero steady-state offset condition considering influence of Lyapunov constraints on equilibrium point. J. Process Control 2022, 118, 26–36.
14. Wang, J.; Ding, B.; Zhang, S. Multivariable offset-free MPC with steady-state target calculation and its application to a wind tunnel system. ISA Trans. 2020, 97, 317–324.
15. González, A.H.; Adam, E.J.; Marchetti, J.L. Conditions for offset elimination in state space receding horizon controllers: A tutorial analysis. Chem. Eng. Process. Process Intensif. 2008, 47, 2184–2194.
16. Han, H.G.; Wang, C.Y.; Sun, H.Y.; Yang, H.Y.; Qiao, J.F. Iterative learning model predictive control with fuzzy neural network for nonlinear systems. IEEE Trans. Fuzzy Syst. 2023, 31, 3220–3234.
17. Blaud, P.C.; Chevrel, P.; Claveau, F.; Haurant, P.; Mouraud, A. ResNet and PolyNet based identification and (MPC) control of dynamical systems: A promising way. IEEE Access 2022, 11, 20657–20672.
18. Lanzetti, N.; Lian, Y.Z.; Cortinovis, A.; Dominguez, L.; Mercangöz, M.; Jones, C. Recurrent neural network based MPC for process industries. In Proceedings of the 2019 18th European Control Conference (ECC), Naples, Italy, 25–28 June 2019; pp. 1005–1010.
19. Wu, Z.; Rincon, D.; Christofides, P.D. Process structure-based recurrent neural network modeling for model predictive control of nonlinear processes. J. Process Control 2020, 89, 74–84.
20. Chu, J.Z.; Tsai, P.F.; Tsai, W.Y.; Jang, S.S.; Shieh, S.S.; Lin, P.H.; Jiang, S.J. Multistep model predictive control based on artificial neural networks. Ind. Eng. Chem. Res. 2003, 42, 5215–5228.
21. Lu, C.H.; Tsai, C.C. Generalized predictive control using recurrent fuzzy neural networks for industrial processes. J. Process Control 2007, 17, 83–92.
22. Zhang, T.; Feng, G.; Zeng, X.J. Output tracking of constrained nonlinear processes with offset-free input-to-state stable fuzzy predictive control. Automatica 2009, 45, 900–909.
23. Khodarahmi, M.; Maihami, V. A review on Kalman filter models. Arch. Comput. Methods Eng. 2023, 30, 727–747.
24. Urrea, C.; Agramonte, R. Kalman filter: Historical overview and review of its use in robotics 60 years after its creation. J. Sens. 2021, 2021, 9674015.
25. Feng, S.; Li, X.; Zhang, S.; Jian, Z.; Duan, H.; Wang, Z. A review: State estimation based on hybrid models of Kalman filter and neural network. Syst. Sci. Control Eng. 2023, 11, 2173682.
26. Transformer, C.G.P.; Zhavoronkov, A. Rapamycin in the context of Pascal’s Wager: Generative pre-trained transformer perspective. Oncoscience 2022, 9, 82.
27. Han, K.; Xiao, A.; Wu, E.; Guo, J.; Xu, C.; Wang, Y. Transformer in transformer. Adv. Neural Inf. Process. Syst. 2021, 34, 15908–15919.
28. Han, K.; Wang, Y.; Chen, H.; Chen, X.; Guo, J.; Liu, Z.; Tang, Y.; Xiao, A.; Xu, C.; Xu, Y.; et al. A survey on vision transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 87–110.
29. Bi, Y.; Ji, Y. Parameter estimation of fractional-order Hammerstein state space system based on the extended Kalman filter. Int. J. Adapt. Control Signal Process. 2023, 37, 1827–1846.
30. Feng, Z.; Chen, J.; Xiao, W.; Sun, J.; Xin, B.; Wang, G. Learning Hybrid Policies for MPC with Application to Drone Flight in Unknown Dynamic Environments. arXiv 2024, arXiv:2401.09705.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
