Long Short-Term Memory Neural Networks for Modeling Dynamical Processes and Predictive Control: A Hybrid Physics-Informed Approach

This work has two objectives. Firstly, it describes a novel physics-informed hybrid neural network (PIHNN) model based on the long short-term memory (LSTM) neural network. The presented model structure combines the first-principle process description and data-driven neural sub-models using a specialized data fusion block that relies on fuzzy logic. The second objective of this work is to detail a computationally efficient model predictive control (MPC) algorithm that employs the PIHNN model. The validity of the presented modeling and MPC approaches is demonstrated for a simulated polymerization reactor. It is shown that the PIHNN structure gives very good modeling results, while the MPC controller results in excellent control quality.


Introduction
Model predictive control (MPC) algorithms, as highlighted in [1,2], find their primary applications in managing processes that classical control methods struggle to handle effectively. These processes often involve multiple-input multiple-output (MIMO) systems or exhibit strong nonlinearity. MPC, renowned for its flexibility in accommodating various constraints, excels in ensuring high-quality control, even in the face of challenging processes. Real-world instances of successful MPC applications include control of chemical reactors [3,4] and distillation towers [5], as well as the integration of MPC in embedded systems controlling heating, ventilation and air conditioning (HVAC) systems [6], quadrotors [7], fuel cells [8], autonomous vehicles [9], and underwater vehicles [10].
As emphasized in [11][12][13], accurate sensor measurements of essential process variables play a critical role in MPC. It is widely acknowledged that the absence of these measurements inevitably leads to a significant loss in control performance. To address this challenge, when the necessary measurements are not readily available, engineers commonly employ online estimation techniques, such as Kalman or extended Kalman filters [14]. Furthermore, specialized methods and strategies have been developed to tackle this issue in specific applications. In the domain of vehicles, innovative solutions have emerged. The authors of [15] introduce a real-world example where a vehicle employs an external camera to detect obstacles and lane positions on the road. Additionally, it utilizes external rear-corner radars to identify objects approaching from the rear. An intriguing application of sensors is presented in [16], where an anemometer measures external factors such as wind force and direction. Beyond the automotive sector, there are applications like sea ship depth measurement. In [10], a depth sensor is installed for precisely measuring sea ship depth, with heave speed derived from the depth sensor data. Finally, MPC is also used to manage fault-tolerant control. This application addresses issues like stiction in control valves, as discussed in [17].
The cornerstone of any effective MPC algorithm is the precision of its process model. Broadly, two general model classes are usually considered: first-principle (FP) models rooted in the fundamental understanding of the process, and black-box approximations. Both model classes have their distinct strengths and limitations:
• FP models demand meticulous process descriptions and accurate parameter values but offer unparalleled modeling precision across a wide operating range, even in abnormal situations. In practice, however, the values of some model parameters may be imprecise or unknown.
• Black-box models, such as neural networks, have proven to be very useful, especially when dealing with complex dynamical processes, such as predator-prey systems [28][29][30]. However, black-box models may struggle when the available dataset lacks coverage for certain process variables, particularly around infrequently visited operating points.
Physics-informed neural network (PINN) models offer a compelling fusion of both modeling approaches. These models combine the foundational principles governing the process with the data-driven power of machine learning. The result is a versatile model that adheres to fundamental laws while approximating the behavior of real-world processes. The literature showcases PINN applications in scenarios where parameters of ordinary differential equation (ODE) models are either imprecisely known [31] or immeasurable [32]. Furthermore, PINNs can approximate parameters of partial differential equations (PDEs) [33]. These PINN models find utility in replacing numerical solvers for ODEs [34] and even serve as models within MPC frameworks [35]. Additionally, one can find several hybrid models aiming to combine a data-driven modeling approach with knowledge of physics. The hybrid physics-guided neural network [36] is a feed-forward neural network integrated with a first-principles model. The entire hybrid model is trained jointly. This training process involves incorporating a fusion output layer that utilizes a straightforward interpolation technique. Other examples include using deep neural networks in a physically guided modeling approach [37], in modeling lithium batteries [38], and in modeling a traffic state [39]. One can also find examples of introducing physics directly into the forward pass of the neural network to model lake temperature [40].
This study addresses a common modeling challenge characterized by two specific limitations. Firstly, process-variable measurements are typically feasible but confined to a limited vicinity of certain operating points. Consequently, the resulting models exhibit localized validity, restricted to the regions where data have been collected for identification purposes. Secondly, although fundamentally sound, the existing first-principle models describing the process often lack precision due to imprecise parameters. In response to these limitations, this work introduces an innovative physics-informed hybrid neural network (PIHNN) model structure, leveraging LSTM neural networks. This approach combines elements from both first-principle and black-box data-driven methodologies, offering robust modeling capabilities in scenarios characterized by the aforementioned issues. Within this research, we delve into two data fusion techniques, drawing from the principles of the first-principle process description and the LSTM network, both employing a fuzzy-logic-based approach. The initial method employs a simplified data fusion block, while the subsequent method harnesses machine learning techniques to minimize overall model errors. To assess the effectiveness of the proposed model structure and data fusion techniques, we apply them to a benchmark polymerization reactor process.
Additionally, we integrate the developed PIHNN model into the MPC framework. Our analysis encompasses a straightforward MPC algorithm with nonlinear optimization (MPC-NO) and a more intricate linearization-based MPC scheme named the MPC algorithm with nonlinear prediction and linearization around the predicted trajectory (MPC-NPLPT), which relies on computationally uncomplicated quadratic optimization tasks. Our findings demonstrate that the linearization-based MPC approach can yield commendable control performance while significantly reducing computational demands compared to nonlinear counterparts. An initial iteration of the PIHNN model was introduced in conference proceedings [41], where a basic GRU neural network was employed. This current study represents a substantial expansion of previous research efforts. Here, we consider more general LSTM-based PIHNN models, comprehensively examine the model's structure, explore various potential variants, and present implementation details. Furthermore, we introduce an efficient model predictive control (MPC) algorithm for the PIHNN models considered in this study.
This work is organized as follows. Firstly, Section 2 presents the general structure and the details of the hybrid PIHNN model structure utilizing LSTM neural networks. The state-space modeling approach is employed. Secondly, Section 3 briefly describes the general MPC scheme with nonlinear optimization and presents the general formulation, the necessary implementation details, and the resulting quadratic optimization task of the linearization-based MPC method. Section 4 thoroughly studies the validity of as many as six PIHNN model variants applied to approximate the behavior of a chemical reactor benchmark. Furthermore, the control efficiency and computational speed of the recommended linearization-based MPC algorithm are shown. Finally, Section 5 concludes the article.

Hybrid Physics-Informed Models Using LSTM Neural Networks
We introduce an innovative PIHNN model that blends a data-driven approach with expert knowledge of the underlying physics of the process.To effectively apply the PIHNN model, the following conditions must be satisfied:

• The process input and output variables, i.e., the manipulated and controlled variables, respectively, must be measurable. State variables may be measured or observed using state estimation, e.g., in the form of an extended Kalman filter (EKF).
• The FP model of the process should exist in the form of a set of differential equations and, when necessary, additional algebraic relations based on the fundamental laws of physics governing the process.
However, we assume that the measurements and the FP model may exhibit imperfections. Specifically, the measurements may originate from a limited range within the entire spectrum of process variable variability. Furthermore, the FP model may also contain inaccuracies and be susceptible to errors arising from factors such as incorrect estimation of specific process parameters or measurement inaccuracies.

Model Structure
This paper primarily focuses on single-input single-output (SISO) process modeling. The process input and output are denoted as u and y, respectively. Additionally, the process has n_x state variables, represented as the vector x = [x_1 … x_{n_x}]^T.
Figure 1 illustrates the model's overall structure. The PIHNN model is divided into three distinct components. The first model component, highlighted in blue, is entirely data-driven. It comprises n LSTM neural sub-models, each trained on available data. The number of data-driven sub-models corresponds to the number of distinct operational areas of the process from which measurement data can be collected. Each sub-model takes the vector X^i_LSTM as the input and generates the scalar y^i_LSTM as the output. LSTM networks are employed in this study, as earlier research has demonstrated their exceptional ability to model dynamical processes [23,27]. However, it is important to note that alternative data-driven models could also be applied in this context. The second component of the PIHNN structure, highlighted in green, is rooted in expert knowledge about the underlying physics of the process. It consists of an FP sub-model formulated using ordinary differential equations. The input to this sub-model is the vector X_FP, while the output is denoted by the scalar y_FP. The third component of the PIHNN structure, highlighted in orange, represents the data fusion block (DF). In general, many decision models can be used here, such as neural networks of various architectures. However, we recommend using the fuzzy data fusion block (Fuzzy DF) because it directly incorporates the sub-models. There is no need to train the Fuzzy DF block on data, which is particularly useful when training data are lacking across specific ranges of process variable variability. By selecting membership function shapes, one can determine in which areas, and to what extent, each sub-model should contribute when calculating the overall PIHNN model output. The DF block takes the outputs calculated by all LSTM sub-models and the FP sub-model as inputs. Based on the current operating state of the process, represented by the vector X_DF, it makes decisions regarding the combination of outputs from all sub-models. The primary goal of this fusion process is to minimize the overall error of the entire PIHNN model.
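To make the fusion idea concrete, the following is a minimal sketch (not the authors' implementation) of how a fuzzy data fusion block can combine sub-model outputs; the membership functions, operating-point value, and sub-model outputs below are purely illustrative assumptions.

```python
import numpy as np

def gaussian_mf(z, c, a):
    """Gaussian membership function centered at c with spread a."""
    return np.exp(-((z - c) ** 2) / (2.0 * a ** 2))

def fuse(sub_model_outputs, memberships):
    """Normalized weighted combination of sub-model outputs."""
    w = np.asarray(memberships, dtype=float)
    y = np.asarray(sub_model_outputs, dtype=float)
    return float(np.dot(w, y) / np.sum(w))

# Hypothetical operating point (e.g., a scaled output value) and sub-model outputs
z = 2.5
outputs = [2.4, 2.7, 2.6]               # y_LSTM1, y_LSTM2, y_FP (illustrative)
weights = [gaussian_mf(z, 3.0, 0.5),    # LSTM1 trusted near large values
           gaussian_mf(z, 1.5, 0.5),    # LSTM2 trusted near small values
           gaussian_mf(z, 2.25, 1.0)]   # FP covers the whole range
y_pihnn = fuse(outputs, weights)
```

Because the weights are normalized, the fused output is always a convex combination of the sub-model outputs, so it stays within their range.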

First-Principle Sub-Model
Typically, the FP model utilizes fundamental physical laws formulated in the continuous-time domain, i.e., a set of differential equations must be considered. The state equations have the classical form

ẋ_1(t) = f_1(x_1(t), …, x_{n_x}(t), u(t)) (1)
⋮
ẋ_{n_x}(t) = f_{n_x}(x_1(t), …, x_{n_x}(t), u(t)) (2)

while the output equation is

y(t) = g(x_1(t), …, x_{n_x}(t)) (3)

where f_1, …, f_{n_x} : R^{n_x+1} → R and g : R^{n_x} → R are nonlinear functions. Since we will next use the PIHNN model relying on the FP model in the MPC algorithm with online linearization, we require the functions f_1, …, f_{n_x}, g to be differentiable. From Equations (1)-(3), we can find a corresponding discrete-time FP model

x_1(k+1) = f^d_1(x_1(k), …, x_{n_x}(k), u(k)) (4)
⋮
x_{n_x}(k+1) = f^d_{n_x}(x_1(k), …, x_{n_x}(k), u(k)) (5)
y(k) = g^d(x_1(k), …, x_{n_x}(k)) (6)

where f^d_1, …, f^d_{n_x} : R^{n_x+1} → R and g^d : R^{n_x} → R are nonlinear mapping functions. The input vector to the FP model can be expressed as

X_FP(k) = [x_1(k) … x_{n_x}(k) u(k)]^T (7)
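The passage from the continuous-time state equations to the discrete-time model can be performed, for instance, with the forward Euler method; the following is a minimal sketch in which the specific right-hand side f and the sampling time are illustrative assumptions, not the benchmark model.

```python
import numpy as np

def euler_step(f, x, u, T):
    """One forward-Euler step: x(k+1) = x(k) + T * f(x(k), u(k))."""
    return x + T * f(x, u)

# Illustrative first-order nonlinear system: dx/dt = -x + u**2
f = lambda x, u: -x + u ** 2

x = np.array([1.0])   # initial state
T = 0.1               # assumed sampling time
for _ in range(10):   # simulate 10 sampling periods with u = 0
    x = euler_step(f, x, 0.0, T)
# for u = 0 each step multiplies the state by (1 - T), so x ≈ (1 - T)**10
```

Smaller sampling times give a more accurate discretization, at the cost of more steps per prediction horizon.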

LSTM Sub-Model
LSTM networks were developed in response to the vanishing gradient problem that impacts traditional recurrent neural networks [42]. Each LSTM neuron is referred to as a "cell" (Figure 2) and encompasses gates responsible for governing the flow of information within the network. The LSTM cell comprises four distinct gates: the input gate i, the forget gate f, the cell candidate gate g, and the output gate o. Each cell in the network has its input vector expressed as

X_LSTM(k) = [u(k−1) … u(k−n_B) y(k−1) … y(k−n_A)]^T (8)

where the parameters n_A and n_B define the order of dynamics of the model. The LSTM network has n_N cells. The weights in the network can be written in a matrix form

W = [W_i^T W_f^T W_g^T W_o^T]^T, R = [R_i^T R_f^T R_g^T R_o^T]^T, b = [b_i^T b_f^T b_g^T b_o^T]^T (9)

The input weight matrices W_i, W_f, W_g, and W_o multiply the cell input vector, while the recurrent weight matrices R_i, R_f, R_g, and R_o multiply the hidden state; b_i, b_f, b_g, and b_o are the bias vectors. The gate vectors at time instant k are

i(k) = σ(W_i X_LSTM(k) + R_i h(k−1) + b_i) (10)
f(k) = σ(W_f X_LSTM(k) + R_f h(k−1) + b_f) (11)
g(k) = tanh(W_g X_LSTM(k) + R_g h(k−1) + b_g) (12)
o(k) = σ(W_o X_LSTM(k) + R_o h(k−1) + b_o) (13)

where σ denotes the logistic sigmoid function. Subsequently, the cell state of the network can be computed

c(k) = f(k) ∘ c(k−1) + i(k) ∘ g(k) (14)

where the symbol ∘ represents the Hadamard product of vectors. Finally, the hidden state can be calculated

h(k) = o(k) ∘ tanh(c(k)) (15)

The LSTM layer of the network is typically followed by a fully connected layer (Figure 3), with weight matrix W_y with a dimensionality of 1 × n_N and bias b_y. Finally, the computation of the network's output at time instant k can be expressed as

y(k) = W_y h(k) + b_y (16)

One can represent Equations (10)-(15) in scalar form, which will prove useful for the derivation of the MPC algorithm considered in Section 3. The scalar form expressions for the n-th elements of the gate and state vectors are

i_n(k) = σ(W_{i,n} X_LSTM(k) + R_{i,n} h(k−1) + b_{i,n}) (17)
f_n(k) = σ(W_{f,n} X_LSTM(k) + R_{f,n} h(k−1) + b_{f,n}) (18)
g_n(k) = tanh(W_{g,n} X_LSTM(k) + R_{g,n} h(k−1) + b_{g,n}) (19)
o_n(k) = σ(W_{o,n} X_LSTM(k) + R_{o,n} h(k−1) + b_{o,n}) (20)
c_n(k) = f_n(k) c_n(k−1) + i_n(k) g_n(k) (21)
h_n(k) = o_n(k) tanh(c_n(k)) (22)

where W_{i,n}, R_{i,n}, etc., denote the n-th rows of the corresponding weight matrices, and the network output is

y(k) = Σ_{n=1…n_N} w_{y,n} h_n(k) + b_y (23)

Equations (22) and (23) could be used to find the output of the network in the form of one equation

y(k) = Σ_{n=1…n_N} w_{y,n} o_n(k) tanh(c_n(k)) + b_y (24)
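The gate and state updates described above can be sketched in a few lines of NumPy; the weight values, sizes, and the input below are placeholder assumptions, chosen only to show the data flow of one LSTM step followed by the fully connected output layer.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_cell(x, h_prev, c_prev, W, R, b):
    """One LSTM step. W, R, b hold the weights of the gates i, f, g, o."""
    i = sigmoid(W["i"] @ x + R["i"] @ h_prev + b["i"])   # input gate
    f = sigmoid(W["f"] @ x + R["f"] @ h_prev + b["f"])   # forget gate
    g = np.tanh(W["g"] @ x + R["g"] @ h_prev + b["g"])   # cell candidate
    o = sigmoid(W["o"] @ x + R["o"] @ h_prev + b["o"])   # output gate
    c = f * c_prev + i * g        # Hadamard products, as in the text
    h = o * np.tanh(c)
    return h, c

n_N, n_in = 4, 2                  # 4 cells, 2-element input (placeholder sizes)
rng = np.random.default_rng(0)
W = {k: rng.normal(0, 0.1, (n_N, n_in)) for k in "ifgo"}
R = {k: rng.normal(0, 0.1, (n_N, n_N)) for k in "ifgo"}
b = {k: np.zeros(n_N) for k in "ifgo"}
h, c = lstm_cell(np.array([0.5, -0.2]), np.zeros(n_N), np.zeros(n_N), W, R, b)
W_y, b_y = np.ones((1, n_N)), 0.0  # fully connected output layer
y = W_y @ h + b_y
```

Note that with all weights and states at zero the cell output is exactly zero, since the cell candidate g and hence the cell state vanish.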

Fuzzy Data Fusion Block
Considering Figure 1, the output of the whole PIHNN model is

y_PIHNN(k) = (Σ_{i=1…n} μ_i(X_DF(k)) y^i_LSTM(k) + μ_FP(X_DF(k)) y_FP(k)) / (Σ_{i=1…n} μ_i(X_DF(k)) + μ_FP(X_DF(k))) (25)

where μ_1, …, μ_n and μ_FP denote the membership functions assigned to the LSTM sub-models and the FP sub-model, respectively. In this study, we use trapezoidal, sigmoidal, and Gaussian membership functions. For trapezoidal functions, we have

μ_n(z) = max(min((z − a_n)/(b_n − a_n), 1, (d_n − z)/(d_n − c_n)), 0) (26)

for sigmoidal ones, we write

μ_n(z) = 1/(1 + exp(−a_n(z − b_n))) (27)

and for Gaussian ones, we define

μ_n(z) = exp(−(z − c_n)²/(2 a_n²)) (28)

The signal z defines the current operating point of the process. The parameters a_n, b_n, c_n, d_n define the shapes of the membership functions used.
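The three membership function families can be implemented directly; the sketch below is an illustration with our own parameter names, following the usual definitions of trapezoidal, sigmoidal, and Gaussian memberships.

```python
import numpy as np

def trapezoid(z, a, b, c, d):
    """Trapezoidal membership: ramps up on [a, b], flat on [b, c], down on [c, d]."""
    return np.clip(np.minimum((z - a) / (b - a), (d - z) / (d - c)), 0.0, 1.0)

def sigmoidal(z, a, b):
    """Sigmoidal membership: smooth step located at b with slope parameter a."""
    return 1.0 / (1.0 + np.exp(-a * (z - b)))

def gaussian(z, c, a):
    """Gaussian membership centered at c with spread a."""
    return np.exp(-((z - c) ** 2) / (2.0 * a ** 2))
```

Trapezoidal functions give hard, piecewise-linear validity regions; sigmoidal and Gaussian functions blend the sub-models smoothly, which matters later for the gradient-based MPC computations.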

Model Development Procedure
The process for establishing the PIHNN model unfolds as follows:
1. We determine the number of distinct training datasets that can be derived from the process measurements.
2. We conduct training of the LSTM network for each training dataset.
3. We implement a discrete FP model of the process.
4. We select the initial shape and range of the membership functions within the DF block.
5. We deliver the outputs of the LSTM sub-models and the output of the FP model as inputs of the DF block, where their fusion is carried out based on the current operational state of the process. This fusion process determines the output of the PIHNN model.
6. We assess the quality of PIHNN modeling. If it proves unsatisfactory, it becomes necessary to modify the shape of the membership functions.
7. We adjust the membership functions' shape, which can be executed manually, drawing upon expert knowledge, or using an optimization procedure.
The flow chart of the model development procedure is also presented in Figure 4.

Basic Predictive Control Problem Formulation
This work utilizes the general MPC formulation [1,2]. Namely, at each discrete-time sampling instant k, where k = 0, 1, 2, …, the MPC controller performs real-time calculations to determine the vector of decision variables. It is defined as the following current and future increments of the input variable:

Δu(k) = [Δu(k|k) Δu(k+1|k) … Δu(k+N_u−1|k)]^T (29)

The symbol Δu(k|k) represents the increment of the manipulated variable at time instant k, computed at the same time instant k. Similarly, the symbol Δu(k+1|k) corresponds to the increment of the manipulated variable at the future time instant k+1, computed at the current time instant k. This notation extends to subsequent time instants as well. N_u represents the control horizon, which determines the length of the MPC decision variable vector. The fundamental MPC optimization problem aims to minimize the predicted control errors, penalize excessive increments of the manipulated variable, and satisfy constraints. Let us denote the set-point of the controlled variable for the future sampling instant k+p, known at the current instant k, by y_sp(k+p|k), and the corresponding prediction determined from the process model by ŷ(k+p|k). We consider the predictions and control errors over the prediction horizon N.
As far as the magnitude constraints on the manipulated variable and the predicted controlled variable are concerned, they are represented by u_min, u_max and y_min, y_max, respectively. The fundamental MPC optimization task can be formulated as follows:

min over Δu(k|k), …, Δu(k+N_u−1|k) of { Σ_{p=1…N} (y_sp(k+p|k) − ŷ(k+p|k))² + λ Σ_{p=0…N_u−1} (Δu(k+p|k))² }
subject to
u_min ≤ u(k+p|k) ≤ u_max, p = 0, …, N_u−1
Δu_min ≤ Δu(k+p|k) ≤ Δu_max, p = 0, …, N_u−1
y_min ≤ ŷ(k+p|k) ≤ y_max, p = 1, …, N (30)

In general, the predictions over the prediction horizon are obtained as

ŷ(k+p|k) = y(k+p|k) + d(k) (31)

where the model output for the future discrete time k+p, determined at the current time k, is denoted as y(k+p|k). The unmeasured disturbance d(k), which covers the model error and the real disturbances that act on the controlled process, is computed as the difference between the measured value of the process controlled variable and its estimation obtained from the model. The MPC optimization problem (30) is solved online at each sampling instant, yielding the solution vector (29). According to the principle of repetitive control, the first element of the obtained solution vector is sent to the process, and the whole procedure is repeated at the subsequent sampling instants.
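The MPC cost being minimized — the sum of squared predicted control errors plus λ-weighted squared input increments — is simple to evaluate for a candidate input sequence; all numerical values in this sketch are illustrative assumptions.

```python
import numpy as np

def mpc_cost(du, y_sp, y_hat, lam):
    """J = sum of (y_sp - y_hat)^2 over the prediction horizon
         + lam * sum of du^2 over the control horizon."""
    err = np.asarray(y_sp) - np.asarray(y_hat)
    return float(np.sum(err ** 2) + lam * np.sum(np.asarray(du) ** 2))

# Illustrative numbers: N = 3, N_u = 2, lambda = 0.5
J = mpc_cost(du=[0.2, 0.1],
             y_sp=[1.0, 1.0, 1.0],
             y_hat=[0.8, 0.9, 1.0],
             lam=0.5)
# J = (0.2^2 + 0.1^2 + 0^2) + 0.5 * (0.2^2 + 0.1^2) = 0.05 + 0.025 = 0.075
```

In a full controller this cost is minimized subject to the input, increment, and output constraints; here only the cost evaluation itself is shown.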

Nonlinear MPC Optimization for PIHNN Models
Suppose a nonlinear model, e.g., an LSTM structure or the PIHNN model described in this work, is directly used to determine the predictions ŷ(k+p|k). The general MPC optimization problem (30) becomes nonlinear in that case. We will refer to such a control method as MPC-NO.

Quadratic MPC Optimization for PIHNN Models
In order to derive a computationally attractive alternative to the MPC-NO method, we derive an MPC algorithm with successive linearization along the predicted trajectory. Such an approach makes it possible to formulate a quadratic optimization MPC task. We use the general approach to predicted trajectory linearization known as the MPC-NPLPT method, introduced in [19,20]. However, the application of the original PIHNN model structure requires careful derivation of the algorithm. Firstly, let us define the predicted trajectory of the controlled variable over the entire prediction horizon, i.e., the following vector:

ŷ(k) = [ŷ(k+1|k) … ŷ(k+N|k)]^T (32)

In the MPC-NPLPT approach, linearization is performed along a trajectory of the manipulated variable defined over the control horizon. It has the following form:

u_traj(k) = [u_traj(k|k) … u_traj(k+N_u−1|k)]^T (33)

From the definition of the control horizon, it follows that u_traj(k+p|k) = u_traj(k+N_u−1|k) for p = N_u, …, N. The input trajectory (33) is utilized to determine the predicted trajectory of the controlled variable over the prediction horizon

ŷ_traj(k) = [ŷ_traj(k+1|k) … ŷ_traj(k+N|k)]^T (34)

For linearization, we use Taylor's approach. Let us define the vector comprising the current and future values of the manipulated variable that correspond to the MPC decision variable vector (29):

u(k) = [u(k|k) … u(k+N_u−1|k)]^T (35)

Taking advantage of the compact vector-matrix notation, the predicted trajectory ŷ(k) is expressed as the following linear function of the vector (35):

ŷ(k) = ŷ_traj(k) + H(k)(u(k) − u_traj(k)) (36)

The N × N_u matrix

H(k) = dŷ_traj(k)/du_traj(k) (37)

defines the partial derivatives of the predicted controlled variable's trajectory with respect to the future manipulated variable's trajectory; both trajectories take into account the linearization conditions, so we have to utilize the trajectories ŷ_traj(k) and u_traj(k), respectively. The entries of the matrix H(k) are

∂ŷ_traj(k+p|k)/∂u_traj(k+r|k) (38)

for all predictions over the prediction horizon, i.e., p = 1, …, N, and all computed values of the manipulated variable over the entire control horizon, i.e., r = 0, …, N_u−1. The link between the vectors u(k) and Δu(k) is

u(k) = J Δu(k) + u(k−1) (39)

where the entries of the N_u × N_u auxiliary matrix J are defined as

J_{i,j} = 1 for i ≥ j, J_{i,j} = 0 otherwise (40)

and the vector of length N_u is

u(k−1) = [u(k−1) … u(k−1)]^T (41)

Using the linearized trajectory (36) and the rule (39), the general predictive control optimization task (30) is transformed to the subsequent quadratic optimization problem:

min over Δu(k) of { ||y_sp(k) − ŷ_traj(k) − H(k)(J Δu(k) + u(k−1) − u_traj(k))||² + ||Δu(k)||²_Λ }
subject to
u_min ≤ J Δu(k) + u(k−1) ≤ u_max
Δu_min ≤ Δu(k) ≤ Δu_max
y_min ≤ ŷ_traj(k) + H(k)(J Δu(k) + u(k−1) − u_traj(k)) ≤ y_max (42)

The definitions of all necessary symbols used in the above problem are:
• Λ: a diagonal N_u × N_u matrix with diagonal entries equal to the weighting coefficient λ;
• u_min: a vector of length N_u, where all elements are equal to u_min;
• u_max: a vector of length N_u, where all elements are equal to u_max;
• Δu_min: a vector of length N_u, where all elements are equal to Δu_min;
• Δu_max: a vector of length N_u, where all elements are equal to Δu_max;
• y_min: a vector of length N, where all elements are equal to y_min;
• y_max: a vector of length N, where all elements are equal to y_max.
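Neglecting the inequality constraints for a moment, the linearized problem has a closed-form least-squares minimizer, which is a useful sanity check when implementing the full QP; the sketch below uses a randomly generated sensitivity matrix H(k) and illustrative horizon lengths, all of which are assumptions.

```python
import numpy as np

def npl_unconstrained(H, J, Lam, y_sp, y_traj, u_traj, u_prev):
    """Minimize ||y_sp - y_traj - H (J du + u_prev - u_traj)||^2 + du' Lam du."""
    M = H @ J                                   # sensitivity w.r.t. the increments
    e = y_sp - y_traj - H @ (u_prev - u_traj)   # residual at du = 0
    # Normal equations of the regularized least-squares problem
    return np.linalg.solve(M.T @ M + Lam, M.T @ e)

N, N_u, lam = 4, 2, 0.1
rng = np.random.default_rng(1)
H = rng.normal(size=(N, N_u))            # stand-in derivative matrix H(k)
J = np.tril(np.ones((N_u, N_u)))         # lower-triangular link u = J du + u(k-1)
Lam = lam * np.eye(N_u)
du = npl_unconstrained(H, J, Lam,
                       y_sp=np.ones(N), y_traj=np.zeros(N),
                       u_traj=np.zeros(N_u), u_prev=np.zeros(N_u))
```

With the box constraints included, the same quadratic cost is passed to a QP solver instead; the unconstrained solution above should match the QP solution whenever no constraint is active.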

PIHNN Prediction
Let us now discuss how the PIHNN model discussed in this work is utilized for MPC prediction, i.e., to calculate the predicted trajectory of the controlled variable defined by Equation (34). We use Equation (25) for the future time instant k+p, which gives

ŷ_PIHNN(k+p|k) = (Σ_{i=1…n} μ_i(X_DF(k+p|k)) ŷ^i_LSTM(k+p|k) + μ_FP(X_DF(k+p|k)) ŷ_FP(k+p|k)) / (Σ_{i=1…n} μ_i(X_DF(k+p|k)) + μ_FP(X_DF(k+p|k))) (44)

Taking advantage of Equation (31), the predictions are, therefore, expressed as

ŷ(k+p|k) = ŷ_PIHNN(k+p|k) + d(k) (45)

where the membership functions are defined by Equations (26), (27) or (28). Let us note that the predicted trajectory from the PIHNN model depends on the trajectories generated by both the LSTM and FP sub-models. The disturbance (the prediction error) is determined as the difference between the measured process output and its estimation obtained from the model

d(k) = y(k) − y_PIHNN(k) (46)

where the signal y_PIHNN(k) is found from Equation (25).

LSTM Model Prediction
For each LSTM sub-model, the calculations start with computing the predicted outputs of the gates. For this purpose, we use Equations (17)-(20). Introducing auxiliary integer variables allows the gate predictions to be represented compactly. Then, the predicted cell and hidden states can be determined from Equations (21) and (22). Let us stress that these equations have to be used recurrently for p = 1, …, N. Finally, the predicted output of the i-th LSTM sub-model, ŷ^i_LSTM(k+p|k), can be computed from Equation (23).

FP Model Prediction
Using Equations (4) and (5), we find the model states and the output for the future time instant k+1. To simplify the following calculations, let us start with computing the prediction of the states. From Equation (6), we find the corresponding predicted controlled variable. Next, we can determine the predictions for the subsequent sampling instants k+p, where p = 2, …, N. The state and output disturbances (prediction errors), respectively, are computed by comparing the measurements with the outputs of the corresponding model equations.
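The recurrent use of the discrete FP model over the prediction horizon can be sketched as follows; the one-state model, its coefficients, and the constant output disturbance used here are toy assumptions.

```python
import numpy as np

def predict_trajectory(f_d, g_d, x0, u_seq, d=0.0):
    """Recurrent FP prediction: roll the discrete model over the horizon,
    adding the constant output disturbance d to every prediction."""
    x, y_pred = x0, []
    for u in u_seq:                 # u_seq = u(k|k), ..., u(k+N-1|k)
        x = f_d(x, u)               # state prediction, as in Equations (4)-(5)
        y_pred.append(g_d(x) + d)   # output prediction plus disturbance
    return np.array(y_pred)

# Toy one-state discrete model: x(k+1) = 0.9 x(k) + 0.1 u(k), y = 2 x
f_d = lambda x, u: 0.9 * x + 0.1 * u
g_d = lambda x: 2.0 * x
y_hat = predict_trajectory(f_d, g_d, x0=0.0, u_seq=[1.0, 1.0, 1.0], d=0.05)
```

The same pattern applies to the multi-state benchmark model: the state is propagated recurrently, and the disturbance estimated at the current instant is held constant over the horizon.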

PIHNN Model Derivatives
The entries of the matrix H(k) (Equation (37)) are computed from Equation (38). Differentiation of Equation (44) yields the required expressions. Let us note that the derivatives of the whole PIHNN model depend on the LSTM and FP sub-model derivatives.

LSTM Model Derivatives
Derivatives for the LSTM sub-models are calculated by differentiating Equation (57) for all p = 1, …, N and r = 0, …, N_u−1. The subsequent step involves the application of the chain rule of differentiation. Initially, it is necessary to determine the derivatives of the gates i, f, g, and o: we proceed to differentiate Equation (50), then Equation (51), Equation (52) and, finally, Equation (53). The following step involves computing the derivative of the cell state c using Equation (54), from which we can also derive the derivatives of the hidden state h.
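When implementing such chain-rule derivatives, comparing them against a finite-difference Jacobian is a practical sanity check; the sketch below uses a simple stand-in predictor (not the LSTM) whose analytic Jacobian is known, with all values being illustrative assumptions.

```python
import numpy as np

def fd_jacobian(predict, u, eps=1e-6):
    """Finite-difference Jacobian of a predicted trajectory w.r.t. the input sequence."""
    y0 = predict(u)
    Hn = np.zeros((y0.size, u.size))
    for r in range(u.size):
        up = u.copy()
        up[r] += eps                       # perturb one input move at a time
        Hn[:, r] = (predict(up) - y0) / eps
    return Hn

# Stand-in nonlinear predictor: y_p = tanh(cumulative input up to step p)
predict = lambda u: np.tanh(np.cumsum(u))
u = np.array([0.2, -0.1, 0.3])
H_num = fd_jacobian(predict, u)

# Analytic counterpart: dy_p/du_r = 1 - tanh(s_p)^2 for r <= p, else 0
s = np.cumsum(u)
H_ana = np.tril(np.ones((3, 3))) * (1 - np.tanh(s) ** 2)[:, None]
```

The lower-triangular structure mirrors causality: a future input move cannot affect an earlier prediction, exactly as in the matrix H(k).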

FP Model Derivatives
We start by finding the derivatives of the predicted state variables for the sampling instant k+1. Differentiating Equations (61) and (62), we obtain the corresponding expressions. Since the state variables at the current instant k do not depend on the future values of the manipulated variable, we can simplify Equations (78) and (79).
The next step is to find the derivatives of the FP model states and the controlled variable for the prediction at the sampling instant k+p, where p = 2, …, N. From Equation (63), we obtain the output derivatives. Next, we can determine the derivatives when p = 2, …, N. We start with the state variables; from Equations (64) and (65), we obtain the corresponding expressions.
Finally, we can find the derivatives of the FP sub-model output predictions using Equation (66).

Polymerization Process Description
The process under study is a polymerization reactor [43] that is frequently used as a benchmark to assess the usefulness of models and control methods, e.g., [20,27]. This process is characterized by a single input, representing the initiator's flow rate, denoted as F_I (m³ h⁻¹). Likewise, it has a single output, the number average molecular weight NAMW (kg kmol⁻¹). Both input and output signals have been appropriately normalized to facilitate the training of neural networks. The scaling is defined as follows: u = 100(F_I − F_I^nom) and y = 0.0001(NAMW − NAMW^nom), where the values at the nominal operating point are F_I^nom = 0.016783 m³ h⁻¹ and NAMW^nom = 20,000 kg kmol⁻¹. The polymerization process operates with a sampling time of T = 1.8 seconds.
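The variable scaling can be reproduced directly with the nominal values quoted above; the function names in this sketch are our own.

```python
def scale_input(F_I, F_I_nom=0.016783):
    """u = 100 (F_I - F_I^nom), with F_I in m^3 h^-1."""
    return 100.0 * (F_I - F_I_nom)

def scale_output(NAMW, NAMW_nom=20000.0):
    """y = 0.0001 (NAMW - NAMW^nom), with NAMW in kg kmol^-1."""
    return 0.0001 * (NAMW - NAMW_nom)

# At the nominal operating point both scaled signals are zero
u0 = scale_input(0.016783)   # -> 0.0
y0 = scale_output(20000.0)   # -> 0.0
```

Scaling both signals to comparable magnitudes around zero is what makes gradient-based training of the LSTM sub-models well-conditioned.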
Let us note that the predictions determined from the LSTM sub-models are universal, as derived in Section 3.5. Similarly, the derivatives matrix determined from the LSTM sub-models is universal, as derived in Section 3.8. Hence, it is only necessary to derive specific equations for prediction using the specific first-principle model of the process, as well as the corresponding equations for the derivatives matrix.

First-Principle Model for Polymerization Process and Its Use in MPC
The continuous-time first-principle model of the polymerization process [43] is discretized using the Euler method, which yields the discrete-time model and its parameter values. It is important to note that, to emulate the imperfections and inaccuracies of the FP model, we introduced a 20 percent increase in the gain of the model during the simulation experiments. For the PIHNN model used in our MPC algorithm, we have to derive equations for the prediction using the specific FP model of the considered benchmark system and the general rules formulated in Section 3.6. They make it possible to calculate the predicted trajectory ŷ_traj(k), as defined by Equation (34). We start with determining the prediction equations for p = 1; from Equations (87)-(91), we obtain the predicted states and output. The state and output disturbances are derived from the general Equations (67)-(69), respectively. Next, we find the equations for the state and output predictions for p = 2, …, N. Using the above predictions generated by the FP model, we have to determine the derivatives of the predicted trajectory of the controlled variable with respect to the trajectory of the manipulated variable, i.e., the derivative matrix H(k), as defined by Equation (38). For this purpose, we use the general rules formulated in Section 3.9. We consider Equations (81) and (82), and Equation (83) allows us to express the output derivatives. Finally, we use Equations (84)-(86) to determine the state variable and output derivatives, respectively, for all p = 2, …, N and r = 0, …, N_u−1.

LSTM Model for Polymerization Process
Two separate training datasets have been collected from the simulated process (i.e., from simulation of the continuous-time first-principle model) for different operating conditions; in particular, dataset 2 has been collected for 0.05 < F_I < 0.06, which results in 1.41 × 10^4 < NAMW < 1.54 × 10^4. The datasets were then used to train two LSTM models, denoted hereafter as LSTM1 and LSTM2. Both models have been trained with the same parameters:
• the number of neurons n_N = 7;
• the order of dynamics n_A = 0, n_B = 1.
The LSTM models have been trained in MATLAB on a PC equipped with an Nvidia GeForce GTX 970 GPU, an Intel i5-3450 CPU, and 16 GB of RAM. We have employed the Adam optimization algorithm with a learning rate of 0.001 and a maximum of 1000 training epochs.

Modeling Quality of LSTM and FP Models
The modeling quality of all sub-models developed for the polymerization process can be compared in Figure 5. In this comparison, we can see the individual outputs of all sub-models when operating independently on the test dataset. LSTM1, trained predominantly with data featuring large NAMW values, unsurprisingly demonstrates exceptional performance when dealing with such high NAMW values. However, the model's capability to provide correct outputs diminishes when it encounters data not present in the training dataset. Conversely, LSTM2, trained with low NAMW values, excels when the NAMW values are indeed low. However, it exhibits subpar performance when attempting to model high NAMW values. Notably, the FP model with the increased gain performs poorly across the entire range of NAMW values. The membership functions are depicted in Figure 6. Our understanding of the sub-models has guided the initial choices of their shapes. The plots display fuzzified variable values along the horizontal axis, specifically representing the NAMW output of the polymerization reactor. Along the vertical axis, one can find the membership function values. Each membership function corresponds to a particular model. LSTM1, which was trained on data with large NAMW values, is most effective when dealing with large NAMW values. The blue membership functions on the plot indicate the range of NAMW values for which prioritizing the use of the LSTM1 model is recommended. LSTM2, characterized by yellow membership functions, is best suited for NAMW values close to the data in its training set, which primarily includes small values of NAMW. In scenarios where NAMW values fall outside the data ranges of both training sets, the most reliable choice is to utilize the FP model, represented by orange membership functions. Once the initial shapes have been determined, the subsequent step involves utilizing an optimization procedure to fine-tune these shapes. The procedure starts with the initial membership function shapes and uses the Levenberg-Marquardt algorithm to minimize the overall error of the PIHNN model.

PIHNN Modeling Quality
The results of the polymerization reactor modeling experiments are presented in Figures 7-9. These figures illustrate the initial 1500 steps of the simulation. Each figure showcases the outputs of two PIHNN models: one with the initial membership function shapes (orange) and the other with optimized membership function shapes (yellow). These results are compared with the data from the test set. Figure 7 presents the use of the most straightforward decision block with trapezoidal membership functions. Even this simplest approach enables the PIHNN model to outperform the individual sub-models. The initial shape of the membership functions allows the PIHNN structure to represent the data effectively for both small and large values of NAMW. In cases with intermediate values of NAMW, the PIHNN model averages the outputs of the sub-models; while the model output still exhibits some deviation from the test data, there is a clear improvement over the FP model. The model with a tuned shape has a lower error overall; however, it tends to have poorer modeling quality for both large and small values of NAMW in comparison to the LSTM sub-models. Finally, Figure 9 presents the utilization of Gaussian membership functions in the DF block of the PIHNN model. Here, one can observe that the Gaussian decision model tends to average the values of the three sub-models across the entire spectrum of NAMW variability. This effect is particularly evident in the model with the initial shape of the membership functions, where, for large values of NAMW, the model noticeably diverges from the data. As a result, for large NAMW values, the PIHNN gives worse results than the independent LSTM1 sub-model. Low and intermediate NAMW values are subject to much lower modeling errors. Although optimizing the shape mitigated this averaging effect somewhat, the model's output still exhibits relatively large errors.

Validation of MPC Algorithms Using PIHNN Models
The PIHNN model, in six different versions, has been implemented in MPC algorithms. We compare the results obtained from two types of controllers: one with nonlinear optimization (MPC-NO) and the second one recommended in this work, involving linearization along the predicted trajectory (MPC-NPLPT). Table 1 compares the control errors determined for these controllers. First, it is worth noting that the best control quality is achieved for models utilizing DF with Gaussian membership functions. Models employing trapezoidal functions exhibit slightly higher errors, while the poorest performance is observed for models with sigmoidal functions. This observation may seem counterintuitive, considering that the models with sigmoidal membership functions have smaller modeling errors than the models with Gaussian ones. It is important to stress that the shape of the closed-loop output trajectory with the MPC controller is affected not only by the quality of the model used but also by the feedback mechanism. Even though the Gaussian models exhibit higher modeling errors, their inherent averaging characteristic enhances the performance of the MPC controller when coupled with feedback. Secondly, Table 1 demonstrates that the MPC-NPLPT controller generally yields slightly higher error values than the MPC-NO one when utilizing the same PIHNN model for prediction. This result is not surprising, as MPC-NPLPT employs a linearized model; during linearization, some of the information present in the nonlinear model is simplified or lost. The exception is PIHNN model ver. 5, for which the MPC-NPLPT algorithm provides better controller performance. This may be attributed to chance: the simplifications happened to benefit the controller's performance in this specific case. However, it is worth noting that the error differences between the MPC-NO and MPC-NPLPT controllers are minimal for each type of PIHNN model, and both types of controllers work very well.

Table 2 compares the average time required by each MPC controller for the control calculations. The computations have been conducted on a PC; since it is not a real-time system, the results may vary on different PCs and are therefore presented as percentages. The longest time, recorded for MPC-NO with PIHNN model ver. 3 and amounting to 140 ms, is taken as 100%. The table reveals that the online linearization-based MPC controller reduces the required calculation time four to five times compared with the nonlinear controllers. The results are also presented visually. In Figure 10, one can observe the performance of the MPC algorithm with a DF block employing trapezoidal functions. The output responses for the PIHNN model with the initial function shapes are fast and without overshoot for both low and high values of NAMW; for intermediate NAMW values, there is a slightly larger overshoot and the settling time is extended. The responses are quite similar in the case of DF with the tuned function shapes, but there is a greater overshoot for intermediate NAMW values. Additionally, it is worth noting that the results obtained for the MPC-NO controller are practically indistinguishable from those for the MPC-NPLPT one.
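The source of the speed-up can be illustrated on a toy scalar process. The sketch below is not the polymerization model or the exact MPC-NPLPT formulation of this work: the model f, the horizon, the damping weight, and all numeric values are hypothetical. It only shows the general mechanism of linearization along the predicted trajectory: simulate the nonlinear model over the horizon, linearize it along that trajectory, solve the resulting quadratic subproblem in closed form, and repeat a few times instead of calling a nonlinear optimizer:

```python
import numpy as np

def f(y, u):
    """Toy nonlinear process model (a stand-in for the PIHNN predictor)."""
    return 0.8 * y + 0.5 * np.tanh(u)

def mpc_nplpt_step(y0, u_prev, y_sp, N=8, iters=4, lam=0.05):
    """One sampling instant of a simplified trajectory-linearization MPC:
    predict with the nonlinear model, linearize along the predicted
    trajectory, solve the damped least-squares subproblem, repeat."""
    u = np.full(N, float(u_prev))
    for _ in range(iters):
        # 1) nonlinear prediction over the horizon for the current inputs
        y = np.empty(N)
        yk = y0
        for k in range(N):
            yk = f(yk, u[k])
            y[k] = yk
        # 2) linearization along the trajectory: for this toy model,
        #    dy[k]/du[j] = 0.8**(k-j) * (df/du evaluated at u[j])
        b = 0.5 / np.cosh(u) ** 2
        B = np.zeros((N, N))
        for k in range(N):
            for j in range(k + 1):
                B[k, j] = 0.8 ** (k - j) * b[j]
        # 3) closed-form solution of the damped quadratic subproblem
        du = np.linalg.solve(B.T @ B + lam * np.eye(N), B.T @ (y_sp - y))
        u = u + du
    return u[0]  # receding horizon: apply only the first control move
```

Each sampling instant thus costs a handful of linear solves, whereas an MPC-NO controller must solve a full nonlinear program at every instant, which is consistent with the several-fold timing difference reported in Table 2.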
Figure 11 illustrates the results for sigmoidal membership functions. Here, the overshoot becomes more pronounced for intermediate NAMW values, especially for the set-point NAMW^sp = 2.5 × 10^4. The final Figure 12 displays the results of applying Gaussian membership functions. These results are characterized by the shortest settling time and the smallest overshoot. Notably, the controller exhibits excellent performance for intermediate values of NAMW. This observation leads to the conclusion that the averaging nature of the Gaussian functions, as seen earlier in the modeling phase (Figure 9), positively impacts the controller's performance when the model is used in the MPC scheme. For NAMW values within the range of 2 × 10^4 to 3 × 10^4, the FP model significantly impacts PIHNN performance; as mentioned, the FP model is imperfect, featuring a gain increased by 20%.

Conclusions
This work defines a new PIHNN model structure that combines a first-principle process description and data-driven neural sub-models using a specialized data fusion block that relies on fuzzy logic. We consider the very practical case in which the available first-principle model is imperfect and the data cannot be measured over the complete range of process operation. By combining an imperfect physical model with data obtained from an incomplete range of operation, we have developed a hybrid model that significantly improves performance across the entire range of signal variability. Secondly, this work develops a computationally efficient MPC controller for the PIHNN model. We show the efficacy of the PIHNN model and the resulting MPC controller for a simulated polymerization benchmark. We study the efficiency of different data fusion fuzzy blocks and their impact on model accuracy, and we recommend tuning, i.e., optimizing, the fuzzy membership functions, as this greatly improves model accuracy. Finally, we show that the described MPC controller based on the PIHNN model gives excellent results; namely, the obtained control quality is very similar to that possible in MPC relying on nonlinear optimization, while its calculation time is a few times shorter. In our future work, we plan to develop a methodology for designing PIHNN structures tailored to processes with multiple inputs and outputs. Additionally, it would be interesting to examine the impact of employing various decision model types within the data fusion block on PIHNN modeling quality.

Figure 1. General structure of the PIHNN model.
R_g and R_o have dimensionality n_N × n_N; and the bias vectors b_i, b_f, b_g, and b_o have dimensionality n_N × 1, respectively. At time instant k, the LSTM model initially calculates the output value of each gate.
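Using the weight and bias dimensions named above, the gate computations at time instant k can be sketched with the standard LSTM cell equations. This is a generic numpy illustration, not code from this work; the dict-based parameter layout is purely for readability:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_k, h_prev, c_prev, W, R, b):
    """One LSTM time step. W, R, b are dicts keyed by gate name
    ('i', 'f', 'g', 'o'): input weights W[.] (n_N x n_x), recurrent
    weights R[.] (n_N x n_N), and biases b[.] (n_N,)."""
    i = sigmoid(W['i'] @ x_k + R['i'] @ h_prev + b['i'])   # input gate
    f = sigmoid(W['f'] @ x_k + R['f'] @ h_prev + b['f'])   # forget gate
    g = np.tanh(W['g'] @ x_k + R['g'] @ h_prev + b['g'])   # state candidate
    o = sigmoid(W['o'] @ x_k + R['o'] @ h_prev + b['o'])   # output gate
    c = f * c_prev + i * g                                 # cell state update
    h = o * np.tanh(c)                                     # hidden state
    return h, c

# Tiny example with random weights (n_N = 4 neurons, n_x = 2 inputs).
rng = np.random.default_rng(0)
n_N, n_x = 4, 2
W = {g: rng.standard_normal((n_N, n_x)) for g in 'ifgo'}
R = {g: rng.standard_normal((n_N, n_N)) for g in 'ifgo'}
b = {g: np.zeros(n_N) for g in 'ifgo'}
h, c = lstm_cell_step(np.ones(n_x), np.zeros(n_N), np.zeros(n_N), W, R, b)
```

The forget gate f decides how much of the previous cell state is kept, while the input gate i scales the new state candidate g, which is the mechanism that lets the LSTM model long-term dependencies in the process dynamics.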

Figure 3. Structure of the whole LSTM network.

Figure 4. Flow chart for the development of the PIHNN model.

Figure 5. A total of 1000 samples of the validation dataset vs. outputs of two local LSTM sub-models and the FP model with an incorrect gain.

4.5. Development of PIHNN Models
Once all the sub-models have been prepared, the next step in designing the PIHNN model is to develop the DF block. Various membership function shapes have been tested, i.e.:
• PIHNN model ver. 1: initial trapezoidal functions;
• PIHNN model ver. 2: optimized trapezoidal functions;
• PIHNN model ver. 3: initial sigmoidal functions;
• PIHNN model ver. 4: optimized sigmoidal functions;
• PIHNN model ver. 5: initial Gaussian functions;
• PIHNN model ver. 6: optimized Gaussian functions.

Figure 7. A total of 1000 samples of the validation dataset vs. the output of initial and optimized fuzzy PIHNN structures with trapezoidal MFs (PIHNN models ver. 1 and ver. 2).
Figure 8. A total of 1000 samples of the validation dataset vs. the output of initial and optimized fuzzy PIHNN structures with sigmoidal MFs (PIHNN models ver. 3 and ver. 4).

Figure 9. A total of 1000 samples of the validation dataset vs. the output of initial and optimized fuzzy PIHNN structures with Gaussian MFs (PIHNN models ver. 5 and ver. 6).


Table 1. Control errors of MPC algorithms with different PIHNN models.

Table 2. Average execution time of MPC algorithms with different PIHNN models.