Parameter Identification Concept for Process Models Combining Systems Theory and Deep Learning

Abstract: In recent years, dynamic process models have become even more important in the context of Industry 4.0 and the use of digital twins. However, the accuracy of the corresponding model parameter estimates is determined by the quantity and quality of the data and by the parameter identification methodology used. Standard methods are based on the ordinary least squares framework. Still, other options are available that might be more sensitive to model parameter variations and thus ensure more precise parameter estimates. This paper presents a novel parameter identification technique that combines neural ordinary differential equations for surrogate modeling with differential flatness, a systems theory concept from control engineering. The approach may lead to improved parameter sensitivities, as demonstrated with a simulation study of a distributed-parameter identification problem based on a diffusion-type parabolic partial differential equation.


Introduction
System identification is the process of creating a mathematical model, or equation, to represent a real-world problem. This equation can then be used to predict and analyze possible outcomes of the system under study. In many engineering fields, the time behavior of complicated technical systems can be described by a system of ordinary differential equations (ODEs). However, the parameters in these equations are often unknown and need to be estimated from experimental data. Over the past few decades, there has been intense research on parameter estimation methods. A popular method minimizes the sum of squared errors (SSE) between a model prediction and measurement data, where the prediction is calculated by solving the ODE numerically [1]. The model parameters are then adjusted until a given minimization criterion is reached. However, other options are available that might be more sensitive to model parameter variations and might thus ensure more precise parameter estimates. Concepts from control and systems theory can improve parameter identification procedures, e.g., online parameter identification schemes [2,3].
Another example is differential flatness, which can be used to recalculate control trajectory profiles for desired system dynamics [4,5], i.e., following a system inversion concept. In a differentially flat system, state variables and input variables can be expressed as functions of so-called flat outputs and a finite number of their derivatives, which also leads to a reformulation of the parameter identification problem. Moreover, while optimal experimental design concepts might be needed to improve data quantity and quality [3], the flatness concepts involved in the flat output approach may result in improved parameter sensitivities [6] and more precise parameter estimates [7,8] without experimental data enrichment.

Parameter Identification Problem
Frequently, dynamic process models are given as ordinary differential equation systems:

$$\dot{x}(t) = f\left(x(t), u(t), p\right), \quad x(t_0) = x_0,$$

where $t \in [t_0, t_0 + t_{end}]$ is the time, with $t_0$ as the initial time and $t_{end}$ as the duration of the simulation, $u \in \mathbb{R}^{n_u}$ is the vector of the control variables, $p \in \mathbb{R}^{n_p}$ is the vector of the time-invariant parameters, and $x \in \mathbb{R}^{n_x}$ is the vector of the differential system states. The initial conditions for the differential states are given by $x_0$. Moreover, $f : \mathbb{R}^{n_x} \times \mathbb{R}^{n_u} \times \mathbb{R}^{n_p} \to \mathbb{R}^{n_x}$ represents the corresponding vector field. For this kind of mathematical representation, the standard approach of parameter identification, i.e., the ordinary least squares (OLS) method, can be defined as:

$$\min_{p} \sum_{k=1}^{K} \left\| y_{data}(t_k) - y(t_k, p) \right\|_2^2,$$

where $\| \cdot \|_2$ denotes the Euclidean norm, $y_{data}(t_k)$ represents the data vector at discrete time points $t_k$ over all $K$ measurement samples, and the model output function is defined as:

$$y(t) = h\left(x(t)\right),$$

with $h : \mathbb{R}^{n_x} \to \mathbb{R}^{n_y}$ and $y \in \mathbb{R}^{n_y}$ as the model output vector. Alternatively, when aiming to utilize the inverse model response, i.e., applying a model inversion strategy, an input least squares (ILS)-based parameter identification problem can be used:

$$\min_{p} \sum_{k=1}^{K} \left\| u_{data}(t_k) - u(t_k, p) \right\|_2^2.$$

Here, the control inputs $u(t_k, p)$ have to be calculated to solve the parameter identification problem, and $u_{data}(t_k)$ represents the recorded physical input actions. For this purpose, we study the differential flatness concept outlined in Section 2.2. However, it is essential to note that parameter sensitivities are relevant for well-posed parameter identification problems. Given the output function $y(t_k, p)$ and the inputs $u(t_k, p)$ (inverse model response), the sensitivities $S$ with respect to the parameters $p$ are defined as:

$$S_y(t_k) = \frac{\partial y(t_k, p)}{\partial p}, \qquad S_u(t_k) = \frac{\partial u(t_k, p)}{\partial p}.$$

In general, high absolute parameter sensitivity values ensure precise parameter estimates according to the Fisher information matrix and the Cramér-Rao inequality [1,3].
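As a minimal illustration of the OLS and ILS objectives, both can be sketched for a hypothetical scalar system x' = -p·x + u (an assumption for illustration only, not one of the paper's models), whose inverse model u = y' + p·y is available in closed form:

```python
import numpy as np

# Illustrative scalar system (an assumption, not from the paper):
#   x' = -p*x + u,  y = x,  inverse model:  u = y' + p*y
p_true = 0.5
t = np.linspace(0.0, 2.0, 21)

def simulate(p, x0=1.0, u=0.0):
    # Analytic solution of x' = -p*x + u for a constant input u
    return (x0 - u / p) * np.exp(-p * t) + u / p

y_data = simulate(p_true)      # "measured" outputs (noise-free here)
u_data = np.zeros_like(t)      # recorded input that generated the data

def ols(p):
    # Ordinary least squares: compare model outputs with measured outputs
    return np.sum((y_data - simulate(p)) ** 2)

def ils(p):
    # Input least squares: reconstruct the input from the measured output
    # trajectory via the inverse model and compare with the recorded input
    u_model = np.gradient(y_data, t) + p * y_data
    return np.sum((u_data - u_model) ** 2)

# Both objectives are (near-)minimal at the true parameter value
assert ols(p_true) < ols(p_true + 0.2)
assert ils(p_true) < ils(p_true + 0.2)
```

Note that the ILS residual is built purely from measured trajectories and the inverse model, so no repeated forward ODE solves are needed during the parameter search.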

Differential Flatness
In the literature, a process model (Equation (1)) is called differentially flat if there is an output function:

$$y_{flat} = \Phi\left(x, u, \dot{u}, \ldots, u^{(s)}, p\right),$$

with a finite value $s \in \mathbb{N}$ and a smooth mapping $\Phi : \mathbb{R}^{n_x} \times (\mathbb{R}^{n_u})^{s+1} \times \mathbb{R}^{n_p} \to \mathbb{R}^{n_y}$; $y_{flat}$ is called the flat output. With the flat output, the system states and control inputs can be expressed as:

$$x = \Psi_x\left(y_{flat}, \dot{y}_{flat}, \ldots, y_{flat}^{(r)}, p\right), \qquad u = \Psi_u\left(y_{flat}, \dot{y}_{flat}, \ldots, y_{flat}^{(r+1)}, p\right),$$

with the mapping functions $\Psi_x : (\mathbb{R}^{n_y})^{r+1} \times \mathbb{R}^{n_p} \to \mathbb{R}^{n_x}$ and $\Psi_u : (\mathbb{R}^{n_y})^{r+2} \times \mathbb{R}^{n_p} \to \mathbb{R}^{n_u}$, and assuming a square system, $\dim y_{flat} = \dim u$. When applying the flatness concept, it was shown that the parameter sensitivities, and thus the reliability of the parameter estimates, can be improved in the case of ILS [6] or when combining OLS with ILS [7,8]. However, in process systems engineering, for instance, besides lumped-parameter systems (i.e., ordinary differential equations), distributed-parameter systems, described via partial differential equations, are frequently encountered. In this case, the differential flatness approach has to be generalized [5,9-11].
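A minimal sketch of the flatness maps, assuming a hypothetical spring-mass system (an illustrative choice, not one of the paper's models): with flat output y = x1, the maps Ψ_x and Ψ_u reduce to x1 = y, x2 = ẏ, and u = m·ÿ + k·y, so the whole trajectory follows from the flat output and its derivatives:

```python
import numpy as np

# Hypothetical flat system (illustration only):
#   x1' = x2,  x2' = (u - k*x1)/m,  flat output y = x1
# Flatness maps:  x1 = y,  x2 = y',  u = m*y'' + k*y
m, k = 1.0, 2.0
t = np.linspace(0.0, 2.0, 201)

# Desired flat-output trajectory and its derivatives (chosen analytically)
y = np.sin(t)
dy = np.cos(t)
ddy = -np.sin(t)

x1, x2 = y, dy          # Psi_x: states from the flat output
u = m * ddy + k * y     # Psi_u: input that realizes the trajectory

# Consistency check: x2' must equal (u - k*x1)/m along the trajectory
# (interior points only, to avoid one-sided differencing errors at the ends)
assert np.allclose(np.gradient(x2, t)[1:-1], ((u - k * x1) / m)[1:-1], atol=1e-3)
```

Note that the parameter k enters the inverse map Ψ_u directly, which is exactly what makes the reconstructed input usable for ILS-based parameter estimation.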

Neural Ordinary Differential Equations
In data science and deep learning, neural networks are frequently used to build empirical models. A neural network is a group of interconnected neurons with one or more hidden layers, depending on the network's specific task. Technically, the $i$-th neural network layer reads as:

$$z_{i+1} = \sigma_i\left(W_i z_i + b_i\right),$$

with the weight matrix $W_i$, the bias vector $b_i$, and the activation function $\sigma_i$. Thus, for instance, a feed-forward neural network with $l$ layers reads as:

$$y_{NN} = \sigma_l\left(W_l \, \sigma_{l-1}\left(\cdots \, \sigma_1\left(W_1 z_1 + b_1\right) \cdots\right) + b_l\right).$$

When it comes to the so-called neural ordinary differential equations, the governing equations read as:

$$\dot{x}(t) = f_{NN}\left(x(t), u(t); \theta\right), \quad x(t_0) = x_0,$$

where the right-hand side of the ODE system is represented by a neural network with parameters $\theta$. Neural ODEs offer a promising approach for hybrid modeling and system identification [12-15]. Furthermore, the neural network's architecture can be optimized to represent experimental data better. This could be performed in conjunction with optimal experimental design methods [3] to further improve the accuracy of system identification.
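The neural ODE idea can be sketched with a tiny fixed-weight network and explicit Euler integration (a simplified stand-in for the paper's setup; in practice the weights are trained via automatic differentiation or adjoint methods, and the paper's MLP is much larger):

```python
import numpy as np

# Minimal neural-ODE sketch:  x'(t) = f_NN(x(t); theta), integrated with
# explicit Euler. Network size and weights are illustrative assumptions.
rng = np.random.default_rng(0)
n_x, n_h = 2, 16
W1 = 0.1 * rng.standard_normal((n_h, n_x)); b1 = np.zeros(n_h)
W2 = 0.1 * rng.standard_normal((n_x, n_h)); b2 = np.zeros(n_x)

def f_nn(x):
    # One hidden layer with tanh activation (the paper also uses tanh)
    return W2 @ np.tanh(W1 @ x + b1) + b2

def odeint_euler(x0, t):
    # Explicit Euler integration of the neural ODE right-hand side
    xs = [np.asarray(x0, dtype=float)]
    for t0, t1 in zip(t[:-1], t[1:]):
        xs.append(xs[-1] + (t1 - t0) * f_nn(xs[-1]))
    return np.stack(xs)

t = np.linspace(0.0, 1.0, 101)
traj = odeint_euler([1.0, 0.0], t)
assert traj.shape == (101, 2)
```

Replacing the fixed weights by trainable parameters and the Euler loop by a differentiable solver yields the surrogate-model setting used in the case study.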
Simulation Case Study

As an academic benchmark, a diffusion-type parabolic partial differential equation (PDE) with a distributed control input is considered:

$$\frac{\partial \Phi(t, x)}{\partial t} = p \, \frac{\partial^2 \Phi(t, x)}{\partial x^2} + u(t, x),$$

with the diffusion parameter $p$. Following the numerical solution and the finite difference method (Equation (13)),

$$\dot{\phi}_i = p \, \frac{\phi_{i+1} - 2\phi_i + \phi_{i-1}}{\Delta x^2} + u_i, \quad 1 \leq i \leq N,$$

a set of coupled ODEs is obtained, which can be written in the state-space form as shown in Equation (14).
Practically, the parameter identification problem for this academic case study is to determine the diffusion parameter p in this system so that the actual physical process being modeled is represented accurately. This can be challenging, as even minor changes in the coefficient value can result in significant changes in the solution behavior and vice versa. However, it is often possible to obtain reasonable estimates for the diffusion parameter with careful analysis and experimentation [2,18], including the proposed concept of combining systems theory with differential flatness and deep learning. In particular, when assuming that all states in Equation (14) are measurable, i.e., y_i = φ_i, ∀ 1 ≤ i ≤ N, and that the output derivatives ẏ_i, ∀ 1 ≤ i ≤ N, exist, the related equation system (Equation (15)) can be transformed to determine the input variables u_i, ∀ 1 ≤ i ≤ N, accordingly.
$$u_i = \dot{y}_i - p \, \frac{y_{i+1} - 2 y_i + y_{i-1}}{\Delta x^2}, \quad 1 \leq i \leq N.$$

Moreover, when dealing with measurement data instead of the full model system (Equation (15)), the neural ODE framework (Equation (11)) is used as a surrogate model to approximate the output functions and their derivatives. In particular, the following multilayer perceptron (MLP) setting is used: two hidden layers with 400 nodes each; input and output layers with 198 nodes each; tanh as the activation function. To train the resulting neural ODE system, simulated data with $t_{i+1} - t_i = 0.1$, $\Delta x = 0.01$, $t \in [0, 2]$, $x \in [0, 1]$, $\Phi(t_0, x) = \sin(2\pi x)$, $\Phi(t, x = 0) = 0$, $\Phi(t, x = 1) = 0$, and $p = 0.1$ are used in dimensionless form. Figure 1a shows the model response without any distributed control action, i.e., $u(t) = 0$. The diffusion effect can be seen very clearly, as the initial differences along the spatial axis at the start time, $y(t_0, x)$, decrease over the simulation time. Accordingly, a different diffusion parameter $p$ would lead to a different degradation profile, reflecting the corresponding sensitivity of the model. Note that this parameter sensitivity allows, in principle, a practical identification of the model parameter when applying OLS with experimental data and Equation (2). Alternatively, the differential flatness approach can be used to impose a desired process behavior. To this end, the necessary but parameter-dependent calculated input variables can be used for parameter estimation following the mentioned ILS concept and Equation (4). In Figure 1b, for example, an output profile can be seen in which there are no changes over time in the course of the simulation. Please note that the corresponding input profile to achieve the desired control was determined using the flatness concept combined with the neural ODE system and the specified training setting.
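The flatness-based input reconstruction for the discretized diffusion equation can be sketched as follows (grid handling and homogeneous Dirichlet boundaries are simplifying assumptions; the paper uses the neural ODE surrogate in place of the exact model):

```python
import numpy as np

# Discretized diffusion PDE:
#   phi_i' = p*(phi_{i+1} - 2*phi_i + phi_{i-1})/dx**2 + u_i
# With all states measured (y_i = phi_i), the inverse model yields
#   u_i = y_i' - p*(y_{i+1} - 2*y_i + y_{i-1})/dx**2
p = 0.1
dx = 0.01
x = np.arange(dx, 1.0, dx)   # interior grid points; phi = 0 at x = 0 and x = 1

def reconstruct_input(y, dy_dt):
    # Second spatial difference with homogeneous Dirichlet boundaries
    y_pad = np.concatenate(([0.0], y, [0.0]))
    lap = (y_pad[2:] - 2.0 * y_pad[1:-1] + y_pad[:-2]) / dx**2
    return dy_dt - p * lap

# Desired behavior of Figure 1b: hold the initial profile sin(2*pi*x), i.e. y' = 0
y = np.sin(2.0 * np.pi * x)
u = reconstruct_input(y, dy_dt=np.zeros_like(y))

# Holding the profile requires u ≈ p*(2*pi)**2 * sin(2*pi*x),
# so the reconstructed input scales linearly with the unknown parameter p
assert np.allclose(u, p * (2.0 * np.pi) ** 2 * y, rtol=1e-2)
```

The linear dependence of the reconstructed input on p is what makes this input trajectory informative for the ILS-based estimation.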
The reconstructed input and the generated output data from Figure 1b were used for the sensitivity analyses. Similar to the output profile, the calculated input profile depends on the diffusion parameter and is thus sensitive to its variation. The sensitivities of the output data and of the generated control input with respect to a variation of the diffusion parameter p are analyzed using Equations (5) and (6). The resulting sensitivity plots are shown in Figure 2. Here, the output parameter sensitivity (Figure 2a) is zero at the starting time, and its absolute values increase at x = 0.25 and x = 0.75, respectively. In the case of the input parameter sensitivity (Figure 2b), the sensitivities are at their peak from the very start, and their absolute values are 3-4 times higher than the output parameter sensitivities. Moreover, as mentioned in Section 2, higher parameter sensitivities imply better parameter estimates. Consequently, the parameter estimation can be expected to perform better using the control input (based on ILS) generated by combining the flatness property with the neural ODE concept.
Figure 2. Sensitivity of the diffusion parameter p: (a) using OLS to define the parameter identification problem; (b) using ILS to define the parameter identification problem.
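The qualitative difference between the two sensitivity profiles can be reproduced with the analytic solution of the uncontrolled diffusion equation used as a stand-in for the neural ODE surrogate (an illustrative assumption):

```python
import numpy as np

# Analytic stand-ins for the case study (assumptions for illustration):
#   uncontrolled response:  y(t, x; p) = exp(-p*(2*pi)**2 * t) * sin(2*pi*x)
#   input holding the profile of Figure 1b:  u(x; p) = p*(2*pi)**2 * sin(2*pi*x)
w2 = (2.0 * np.pi) ** 2
x = np.linspace(0.0, 1.0, 101)
p, dp = 0.1, 1e-6

def y_out(p, t):
    return np.exp(-p * w2 * t) * np.sin(2.0 * np.pi * x)

def u_in(p):
    return p * w2 * np.sin(2.0 * np.pi * x)

# Central finite-difference approximations of Equations (5) and (6)
S_y0 = (y_out(p + dp, 0.0) - y_out(p - dp, 0.0)) / (2.0 * dp)  # OLS-type, t = 0
S_u = (u_in(p + dp) - u_in(p - dp)) / (2.0 * dp)               # ILS-type

# The output sensitivity vanishes at t = 0 (the initial condition is
# independent of p), whereas the input sensitivity is already at its full
# magnitude (2*pi)**2 * |sin(2*pi*x)| from the very start
assert np.allclose(S_y0, 0.0, atol=1e-6)
assert np.isclose(np.max(np.abs(S_u)), w2, rtol=1e-3)
```

This mirrors the plots in Figure 2: the OLS-type sensitivity builds up only over time, while the ILS-type sensitivity is present from the initial time onward.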

Conclusions
Parameter identification is a fundamental problem in systems and control theory. This work demonstrated that a parameter identification problem which evaluates input least squares (ILS) instead of ordinary least squares (OLS) results in different parameter sensitivities and, in this particular case, an improved parameter sensitivity range. Here, our original contribution is the proper combination of advanced systems theory concepts (i.e., differential flatness) with recent developments in data science, namely neural ordinary differential equations. We applied our method to synthetically generated data and showed that ILS and the related parameter sensitivities lead to a significantly higher sensitivity range for the diffusion parameter than the OLS-related parameter sensitivity. The improved parameter sensitivity range suggests that ILS may result in better parameter estimates than OLS but might critically depend on the neural ODE system setting and the data quality, aspects which are addressed in ongoing research. Future work will also focus on advanced model inversion schemes that are not limited to differentially flat systems.