Autoencoder-Based Reduced Order Observer Design for a Class of Diffusion-Convection-Reaction Systems

Abstract: The application of autoencoders in combination with Dynamic Mode Decomposition for control (DMDc) and reduced order observer design as well as Kalman Filter design is discussed for low order state reconstruction of a class of scalar linear diffusion-convection-reaction systems. The general idea and conceptual approaches are developed following recent results on machine-learning based identification of the Koopman operator using autoencoders and DMDc for finite-dimensional discrete-time system identification. The resulting linear reduced order model is combined with a classical Kalman Filter for state reconstruction with minimum error covariance as well as a reduced order observer with very low computational and memory demands. The performance of the two schemes is evaluated and compared in terms of the approximated $L^2$ error norm in a numerical simulation study. For the evaluated case study, the reduced order observer achieves comparable performance with significantly less computational load.


Introduction
The problem of reconstructing the system state from local and partial state measurements strongly affects the ability to control the system state and to monitor system health and performance [1][2][3]. The problem becomes particularly involved for spatially distributed dynamical systems, which are typically modeled using partial differential equations (pdes). The associated state estimation or observer design problem has been studied over the last decades and has reached a state of maturity, in particular for linear systems (see, e.g., [4,5]). Additionally, for some classes of nonlinear or semilinear systems, design approaches have been obtained that differ in design complexity and in the computational load of their implementation. A particular approach, which is simple and can be implemented with small computational load, is the pointwise measurement injection (or pointwise innovation) observer design, which has been employed for some types of semilinear parabolic problems [6][7][8][9][10]. This approach basically extends the idea of a reduced order observer design [1,2] to the considered class of infinite-dimensional systems. The reduced order property is to be understood in terms of the number of internal states of the observer in comparison to the number of states of the system to be monitored. The basic idea is to provide a non-redundant state estimation scheme, which does not need to estimate the measured output, but only the non-measured state variables.
Considering pde models leads to the additional problem that, for implementing any observer in a real-time monitoring scheme, the associated set of pdes has to be solved numerically. This is done using some kind of discretization in space and time, so that a finite-dimensional discrete-time system representation is obtained, which represents a digital twin of the original pde system. The number of internal state variables of this approximation model is typically very high (from about one hundred to several thousand variables, e.g., for finite-element approximations) and thus requires model-order reduction approaches for real-time applicability. For this purpose, balanced truncation, proper orthogonal decomposition or similar approaches are typically employed (see, e.g., [11] and references therein). Besides these, in recent years attention has focused on approaches using dynamic mode decomposition (DMD) [11,12] and its extension DMDc, which includes external (control) inputs [13]. The usage of DMD (or DMDc) is particularly motivated by Koopman theory, according to which nonlinear finite-dimensional systems can be embedded into an infinite-dimensional space on which the linear Koopman operator provides a complete description of the time evolution of the system observables. The (Koopman) eigenfunctions of the associated Koopman operator form a basis of this space and can be used in principle to determine a finite-dimensional approximation of the dominant modes of this operator (associated to the dominant eigenvalues of the operator). As this embedding can in most cases not be carried out analytically (see, e.g., the discussion in [3,11,12]), one can employ DMD (or DMDc) as a data-based identification approach using snapshots of the state observables as training data.
This idea has been combined with the use of autoencoders (AEs) to learn from the training data a low-dimensional basis of the associated dominant Koopman eigenfunction subspace, in which linear dynamics can then be identified using classical DMDc [14]. The use of autoencoders and Koopman theory for automatic learning of finite-dimensional system dynamics is also mainstream in contemporary control systems applications in general (see, e.g., [15,16]).
Having finite-dimensional approximations of the infinite-dimensional pde models, one can also directly address the observer design problem in the finite-dimensional setup. This approach is also called early lumping (first approximate and then design), while the results discussed before belong to the so-called late lumping (first design then approximate) approaches.
Early lumping based on the combination of DMDc with state estimation has already been shown in several application scenarios to yield satisfactory performance [3,17,18,19,20]. In these studies the Kalman Filter [21,22] has been used on the basis of the obtained finite-dimensional linear discrete-time model equations. The Kalman Filter is known to provide satisfactory state estimation results, as it ensures a minimum estimation error covariance taking into account a priori knowledge on the perturbation statistics, with the perturbations on the states and the measurement considered as white noise. One potential drawback for implementations of the Kalman Filter in real-time applications lies in the fact that it requires $n^2 + n$ dynamic variables, where $n$ is the number of states in the reduced-order model, given that the $n \times n$ error covariance matrix must be calculated online to provide the optimal filter performance. In comparison to this approach, alternative design methods, such as the reduced-order [1,2] or the geometric observer [23], require fewer than $n$ dynamic variables.
Motivated by the above considerations, in the present study the autoencoder-based approximation of finite-dimensional low order basis representations is employed in combination with DMDc to identify a reduced-order model for a linear parabolic diffusion-convection-reaction system, for which a reduced order observer and a Kalman Filter are implemented and compared. The training data is generated on the basis of a high-fidelity numerical approximation of the solutions of the pde model, and the comparison of the approaches is carried out on the basis of solutions obtained for different input signals. Given the early lumping nature of this approach, it can be seen as a complementary counterpart of the late-lumping reduced-order design studies in [6][7][8][9][10]. To the author's knowledge, this is the first time such a comparison with reduced order observers is carried out, in particular for this class of systems. The result is an inductive step towards the consideration of distributed parameter models involving transport-reaction mechanisms as typically found in chemical engineering applications [7,24,25,26,27,28,29].
The paper is organized as follows. In Section 2 the problem statement is provided. The model-order reduction using autoencoder-based DMDc (ae-dmdc) is discussed in Section 3. In Section 4 the state estimation problem is addressed using the reduced order observer (roo) and the Kalman Filter (kf). The obtained results are discussed in Section 5. Conclusions and an outlook are provided in Section 6.

Problem Statement
Consider a diffusion-convection-reaction system with Danckwerts boundary conditions, referred to as model (1) in the sequel. The developments are illustrated and tested for a case study with a reaction rate maximum at $s = 0.2$, and with measurement locations motivated by related studies on approximate observability of the pde model [29], suggesting $s_m = 1$, and by recent studies on the pointwise measurement injection observer [6][7][8][9][10], suggesting an in-domain measurement (here chosen as $s_m = 0.5$). Given the transport-reaction character of the considered system model, the associated system specific time scales can best be captured using the Péclet ($Pe$) and Damköhler ($Da$) numbers [24,28], with $Pe$ giving the relation between convective and diffusive characteristic times and $Da$ the relation between reaction and convection characteristic times. Given a moderate pair ($Pe$, $Da$), the solution behavior is expected to be relatively smooth without steep spatial gradients, so that different numerical solution schemes can be implemented.
The pde model is solved numerically using a finite-difference approximation with $N = 100$ collocation points, implying a spatial step of $dz = 0.01$ and yielding a finite-dimensional approximation
$$\dot{x}(t) = F x(t) + g\,u(t), \qquad y(t) = h^T x(t) \qquad (4)$$
with state vector $x(t) \in \mathbb{R}^N$ at time $t \geq 0$, and $F \in \mathbb{R}^{N \times N}$, $g, h \in \mathbb{R}^N$. For this purpose, central differences ($O(dz^2)$) are used for the advection and the diffusion parts. The finite-dimensional ode model (4) is subsequently solved using the standard Euler forward solver ($O(dt^2)$) in MATLAB. Note that for solving the ode system (4) in MATLAB a discrete-time system approximation of the form
$$x(t+1) = \Phi x(t) + \gamma\,u(t) \qquad (5)$$
with $\Phi \in \mathbb{R}^{N \times N}$ and $\gamma \in \mathbb{R}^N$ is used. In summary, the approximation error is $O(dt^2, dz^2)$. To achieve an accurate simulation result, the step size in time was set to $dt = 2 \times 10^{-5}$. The solution of the resulting equations for an input defined in terms of the Heaviside function $\sigma$ and the initial condition $x_0(s) = 0.2\sin(\tfrac{\pi}{2}s)^2$ is shown in Figure 1.
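The discretization chain described above (central differences in space, forward Euler in time) can be sketched as follows. This is a minimal illustration and not the author's MATLAB implementation: the Danckwerts conditions are assumed in the common form $D\,\partial_s x(0,t) = v\,(x(0,t) - u(t))$, $\partial_s x(1,t) = 0$, the boundary values are eliminated with one-sided differences, and the function names as well as the values passed for D, v and the reaction profile are hypothetical.

```python
def build_discrete_model(N, D, v, reaction, dt):
    """Return (Phi, gamma) of x(t+1) = Phi x(t) + gamma u(t).

    Sketch only: central differences on the nodes s_i = (i+1)*dz, forward
    Euler in time, and ASSUMED Danckwerts boundary conditions.
    """
    dz = 1.0 / N
    a_minus = D / dz**2 + v / (2.0 * dz)  # weight of the left neighbour
    a_plus = D / dz**2 - v / (2.0 * dz)   # weight of the right neighbour
    # Assumed inlet condition D*(x_0 - x_in)/dz = v*(x_in - u), solved for
    # the inlet value: x_in = alpha*x_0 + beta*u.
    alpha = D / (D + v * dz)
    beta = v * dz / (D + v * dz)

    F = [[0.0] * N for _ in range(N)]
    g = [0.0] * N
    for i in range(N):
        s_i = (i + 1) * dz
        F[i][i] = -2.0 * D / dz**2 - reaction(s_i)
        if i > 0:
            F[i][i - 1] = a_minus
        else:
            F[i][i] += a_minus * alpha  # inlet value eliminated
            g[i] = a_minus * beta       # the input enters at the inlet
        if i < N - 1:
            F[i][i + 1] = a_plus
        else:
            F[i][i] += a_plus           # assumed zero gradient at the outlet

    # Forward Euler: Phi = I + dt*F, gamma = dt*g
    Phi = [[(1.0 if i == j else 0.0) + dt * F[i][j] for j in range(N)]
           for i in range(N)]
    gamma = [dt * gi for gi in g]
    return Phi, gamma


def step(Phi, gamma, x, u):
    """One time step of the discrete-time model."""
    return [sum(p * xj for p, xj in zip(row, x)) + gi * u
            for row, gi in zip(Phi, gamma)]
```

Calling `build_discrete_model(100, D, v, reaction, 2e-5)` reproduces the grid size and time step quoted in the text; without reaction and with a constant unit input, the uniform profile $x \equiv 1$ is a fixed point of `step`, which is a convenient sanity check for the boundary treatment.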

Model-Order Reduction
To reduce the computational load and to obtain a low-dimensional model for observer design, which can also be implemented, e.g., on low-cost devices for system monitoring, or which is fast enough for time-critical real-time applications, a machine-learning based model-order reduction is employed in the following. It is ultimately motivated by a low-order approximation of the Koopman operator for the discrete-time system (5) (for details, see, e.g., [14] and also [3,12,19]).

Machine Learning for Koopman Basis Identification
Recall that $x \in \mathbb{R}^N$ and consider the simple three-layer autoencoder network shown in Figure 2, for which it holds that
$$\xi = \eta_1(W_1 x + b_1), \qquad \hat{x} = \eta_2(W_2 \xi + b_2),$$
with the hidden-layer state $\xi \in \mathbb{R}^q$ and the activation functions $\eta_1$, $\eta_2$. The use of a single hidden layer is basically motivated by the fact that model order reduction techniques based on simple matrix-vector multiplications, such as balanced truncation or proper orthogonal decomposition (see, e.g., [11]), are known to achieve a satisfactory approximation accuracy.
The weight matrices $W_1$, $W_2$ and bias vectors $b_1$, $b_2$ are determined using a standard backpropagation algorithm so that the reconstruction error $x - \hat{x}$ over the training snapshots is minimized. The result was obtained using the trainAutoencoder function in MATLAB with $N = 100$ inputs, one hidden layer with $q = 40$ nodes (later associated to a model approximation using $q$ internal states), 5000 snapshots, the maximum number of epochs set to 15,000, logsig as activation function for the input layer and purelin for the output layer. It can be seen in Figure 3, which shows on the left side the comparison between the reference solution (see also Figure 1) and the corresponding output $\hat{x}$ of the autoencoder. The associated approximated $L^2$ norm of the approximation error $x - \hat{x}$ is shown on the right side. It can be seen that a rather good correspondence is obtained.
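The forward pass of such a single-hidden-layer autoencoder, with a logistic sigmoid (logsig) encoder and a linear (purelin) decoder as stated above, can be sketched in a few lines. The sketch only evaluates a network whose weights are already given; training is left to a tool such as trainAutoencoder. The function names are hypothetical and the plain-list representation of matrices is chosen only to keep the example free of dependencies.

```python
import math


def logsig(values):
    """Logistic sigmoid, applied elementwise."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def affine(W, b, x):
    """Compute W x + b for a matrix W given as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def encode(W1, b1, x):
    """Encoder: xi = logsig(W1 x + b1), mapping R^N to R^q."""
    return logsig(affine(W1, b1, x))


def decode(W2, b2, xi):
    """Decoder: x_hat = W2 xi + b2 (purelin is the identity)."""
    return affine(W2, b2, xi)


def reconstruction_error(W1, b1, W2, b2, x):
    """Euclidean norm of x - x_hat for a single snapshot x."""
    x_hat = decode(W2, b2, encode(W1, b1, x))
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(x, x_hat)))
```

For the case study, `W1` would have 40 rows of length 100 and `W2` 100 rows of length 40; because the decoder is linear, the reconstruction is affine in the hidden-layer state.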

Reduced Model Based DMDc
To identify the dynamics in terms of the low-dimensional state representation $\xi \in \mathbb{R}^q$ ($q \ll N$) by means of DMDc [13], consider the snapshot matrices
$$\Xi = [\xi(0)\ \xi(1)\ \cdots\ \xi(m)] \in \mathbb{R}^{q \times (m+1)}, \qquad \Xi^+ = [\xi(1)\ \cdots\ \xi(m+1)] \in \mathbb{R}^{q \times (m+1)},$$
together with the corresponding input snapshot matrix, and their respective singular value decompositions. Taking into account only the dominant singular values $\sigma_{2,i}$, $i = 1, \ldots, r_2$, and $\sigma_{1,j}$, $j = 1, \ldots, r_1$, a truncated approximation is obtained. Introducing the reduced order state $z = U_{2,r}^T \xi \in \mathbb{R}^{r_2}$ and noticing that $U_{2,r}^T U_{2,r} = I_{r_2}$, the low-dimensional dynamic model is obtained as a linear discrete-time system in $z$ with system matrix $A_r$. The resulting network-based model order reduction scheme is shown in Figure 4.
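For orientation, the standard DMDc construction of [13] is sketched below in the notation of this section. The symbols $\Upsilon$ (input snapshots), $\Omega$ (stacked data matrix), $U_{1,a}$, $U_{1,b}$ and $b_r$ are assumptions introduced here for illustration, as is the assignment of the index 1 to the stacked matrix and the index 2 to $\Xi^+$; the formulas may differ in detail from the ones originally used.

```latex
\begin{align}
\Xi^+ &\approx A\,\Xi + b\,\Upsilon
      = \begin{bmatrix} A & b \end{bmatrix}\Omega,
      \qquad
      \Omega = \begin{bmatrix} \Xi \\ \Upsilon \end{bmatrix},
      \quad
      \Upsilon = \begin{bmatrix} u(0) & \cdots & u(m) \end{bmatrix}, \\
\Omega &\approx U_{1,r}\,\Sigma_{1,r}\,V_{1,r}^T \ \ (\text{rank } r_1),
      \qquad
      \Xi^+ \approx U_{2,r}\,\Sigma_{2,r}\,V_{2,r}^T \ \ (\text{rank } r_2), \\
\begin{bmatrix} A & b \end{bmatrix}
      &\approx \Xi^+\,V_{1,r}\,\Sigma_{1,r}^{-1}\,U_{1,r}^T,
      \qquad
      U_{1,r} = \begin{bmatrix} U_{1,a} \\ U_{1,b} \end{bmatrix},
      \quad U_{1,a} \in \mathbb{R}^{q \times r_1},\ U_{1,b} \in \mathbb{R}^{1 \times r_1}, \\
A_r &= U_{2,r}^T\,\Xi^+\,V_{1,r}\,\Sigma_{1,r}^{-1}\,U_{1,a}^T\,U_{2,r},
      \qquad
      b_r = U_{2,r}^T\,\Xi^+\,V_{1,r}\,\Sigma_{1,r}^{-1}\,U_{1,b}^T, \\
z(t+1) &= A_r\,z(t) + b_r\,u(t), \qquad z = U_{2,r}^T\,\xi .
\end{align}
```

The projection onto the columns of $U_{2,r}$ is what reduces the identified model from $q$ to $r_2$ states.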
The obtained results with $r_2 = 17$, $r_1 = 18$ are shown in Figure 5. On the left of the figure, the comparison of the reference solution and the autoencoder-DMDc (ae-dmdc) based approximation can be seen. On the right part of the figure, the associated approximation error norm is shown.
Next, the identification of a suitable approximation of the output (i.e., measurement) operator is addressed. In terms of the numerical approximation (5) of the pde model (1), the output at time $t$ is given by $y(t) = h^T x(t)$. For the purpose of providing an optimal minimum squared error approximation of the output vector $c^T$ in the reduced coordinates, the following snapshot matrices have been introduced
$$Y = [y(0)\ \cdots\ y(m)], \qquad Z = [z(0)\ \cdots\ z(m)] = U_{2,r}^T \Xi,$$
so that with the pseudo-inverse $Z^\dagger$ one obtains
$$c^T = Y Z^\dagger.$$
The resulting absolute error of the output approximation is shown in Figure 6 for the two different sensor locations $s_m = 1$ and $s_m = 0.5$. It can be seen that a good correspondence is achieved for both sensor locations.
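The pseudo-inverse solution is the standard least-squares fit of a row vector to the output snapshots. As a short worked statement, under the additional assumption (not stated in the text) that $Z$ has full row rank:

```latex
\begin{align}
c^T &= \arg\min_{\tilde{c}^T \in \mathbb{R}^{1 \times r_2}}
       \left\| Y - \tilde{c}^T Z \right\|_2^2 = Y Z^\dagger, \\
Z^\dagger &= Z^T \left( Z Z^T \right)^{-1}
       \quad \text{if } \operatorname{rank}(Z) = r_2 .
\end{align}
```

Only the $r_2 \times r_2$ matrix $Z Z^T$ has to be inverted in that case, independently of the number of snapshots.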

Full State Reconstruction Using the AE-DMDc Reduced-Order Model
In this section, the reduced-order model with state $z(t) \in \mathbb{R}^{r_2}$ (with $r_2 \ll N$) is employed to design algorithms for the reconstruction of the complete state $x(t) \in \mathbb{R}^N$. This is done by combining the available measurement information and the known input $u(t)$ with the model, using two different observer design approaches and the decoder of the previously designed autoencoder. In general, consider an estimate $\hat{z}$ with the associated full-state estimate $\hat{x} = \eta_2(W_2 \hat{z} + b_2)$. For a linear decoder mapping $\eta_2$, the estimation error $e = \hat{x} - x$ depends linearly on $\hat{z} - z$, implying that $e \to 0$ if and only if $\hat{z} - z \to 0$. Thus, providing an estimate in the $z$-coordinates is sufficient for obtaining a reliable estimate in the original coordinates, as long as the autoencoder is adequately trained (i.e., in particular with $\eta_2(0 + b_2) \approx 0$).

Reduced-Order Observer
The main idea of the reduced order observer is to prevent redundant estimates and to reduce the model dependency of the state estimation scheme [1,2]. For this purpose, the measurements (or parts of them) are included directly and unfiltered in the estimator, which is only possible if no significant sensor noise is present. To introduce the approach, formally consider the state transformation
$$\begin{bmatrix} y \\ \zeta \end{bmatrix} = V z, \qquad V = \begin{bmatrix} c^T \\ T \end{bmatrix},$$
with $T$ such that $V$ is invertible, i.e., $\mathrm{rank}(T) = r_2 - 1$ (as there is only one measurement) and $Tc = 0$, meaning that the rows of $T$ correspond to basis vectors of the orthogonal complement of the space spanned by the row vector $c^T$ (i.e., the measurement subspace). It should be noted that there exist effective numerical algorithms for determining the matrix $T$, e.g., using null in MATLAB. Next, consider the corresponding partition of the transformed dynamics and the associated reduced order observer in the $\zeta$ coordinates. It is straightforward to verify that the associated observation error $\tilde{z} = \hat{z} - z$ converges exponentially to zero if and only if all eigenvalues of the matrix $A_r$ are contained in the open unit disk $U_1 \subset \mathbb{C}$. If some eigenvalues lie on the unit circle, there are estimation error components (in $\zeta$ coordinates) that remain constant but do not grow in time, i.e., for small initial errors, small error evolutions can still be achieved. Given that $z = 0$ is mapped onto $x = 0$, convergence in the original state coordinates can be ensured within the accuracy limits set by the approximation performance of the autoencoder.
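The mechanism can be illustrated on a small example. The sketch below uses a textbook reduced order observer without an additional correction gain, i.e., the unmeasured coordinate $\zeta$ is propagated with the corresponding block of the transformed model while the measured output is injected directly. The two-state model (A, b, c), the choice of T and all function names are hypothetical and serve only to show the structure; the observer equations of the case study may differ.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def matvec(A, x):
    """Matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# Hypothetical reduced model z(t+1) = A z(t) + b u(t), y(t) = c^T z(t)
A = [[0.5, 0.1], [0.0, 0.8]]
b = [1.0, 0.5]
c = [1.0, 1.0]
T = [1.0, -1.0]                    # spans the complement of c: T c = 0
V = [c, T]                         # [y, zeta]^T = V z
V_inv = [[0.5, 0.5], [0.5, -0.5]]  # inverse of V for this 2x2 example
A_bar = matmul(matmul(V, A), V_inv)  # dynamics in (y, zeta) coordinates
b_bar = matvec(V, b)


def simulate(steps, z0, zeta_hat0, u=1.0):
    """Run plant and observer; return the max-norm error of z_hat per step."""
    z, zeta_hat = list(z0), zeta_hat0
    errors = []
    for _ in range(steps):
        y = sum(ci * zi for ci, zi in zip(c, z))  # noise-free measurement
        z_hat = matvec(V_inv, [y, zeta_hat])      # y enters unfiltered
        errors.append(max(abs(p - q) for p, q in zip(z_hat, z)))
        # The observer has a single dynamic variable (here r_2 - 1 = 1)
        zeta_hat = A_bar[1][0] * y + A_bar[1][1] * zeta_hat + b_bar[1] * u
        z = [zi + bi * u for zi, bi in zip(matvec(A, z), b)]
    return errors
```

In this example the error in $\zeta$ contracts with the factor `A_bar[1][1]` (0.6) per step, so convergence is decided by the block of the transformed system matrix that acts on the unmeasured coordinates.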

Kalman Filter
For comparison purposes, state estimation using the Kalman Filter [21,22] is considered. The combination of DMDc models with Kalman Filters has recently been shown to yield quite convincing results [3,17,18,19,20]. A major strength of the Kalman Filter lies in the fact that it provides a minimum estimation error covariance based on a priori statistics of the model and measurement uncertainties, which are considered as white noise processes with covariances $Q \in \mathbb{R}^{r_2 \times r_2}$ and $r \in \mathbb{R}$ in the present context. Under these assumptions, the minimum estimation error covariance state estimator is given by a prediction-correction (innovation) scheme (see, e.g., [21]) consisting of the prediction, the computation of the Kalman gain, the innovation step
$$\hat{z}(t+1) = \left(I - L(t+1)c^T\right)\hat{z}_p(t+1) + L(t+1)\,y(t+1) \qquad (8d)$$
and the complete state reconstruction, with predicted state $\hat{z}_p$, predicted error covariance $P_p$, Kalman (correction) gain $L$, state estimate $\hat{z}$ and error covariance $P$.
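The prediction-correction scheme can be sketched for a scalar measurement as follows. The prediction, gain and covariance update are written in their standard textbook form, which is an assumption here since only the innovation step (8d) is given explicitly above; the input vector name `b` and the function names are hypothetical.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def kalman_step(z_hat, P, u, y_next, A, b, c, Q, r):
    """One Kalman Filter step for z(t+1) = A z + b u, y = c^T z (scalar y)."""
    n = len(z_hat)
    # Prediction of state and error covariance
    z_p = [zi + bi * u for zi, bi in zip(matvec(A, z_hat), b)]
    APAt = matmul(matmul(A, P), transpose(A))
    P_p = [[APAt[i][j] + Q[i][j] for j in range(n)] for i in range(n)]
    # Kalman gain: for a scalar measurement the inverse is a division
    Pc = matvec(P_p, c)
    s = sum(ci * pi for ci, pi in zip(c, Pc)) + r
    L = [pi / s for pi in Pc]
    # Innovation, equivalent to (I - L c^T) z_p + L y
    y_p = sum(ci * zi for ci, zi in zip(c, z_p))
    z_new = [zp + li * (y_next - y_p) for zp, li in zip(z_p, L)]
    # Covariance update P = (I - L c^T) P_p, using the symmetry of P_p
    P_new = [[P_p[i][j] - L[i] * Pc[j] for j in range(n)] for i in range(n)]
    return z_new, P_new


def kalman_storage(n):
    """Number of dynamic variables: covariance matrix plus state estimate."""
    return n * n + n
```

The full-state estimate is then obtained by passing `z_new` through the decoder. Each step touches all entries of the covariance matrix, which is the source of the higher memory and computation demand compared with the reduced order observer.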
Note that the implementation of the Kalman Filter implies the memory usage of a total of $r_2^2 + r_2$ floating point variables for storing (and for updating) the error covariance matrix $P(t) \in \mathbb{R}^{r_2 \times r_2}$ and the state estimate $\hat{z}(t) \in \mathbb{R}^{r_2}$. For the case study with $r_2 = 17$ this yields 306 dynamic variables, while the original pde approximation is based on only 100. On the other hand, it should be noticed that the Kalman Filter provides a minimum estimation error covariance, which is particularly useful if stochastic process and measurement noise perturbations are taken into account. Given the approximative nature of the model (in full scale with $N = 100$ state variables as well as in reduced scale with $r_2 = 17$ state variables), this uncertainty rejection property can possibly compensate modeling errors up to some point.

Observer Evaluation
In this section both observer design approaches are compared for two different inputs and two sensor locations, giving rise to four scenarios: (S1) $u = u_1$ and $s_m = 1$, (S2) $u = u_1$ and $s_m = 0.5$, (S3) $u = u_2$ and $s_m = 1$, (S4) $u = u_2$ and $s_m = 0.5$. The associated input and state profiles are shown in Figure 7. For the comparison of the approaches, the associated approximated evolution of the $L^2$ norm of the observation error profiles is evaluated. Further, to evaluate the associated computational load, the mean computation time per time step $\bar{t}$ and its variance $\nu$ are evaluated for both approaches on a notebook computer with an Intel Pentium CPU 4415U at 2.3 GHz.
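The approximated $L^2$ norm of an error profile on the spatial grid can be computed with a simple quadrature rule. The text does not state which rule was used; the sketch below assumes the trapezoidal rule on an equidistant grid, and the function name is hypothetical.

```python
import math


def l2_norm(profile, dz):
    """Approximate the L2 norm of a sampled profile on an equidistant grid.

    Trapezoidal rule (an assumption): interior samples get weight dz,
    the two boundary samples get weight dz/2.
    """
    squares = [p * p for p in profile]
    integral = dz * (sum(squares) - 0.5 * (squares[0] + squares[-1]))
    return math.sqrt(integral)
```

Applied at every time step to the sampled error `[xh - xt for xh, xt in zip(x_hat, x)]`, this yields the error norm evolution used for the comparison.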
The results of the evaluation are shown in Figure 8. It can be seen that both observer schemes yield similar estimation performance, with the Kalman Filter performing slightly better in terms of precision for $s_m = 1$ and the reduced order observer for $s_m = 0.5$. The associated computation times clearly show that the reduced order observer achieves its core calculations about four times as fast as the Kalman Filter, which is due to the higher dimensionality of the Kalman Filter with the required calculation of the covariance matrix.
Figure 8. Comparison of the approximated $L^2$ observation error norm evolutions for the four different scenarios (S1) to (S4) with the respective mean computation times per unit step ($\bar{t}_{roo}$, $\bar{t}_{kf}$) and the associated variances ($\nu_{roo}$, $\nu_{kf}$) obtained in the simulation study.

Discussion
As commented above, the implementation of the reduced order observer requires only $r_2 - 1$ variables (i.e., only 16 for the case study with $r_2 = 17$), while the Kalman Filter requires the additional information about the error covariance matrix at each time step, yielding a total of $r_2^2 + r_2$ variables (i.e., 306 for the case study). This implies a significant reduction in memory storage (by a factor of about 19 for the case study) in comparison to the Kalman Filter. In addition, as discussed in the previous section, in the performed simulation studies the core calculations of the reduced-order observer were about four times faster than the ones for the Kalman Filter. It should be noted, however, that in the MATLAB implementation this time advantage is basically lost when the time for the decoding, i.e., the last step in both observer schemes, is taken into account, and both approaches perform similarly.
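The memory figures quoted above follow directly from the dimensions of the two estimators:

```latex
\begin{align}
n_{\mathrm{kf}} &= r_2^2 + r_2 = 17^2 + 17 = 289 + 17 = 306, \\
n_{\mathrm{roo}} &= r_2 - 1 = 16, \\
\frac{n_{\mathrm{kf}}}{n_{\mathrm{roo}}} &= \frac{306}{16} = 19.125 \approx 19 .
\end{align}
```

Here $n_{\mathrm{kf}}$ and $n_{\mathrm{roo}}$ are shorthand introduced only for this calculation.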
As can be seen, the performance of the Kalman Filter and of the reduced order observer are quite comparable for this particular case study. As commented in the introductory discussion for the reduced order observer, this is clearly to be attributed to an output approximation with low noise level. For systems with more sensor noise it is not advisable to use the unfiltered output directly in the observer scheme.
As a consequence of these results, as long as computation times are below system specific constraints in terms of time scales, both approaches can be employed. If the computation times should be kept as low as possible and the measurement noise is negligible, the reduced order observer has clear advantages. This holds true in particular if, in addition to the observer scheme, some complex calculations or real-time parallel optimizations are performed.

Conclusions and Outlook
The combination of autoencoders and dynamic mode decomposition has been employed as a basis for low-dimensional observer design for a class of diffusion-convection-reaction systems using the reduced order observer and Kalman Filter approaches. Using a case study it was shown that both approaches yield satisfactory and comparable results, with the Kalman Filter having a slightly better precision at the cost of significantly higher memory demands, while the reduced order observer provides a comparable performance with very low computational demand and a high degree of simplicity in the design. Future studies will focus on different, more complex applications, including networks of systems, semilinear pde systems, pdes on higher-dimensional domains (2D, 3D), spatially varying diffusion and convection mechanisms, and a more detailed study of the impact of the autoencoder structure (number and size of hidden layers, activation and loss functions, etc.). Comparisons with other model order reduction approaches will be carried out for the respective cases.

Conflicts of Interest:
The authors declare no conflict of interest.
Considering in a first approximation that the temperature deviation can be neglected (e.g., due to some appropriate control through the cooling jacket temperature [27]), one obtains for the concentration dynamics in a neighborhood of the equilibrium profile $c^*$ the following dynamics $\partial_t x = D\partial_s^2 x - v\,\partial_s x - r x$ with $r(s) = \rho(c^*(s))\,e^{-\gamma/T^*(s)}$ for $s \in (0, 1)$ and the associated boundary conditions. This model is equivalent to the one in (1), assuming a particular reaction rate distribution $r$ with reaction rate maximum determined by the coefficients $k_1$.