A Regularized Real-Time Integrator for Data-Driven Control of Heating Channels

Abstract: In many contexts of scientific computing and engineering science, phenomena are monitored over time and data are collected as time series. Many algorithms have been proposed in the field of time-series data mining, a large number of them based on deep learning techniques. High-fidelity simulations of complex scenarios are computationally expensive, and real-time monitoring and control can be achieved efficiently by means of artificial intelligence. In this work we build accurate data-driven models of a two-phase transient flow in a heated channel, as typically encountered in heat exchangers. The proposed methods combine several artificial neural network architectures, involving standard and transposed deep convolutions. In particular, a highly accurate real-time integrator of the system has been developed.


Introduction
Most phenomena in engineering are described by sequential data. In the context of data-driven simulation and control [1], time series classification and forecasting therefore play a fundamental role. A huge variety of applications, such as financial mathematics, language and speech recognition, weather forecasting, and physical and chemical unsteady systems, are extensively discussed in the literature [2–5].
In the last decade, data-driven approximations of the Koopman operator have successfully been applied to build reduced-order models of complex dynamical systems, allowing their identification and control [6–9]. Several approaches combine algorithms derived from the dynamic mode decomposition (DMD) with deep learning (DL) techniques, as discussed in [10,11] and references therein.
In this work, several surrogates based on standard and transposed convolutional neural networks (CNNs) [25–27] are applied to a thermohydraulic problem in the context of heat exchangers involving a change of phase [28–30]. The problem will be described in detail later; it basically consists in prescribing the input heat power u(t) injected into a two-phase flow and monitoring some quantities y(t) at the outlet of a pipeline in which the gas-liquid mixture is confined and flows.
Mathematically speaking, let us consider an input and an output time series, respectively u and y, tracking N_t time instants. The problem is the estimation of the non-linear transfer function h such that y = h(u). Learning the function h allows the real-time integration of a newly defined input, to assess its impact on the outputs of interest.
A first approach exploits a parametrization of the input function u(t) through a few values {U_1, U_2, …, U_K}. In this case, the modeling framework is non-incremental, since the whole output time series is computed from the inputs {U_1, U_2, …, U_K}, that is, y = h(U_1, U_2, …, U_K). The estimation of h is carried out through artificial neural networks. However, since the output consists of N_t evaluations while the input consists of K < N_t parameters, a transposed convolution (also known as a fractionally strided convolution) is employed to reach the right dimension at the output layer. As a matter of fact, transposed convolutions increase (upsample) the spatial dimensions of intermediate feature maps, reversing the downsampling operations performed by classical convolutions [25].
A second approach consists of considering an incremental modelling framework, in which the model (also known as an integrator) computes the next value of the output sequence from a given number of previous inputs and outputs. Such an approach has some similarities with NARXNETs [18]. Denoting y_i = y(t_i), for i = 1, …, N_t, we can write y_i = h(y_{i−1}, …, y_{i−1−k}, u_i, …, u_{i−k}), where the parameter k ≥ 0 is a delay term, known as the process dead time. We can also note U = y_past ∪ u, y_past being the known outputs of y used to predict its future values. A neural-network architecture based on a convolution approach can then be defined. Two incremental models (for different values of the delay coefficient k) will be compared in terms of speed and accuracy.
The paper aims to build a data-driven model capable of reproducing the data with high fidelity, while possibly performing the integration task over a large time frame. First of all, the model of the addressed problem and the simulation data structure are described in Section 2. Section 3 presents three different surrogates:
• A non-incremental modelling framework is presented in Section 3.1, which predicts the whole time frame in a single shot.
• Two incremental alternatives are presented in Section 3.2. These alternatives use different memory sizes to predict the future variation of the quantities of interest.
• The incremental surrogate model with large memory is then used as an integrator to predict the variation over a large time series.
Finally, Section 4 draws the conclusions, summarizing the results and highlighting the main differences among the learned models.

Problem Statement
High-fidelity numerical simulations of a thermohydraulic problem are performed using the software CATHARE for nuclear safety analysis [31–33]. The simulation consists of a two-phase (steam-water) flow moving upward against gravity in a channel heated by hot walls, as sketched in Figure 1. The model input is the time-varying thermal power q(t) injected into the fluid through the walls, while the outputs are the time histories of several fluid properties at the channel outlet. The simulation covers the interval [0, 100] s, with a time step ∆t = 0.1 s.

Figure 1.
Fluid channel with wall surface heating, controlled by the heat power density law q(t). The green point represents the location of the measurement node at the outlet.
Using bold notation for vectors (sequences of 1000 elements corresponding to the time discretization), the quantities of interest are the steam fraction in the flowing system A, the liquid velocity V, the steam temperature T_G and the liquid temperature T_L. This work aims to produce a surrogate model h for inferring the outputs as a function of the input variable q, namely (A, V, T_G, T_L) = h(q). The surrogate model is built in a non-intrusive manner, defining a Design of Experiments (DoE) on the power density and combining the outputs of the corresponding offline simulations.
A user-defined power control law allows assigning the value of the provided power density at specific time instants. Therefore, the input function q(t) is defined through a few values Q_i = q(t_i), normalized by the maximum power density Q_max = 8.4 × 10^8 W/m³, and linearly interpolated in between. In particular, the time instants (in seconds) {0, 1, 25, 50, 75, 100} are considered for the power changes. The corresponding values Q_i, for i = 0, …, 5, define a parametrization of the input function q(t).
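The piecewise-linear construction of q(t) can be sketched as follows; the intermediate Q values here are illustrative and not taken from the actual DoE.

```python
import numpy as np

# Hypothetical sketch of the input parametrization: q(t) is set by the
# control values Q_i at fixed instants and linearly interpolated
# in between (the intermediate Q values are illustrative).
t_knots = np.array([0.0, 1.0, 25.0, 50.0, 75.0, 100.0])  # seconds
Q = np.array([0.0, 1.0, 0.6, 0.8, 0.4, 1.0])             # fractions of Q_max

dt = 0.1
t = np.arange(1000) * dt                 # 1000-step time discretization
q = np.interp(t, t_knots, Q)             # normalized q(t), shape (1000,)
```

The resulting vector q has one entry per simulated time step and can be fed directly to the surrogate-building pipeline.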
A DoE of 23 data points is established using different combinations of the values Q_i, as listed in Table 1. For the rest of this work, the last 4 datasets are reserved for the testing set and therefore not fed into the training loop of the built surrogate model. Note that some of the values are fixed (Q_0 = 0, Q_1 = Q_5 = 1); thus the model input parameters are only Q_2, Q_3 and Q_4. Figure 2 shows an example of the heat power density q(t) defined from the values Q_i.
Table 1. Available plan of experiment (DoE) and sequences of control-point values, as fractions of Q_max = 8.4 × 10^8 W/m³.
A sample set of results, from the available datasets number 1, 2, 3 and 4 of Table 1, is shown in Figure 3 for illustration purposes.

Surrogate Modeling of the Data
Several models are tested; in what follows we present the results of three of them. Before any training, the variables are normalized by subtracting their corresponding means and dividing by the variation amplitude, i.e., the difference between the maximum and minimum values.
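As a minimal sketch (assuming each variable is stored as a NumPy array over the time steps), the normalization reads:

```python
import numpy as np

# Normalize a variable by removing its mean and dividing by the
# variation amplitude (max - min), as done before training.
def normalize(v):
    amplitude = v.max() - v.min()
    return (v - v.mean()) / amplitude

v = np.array([2.0, 4.0, 6.0, 8.0])
v_norm = normalize(v)                    # zero mean, unit amplitude
```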

A Non-Incremental Approach
In this section, we present a non-incremental modelling framework, aiming to reproduce the whole measured 1000-step time sequence while considering only the three variables Q_2, Q_3 and Q_4. Therefore, the first model H_1 aims to create an output of the form (Â, V̂, T̂_G, T̂_L) = H_1(Q_2, Q_3, Q_4), with the hat denoting the approximation of the data, i.e., the output of the surrogate model. Since we are using a non-incremental approach, each dataset is nothing but a single data point, for which a single output is evaluated. We use a neural network based on a transposed convolution approach [25–27]. The network starts with an input of 3 values, namely (Q_2, Q_3, Q_4), and propagates the variation of the input variables into the required output. The architecture of the network is illustrated in Table 2 and schematically represented in Figure 4. The first layer transforms the input into a dimension suitable for beginning a transposed convolution. The second layer reshapes the inputs into a simulated 2D picture of dimension 8 × 4, with 240 filters. The subsequent layers reduce the number of filters while increasing the size of the first dimension of the tensor, multiplying it by 5 at each layer, therefore reaching a dimension of 1000 × 4 on layer 6, the exact dimension of the required 4-variable output. tanh stands for the hyperbolic tangent activation function, selu stands for the scaled exponential linear unit [34], while linear stands for linear activation, i.e., no activation function. CT stands for "convolution transpose".
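The upsampling mechanism of the transposed-convolution layers can be illustrated with a minimal single-channel 1D sketch. For clarity the kernel length equals the stride (so there is no overlap) and the weights are illustrative rather than learned; a real layer uses many filters and overlapping trained kernels. Three successive applications grow the time axis 8 → 40 → 200 → 1000, as in the architecture above.

```python
import numpy as np

# Minimal 1D transposed convolution with stride 5 and no kernel overlap:
# each input element is expanded into `stride` output elements.
def conv1d_transpose(x, kernel, stride):
    out = np.zeros(len(x) * stride)
    for i, xi in enumerate(x):
        for j in range(stride):
            out[i * stride + j] += xi * kernel[j]
    return out

x = np.ones(8)                        # time axis after the reshape layer
kernel = np.full(5, 0.2)              # illustrative (not learned) weights
for _ in range(3):                    # 8 -> 40 -> 200 -> 1000
    x = conv1d_transpose(x, kernel, stride=5)
```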
The relative errors generated by the non-incremental approach are reported in Table 3. Since the output A contains several zero values, the relative error E_A is normalized with respect to the maximum value of A, E_A = mean(|A − Â|) / max(A), with A being the measured data and Â the surrogate model output. The normalization is standard for any other variable v: E_v = mean(|(v − v̂) ⊘ v|), where, similarly, v denotes the measured data, v̂ the surrogate model output, and ⊘ the Hadamard (element-wise) division operator. The results are excellent on both the training and testing sets.

Incremental Approaches
Using an incremental approach consists in learning the integrator that updates the state of the system at each time step. The model is built by first decomposing the datasets into sequence chains of length t_s. Increasing the chain size t_s, i.e., considering a larger number of time steps of the variation history, improves the results, at the price of a higher computation time for the prediction of each data point. Two incremental models are tested: the first one considers t_s = 4 and is an excellent predictor of the next output of the considered sequences, but fails as a dynamical integrator, as will be discussed later; the second one uses t_s = 20, together with a regularization that improves stability, and performs correctly as an integrator.
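The decomposition into sequence chains can be sketched as follows; `data` stands for one simulated run stored as an (N_t × 5) array with the four outputs and the input q as columns (the names and layout are illustrative).

```python
import numpy as np

# Build (window, target) training pairs of chain length ts: the model
# sees ts past steps of all 5 variables and predicts the 4 outputs at
# the next step. With N_t = 1000 and ts = 4 this yields 996 pairs,
# matching the 996 windows per run quoted in the text.
def make_windows(data, ts):
    X, Y = [], []
    for i in range(ts, len(data)):
        X.append(data[i - ts:i])      # previous ts steps, 5 columns
        Y.append(data[i, :4])         # next-step outputs only
    return np.stack(X), np.stack(Y)

data = np.random.rand(1000, 5)
X, Y = make_windows(data, ts=4)       # X: (996, 4, 5), Y: (996, 4)
```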

A Fast Predictor
In this section we explore the possibility of using a fast algorithm based on convolution layers to predict the next outcome of the sequence at hand. The model takes as input the variables (A, V, T_G, T_L) at time steps i−1, i−2, i−3 and i−4, as well as the input q at time steps i, i−1, i−2 and i−3, to predict the quantities of interest at time step i. The second trained surrogate model H_2 can thus be written as (A_i, V_i, T_{G,i}, T_{L,i}) = H_2(A_{i−1}, …, A_{i−4}, V_{i−1}, …, V_{i−4}, T_{G,i−1}, …, T_{G,i−4}, T_{L,i−1}, …, T_{L,i−4}, q_i, …, q_{i−3}). The network used to approximate the surrogate model H_2 is summarized in Table 4 and schematized in Figure 8. The input of the network is a 3D tensor of dimension (4 × 5 × 1), for 4 time steps and 5 variables. The selected optimization algorithm is Adam, with the mean squared error as loss function. The available data, after reformulation into sequences of 4 steps, amount to 19 × 996 inputs for the training set and 4 × 996 for the testing set. A batch size of 996 is selected, with 1500 epochs for the training. The initial learning rate was set to α = 10^−4.
Table 4. Structure of the deep convolution neural network used for the fitting of H_2. tanh stands for the hyperbolic tangent activation function, selu stands for the scaled exponential linear unit [34], while linear stands for linear activation, i.e., no activation function.
Model H_2 shows an excellent ability to reproduce the datasets when its input is always taken from the physical dataset values and not from the output of the predictor H_2 itself. The errors increase dramatically when the input of H_2 is taken from its own previous output, making it impossible to use the model as an integrator of the dynamical system.
On the other hand, the performance of H_2 is comparable to that of H_1 in terms of relative errors. From a comparison of the resulting graphical illustrations, H_2 seems to perform better on the testing sets.
Table 5. Mean relative errors of the trained incremental model H_2 with t_s = 4. All convolutions employ "same" padding, meaning that the input is half padded and that the filter is applied to all the elements of the input.

A Larger Sequence Time Integrator
In this section we follow again the rationale of Section 3.2.1, with an algorithm based on convolution layers to predict the next outcome of the sequence at hand. The model takes as input the variables available at the last 20 time steps; according to the previously introduced notation, t_s = 20. The built model H_3 predicts the quantities of interest at the subsequent time step, that is, (A_i, V_i, T_{G,i}, T_{L,i}) = H_3(A_{i−1}, …, A_{i−20}, V_{i−1}, …, V_{i−20}, T_{G,i−1}, …, T_{G,i−20}, T_{L,i−1}, …, T_{L,i−20}, q_i, …, q_{i−19}). The surrogate model H_3 is built using the network illustrated in Table 6 and represented in Figure 12. The input of the network is a 3D tensor of dimension (20 × 5 × 1), for 20 time steps and 5 variables. The selected optimization algorithm is Adam, with a mean squared error loss function modified by adding a regularization similar to the DMD regularization [36,37], in order to perform the integration in time without increasing the error significantly. The loss function J in this section reads J = MSE + λ_w Σ_j θ_j² + λ_b Σ_j κ_j², where θ_j are the weights of the last layer of the network described in Table 6 (the linear layer), κ_j its biases, and λ_w and λ_b are regularization parameters. The available data, after reformulation into sequences of 20 steps, amount to 19 × 980 inputs for the training set and 4 × 980 for the testing set. A batch size of 980 is selected, with 1500 epochs for the training. The initial learning rate was set to α = 10^−4; the input shape is therefore 20 × 5 × 1. The results are shown for λ_w = λ_b = 10^−6.
Table 6. Structure of the deep convolution neural network used for the fitting of H_3. tanh stands for the hyperbolic tangent activation function, selu stands for the scaled exponential linear unit [34], while linear stands for linear activation, i.e., no activation function.
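The regularized loss can be sketched as follows; the exact penalty form is an assumption, reconstructed from the description of θ (last-layer weights) and κ (last-layer biases) above.

```python
import numpy as np

# Regularized loss J = MSE + lam_w * sum(theta^2) + lam_b * sum(kappa^2),
# penalizing the weights and biases of the last (linear) layer.
def loss_J(y_true, y_pred, theta, kappa, lam_w=1e-6, lam_b=1e-6):
    mse = np.mean((y_true - y_pred) ** 2)
    return mse + lam_w * np.sum(theta ** 2) + lam_b * np.sum(kappa ** 2)
```

The penalties shrink the last-layer parameters toward zero, which is what damps the error accumulation when the model is later iterated as an integrator.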

Integrator Results
The neural network described in Table 6, complemented with the regularized loss given in Equation (9), can be used to integrate the system behavior in time: the model predictions are fed back as inputs, so that (Â_i, V̂_i, T̂_{G,i}, T̂_{L,i}) = H_3(Â_{i−1}, …, Â_{i−20}, V̂_{i−1}, …, V̂_{i−20}, T̂_{G,i−1}, …, T̂_{G,i−20}, T̂_{L,i−1}, …, T̂_{L,i−20}, q_i, …, q_{i−19}). The integrator is stable enough to cover all the time steps. Figure 16 shows the integrator results in a training case, whereas Figures 17 and 18 show the integrator results on two testing cases; Figure 17, for instance, reports the results of the H_3 surrogate model when integrating testing dataset number 20 of Table 1. The integrator errors are given in Table 8. It can be clearly noticed that the integrator performs extremely well on the selected datasets.
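The feedback loop of the integrator can be sketched as follows; `model` stands in for the trained H_3 surrogate and is replaced here by a trivial placeholder so that the loop is runnable.

```python
import numpy as np

# Advance the system by feeding the model's own predictions back: the
# sliding window holds the last ts = 20 steps of the 4 outputs plus the
# (known) input q. Only the seed window must come from a simulation.
def integrate(model, seed_window, q, n_steps):
    window = seed_window.copy()              # shape (20, 5)
    history = []
    for i in range(n_steps):
        y_next = model(window)               # 4 predicted outputs
        history.append(y_next)
        new_row = np.append(y_next, q[i])    # append the known input
        window = np.vstack([window[1:], new_row])
    return np.array(history)

dummy_model = lambda w: w[-1, :4]            # placeholder for H_3
out = integrate(dummy_model, np.zeros((20, 5)), np.ones(980), 980)
```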

Conclusions
In this work several prediction models for output time series from input time series have been compared. Without loss of generality, a coupled problem in thermohydraulics has been considered, but the procedure generalizes to many other phenomena and fields. Moreover, instead of simulation-based data, experimental data can be used.
The models are based on different deep neural network architectures. If a parametrization of the input function is available, a non-incremental approach involving transposed convolution neural networks allows the prediction of the whole sequence, as described and discussed in Section 3.1. The non-incremental approach is based on convolution transpose layers, which take a small amount of data as input and increase the size of the output to reach a desired dimension. These layers appear as a suitable approach in this situation, where only a few inputs are required to build the whole sequence of time measurements.
The incremental models (dynamical system integrators) presented in Section 3.2 are based on regressing the variable on its own lagged (i.e., past) values together with those of the input, through standard deep convolution neural networks. The prediction therefore occurs step by step. The difference between the two incremental models lies in the delay (or lag) factor t_s. A small value of t_s allows very fast predictions. However, the model can be unstable when its input is taken from previous model outputs (this occurs, for instance, when t_s = 4); in such a case, the surrogate cannot be used as an integrator of the system. Results are markedly improved for t_s = 20, where a regularization term was also introduced in the loss function. The incremental approach is based on convolution layers, which take a large input and reduce its size to extract relevant information. The filters are designed with a large dimension along the time axis, to allow an easier extraction of the inputs' time variation.
The non-incremental approach can clearly be faster at evaluating the solution output; however, the integrator clearly outperforms it. Moreover, the integrator can be used on novel datasets, and potentially outside of the simulated time frame. The initialization only requires t_s time steps to be simulated, and the remaining dynamics is predicted in real time, allowing huge computational gains. The non-incremental approach, on the contrary, cannot be used on a different time frame, outside of the simulated region, as the output is hard-coded to predict exactly N_t = 1000 time steps.
Funding: This research received no external funding.
Data Availability Statement: Data is available upon request.