Digital Twin for Experimental Data Fusion Applied to a Semi-Industrial Furnace Fed with H 2 -Rich Fuel Mixtures

: The objective of this work is to build a Digital Twin of a semi-industrial furnace using Gaussian Process Regression coupled with dimensionality reduction via Proper Orthogonal De-composition. The Digital Twin is capable of integrating different sources of information, such as temperature, chemiluminescence intensity and species concentration at the outlet. The parameters selected to build the design space are the equivalence ratio and the benzene concentration in the fuel stream. The fuel consists of a H 2 /CH 4 /CO blend, doped with a progressive addition of C 6 H 6 . It is an H 2 -rich fuel mixture, representing a surrogate of a more complex Coke Oven Gas industrial mixture. The experimental measurements include the ﬂame temperature distribution, measured on a 6 × 8 grid using an air-cooled suction pyrometer, spatially resolved chemiluminescence measurements of OH ∗ and CH ∗ , and the species concentration (i.e., NO, NO 2 , CO, H 2 O, CO 2 , O 2 ) measured in the exhaust gases. The GPR-based Digital Twin approach has already been successfully applied on numerical datasets coming from CFD simulations. In this work, we demonstrate that the same approach can be applied on heterogeneous datasets, obtained from experimental measurements.


Introduction
The need for the rapid decarbonization of the global economy requires the development of new combustion systems that are both efficient and flexible, to allow the use of new zero-carbon fuels such as hydrogen and ammonia [1].
These requirements, coupled with the necessity of limiting the production of harmful pollutants, impose strict operating conditions on the combustion systems.This means that the design and control of these systems is crucial, and the margin of error is limited.
For this reason, the model of trial-and-error employed for the design of a traditional combustion system is both too time-consuming and error prone to be applied on the development of new combustion designs.
Luckily, the recent developments in machine-learning techniques and the increasing availability of data offer various tools that can be exploited in the design and operation of combustors.
In particular, the development of Digital Twins (DTs) has been increasingly regarded as a way to substantially improve both the knowledge of industrial systems and their control [2][3][4].The DT is defined as a digital representation of a physical object that can closely simulate its behavior in the real environment [2,4].The DT can help in the design, production and service phases of a product.In the design phase, the DT is useful in the iterative optimization and the virtual evaluation, while in the production phase it can be employed for real-time monitoring, control and prediction.Finally, in the service phase, the DT can be used to forecast the product's maintenance [2,3].
Crucially, the DT has to provide a real-time prediction of the state of its physical counterpart, so that it can supply constant reference values.
This requirement generally excludes Computational Fluid Dynamics (CFD) simulations as techniques to rely on building DTs because the computing time is not negligible, even for relatively simple geometries.
Instead, data-driven regression models, such as linear regression, Gaussian Process Regression (GPR) and Neural Networks (NNs), are particularly suited for this kind of application because the training phase is distinct from the prediction phase.This means that the training phase, however long, can be performed offline while the trained model can predict the system's state almost instantaneously.
Among these regression methods, the GPR offers a number of advantages with respect to the other regression models.In particular, it is a nonlinear technique based on Bayesian inference with a limited amount of hyperparameters to tune.This probabilistic framework means that the predictions are complemented with the model's uncertainty [5].
Moreover, the GPR performs well with sparse datasets.The availability of sparse datasets is often the case when dealing with reactive systems because the amount of operating conditions explored is limited by the computational cost (for CFD simulations) or the operating cost (for experimental campaigns).
Other important tools to build DTs are dimensionality reduction techniques, in particular the Proper Orthogonal Decomposition (POD) [6].The POD is a linear technique capable of finding a low-dimensional representation of the system without losing important information.This allows for greatly reducing the complexity of the model while maintaining high accuracy in the prediction.
Recently, the GPR and POD approach has been employed to develop a numerical DT of a semi-industrial furnace [7], while the POD coupled with sparse sensing was used to develop a hybrid numerical-experimental DT of the same furnace [8].
However, in some situations, it is not possible to rely on the availability of numerical data to build a DT.For example, numerical simulations of highly turbulent reacting flows still do not provide accurate predictions of pollutants and stability limits [9].Moreover, the use of state-of-the-art Large Eddy Simulation combustion models is often too expensive to be applied to industrial facilities [10].
In these cases, an experimental campaign is often carried out with different measuring instruments to characterize the combustion system.This usually produces various datasets with vastly different spatial scales (the pressure field obtained using pressure probes has a much lower spatial resolution than a velocity field measured using particle image velocimetry, for example).
The objective of this work is to develop a DT of the ULB semi-industrial furnace which relies solely on experimental data.The DT is capable of integrating different data streams and it provides in real time the prediction of temperature, chemiluminescence and species concentration, for different degrees of equivalence ratio and benzene doping in the fuel blend.To achieve this, the experimental data are first projected into the low-dimensional space using POD.Then, the regression is performed using GPR in the low-dimensional manifold.Finally, the prediction can be obtained by projecting the solution found by the GPR model in the original space.
The experimental setup is outlined in Section 2, while the methodology for the development of the DT is explained in more detail in Section 3. The results are shown in Section 4 and discussed in Section 5.

Experimental Setup
The experimental data have been collected in a semi-industrial furnace, fed with H 2 /CH 4 /CO fuel mixtures doped with C 6 H 6 .Detailed information about the experimental setup and measurement techniques are provided in previous publications [11,12], while the dataset used in this work has been published in [13].
However, some key features of the experimental set-up are also summarized below, along with the specific operating conditions used for this dataset.
The combustion furnace used in this work consists of an insulated combustion chamber.Figure 1 shows the schematic diagram of the vertical cross-section of the furnace-burner assembly, while Figure 2 shows a schematic representation of the test bench.The furnace has a cubic shape (1100 × 1100 × 1100 mm), and it is internally insulated with a 200 mm thick high-temperature ( 1300 °C) ceramic foam layer, which limits the heat lost through the walls.Therefore, the resulting inner dimensions of the combustion chamber are: 700 × 700 × 700 mm.
The furnace is equipped with an air cooling system, which consists of four cooling tubes of 80 mm of outer diameter and a length of 630 mm.By varying the cooling air flow rate, different stable conditions are established inside the combustion chamber, and the effect of variable industrial loads can be tested.
On each vertical wall of the combustion chamber, there is a slot (150 × 600 mm) available for measurements.The details of the measurements through different openings can be found in [11,12].
At the bottom of the combustion chamber, a commercial WS® REKUMAT M150 recuperative Flame-FLOX burner with a nominal power of 20 kW is mounted.Two different air/fuel injection configurations are possible when the burner is operated in Flame or in FLOX® combustion mode [11].
The dimension of the air injection nozzle ID can be varied (ID: 16, 20, 25 mm).However, the experimental data reported in [13] have been measured while operating the burner only in Flame mode and with using a fixed air injection nozzle (ID:25 mm).The semi-industrial facility is equipped with a fuel-feeding system, integrated with an evaporation system and a mixing unit that allow for creating synthetic blends.The evaporation system was used to vaporize benzene in order to homogeneously mix it with the other gases in the static mixing unit.
Brooks SLA58XX® Series mass flow controllers are used to control and monitor the flow of gases, and Brooks Quantim® Series Coriolis mass flow controllers of ranges, 320 g/h and 1500 g/h are used for the benzene.
An air-cooled suction pyrometer probe equipped with a B-type (Pt-Rh 6% -Pt-Rh 30%) thermocouple is used to perform in situ flame temperature measurements.
OH * and CH * chemiluminescence imaging was carried out by means of an Intensified Relay Optics (IRO) and a Charge-Coupled Device (CCD) camera 1.4 M (La Vision 1392 × 1040 pixels) coupled with a UV 78mm f/3.8 lens and two interferential filters to collect the chemiluminescence emitted by OH * (310 ± 10 nm) and CH * (438 ± 24 nm).The CCD camera has a maximum frame rate at a full resolution equal to 17 fps.
The flue gas composition was detected by using a Fourier Transform Infrared Spectroscopy (FTIR) analyzer from HORIBA® (HORIBA MEXA-ONE), equipped with a paramagnetic analyzer for oxygen measurement.
Figure 3 shows the position of the experimental probes inside the furnace.The position of the temperature and the chemiluminescence sensors was chosen to accurately map the reactive region and by considering the design constraints of the set-up.The species concentrations are measured at the outlet of the furnace to check the emissions of pollutants and main combustion products.
The experimental uncertainties for the temperature and the emissions are reported in Table 1.

Operating Conditions
A mixture of H 2 /CH 4 /CO has been used as fuel.It is a H 2 -rich fuel mixture, considered as a surrogate of an industrial Coke Oven Gas (COG) mixture.As in [14], the H 2 /CH 4 /CO fuel mixture composition was fixed by considering only the major components of the more complex industrial COG composition.Moreover, the relative ratios of H 2 , CH 4 and CO correspond to the ones in the fuel source mixture, and they are in accordance with those related to other COG compositions available in literature [15][16][17].
The fuel has been doped with C 6 H 6 up to 5% (vol.), in order to investigate the effects of aromatic additives in H 2 -rich fuel mixtures, mainly on NO emissions and flame radiation, in a semi-industrial furnace.
The compositions of the final six fuel blends tested are provided in Table 2, along with the equivalence ratio and the benzene doping.
This was carried out to investigate the effects of aromatics doping of the under-firing H 2 -fuel mixtures, at both constant and varying air excess.
The thermal input Q th was kept constant at 20 kW for all the investigated cases.Similarly, the flow rate of the cooling air was kept constant.Moreover, the temperature of the heated fuel pipeline, between the mixing unit and the inlet of the furnace, was controlled to keep a constant inlet fuel temperature of 130 °C, in order to avoid any benzene condensation before the entrance of the furnace.

Digital Twin Methodology
As mentioned in Section 1, the goal of this work was to develop a DT capable of predicting simultaneously different types of measurements, such as emissions, temperatures and the chemiluminescence signal.
As a first step, 30 samples of temperature, chemiluminescence and emissions were randomly selected to build the training data.The remaining six samples were used to test the model.
After that, the temperature, chemiluminescence and emissions datasets were arranged into the data matrices X T ⊂ R n T ,p , X CL ⊂ R n CL ,p and X E ⊂ R n E ,p .Each column vector contains the data for a certain operating condition, and the number of rows depends on the type of measurement.In particular, n T = 48, n CL = 898,560 and n E = 6.
Then, the POD was applied to the data matrices, which were centered and scaled to unitary variance.The POD allows for decomposing the dataset into two parts, of which one is only a function of the spatial coordinates and one is only a function of the parameters: where U T ⊂ R n T ,p and U CL ⊂ R n CL ,p represent the spatial POD modes while V T , V CL ⊂ R p,p contain the parametric coefficients.The diagonal matrices Σ T , Σ CL ⊂ R p,p contain the singular values of the POD modes.The singular values are the square root of the eigenvalues of the covariance matrix S = 1 n−1 X T X, and they reflect the amount of variance retained by each POD mode.
The threshold of 99.9% of explained variance was selected for truncation, resulting in the matrices U T,r ⊂ R n T ,r T , U CL,r ⊂ R n CL ,r CL , V T,r ⊂ R p,r T and V CL,r ⊂ R p,r CL where r T = 5 and r CL = 8.
The GPR model was applied to the global data matrix X G ⊂ R n,p , created by concatenating the matrices X E , V T and V CL such that n = n E + r T + r CL .
The GPR assumes that, in the classical regression framework, the function f (x) is a sample from a Gaussian process where m(x) and k(x, x ) are the mean and covariance function of the Gaussian process, which can be thought of as a Gaussian distribution over functions [5].In this framing, the observed and predicted data become a sample from a multivariate Gaussian distribution: where y represents the observed target values, and f * = f (X * ) is the prediction in the unexplored part of the design space X * .The covariance matrix of the multivariate Gaussian distribution is built using the kernel function K.The prior distribution p(f) and the likelihood p(y|f) are both assumed to be Gaussian.The prior represents the prior knowledge before considering the data, while the likelihood is the probability of seeing the data given the choice of the model.By employing Bayes' theorem, it is possible to calculate the posterior distribution p(f|y) given the prior and the likelihood: The predicting distribution p(f * |y) is then calculated by marginalizing the likelihood with the posterior distribution: The predicting distribution is again a Gaussian distribution p(f * |y) = N ( f * , cov(f * )) with mean and covariance: where, for compactness, K = K(X, X), K * = K(X, X * ) and K * * = K(X * , X * ).The model's hyperparameters, such as the kernel's length scale and the observations' noise, are selected by minimizing the negative log marginal likelihood: Once trained, the GPR model is able to predict the target quantity X * G in the unexplored part of the design space.In this work, the GPR has been implemented using GPyTorch [18].
The reconstruction of the temperature and chemiluminescence data are carried out by multiplying the respective rows of X * G times the POD modes U T and U CL and the singular values Σ T,r and Σ CL,r : where the notation A[i : j] represents the matrix built by considering only the rows of A between the indices i and j.
The methodology described in this section is summarized in Figure 4.

Results
The testing cases randomly selected from the complete dataset are cases 2, 8, 11, 15, 16 and 29.The remaining cases are used to train the model.The model's training takes around 10 s, while the prediction is obtained in around 1 s.
Figure A1 shows the surface response of the scalar quantities of interest as a function of the benzene doping and the equivalence ratio.The scalar quantities selected are the maximum temperature and the normalized sum of the chemiluminescence counts, along with the species concentrations.
The maximum temperature increases slightly with the benzene doping at φ < 1.0, and it reaches the maximum at φ = 0.80 and C 6 H 6 = 5.However, by increasing the equivalence ratio above the stoichiometric condition, the maximum temperature drops with the addition of benzene.
The chemiluminescence signal is strongly dependent on both the benzene doping and the equivalence ratio.Similarly to the maximum flame temperature, the intensity of both OH * and CH * chemiluminescence emissions increases with the presence of aromatics in the H 2 /CH 4 /CO flames, under lean conditions, showing a distinct maximum at φ = 0.91 and C 6 H 6 = 5.0%, whereas under stoichiometric and rich conditions the chemiluminescence intensity of both OH * and CH * radicals decreases with benzene doping levels higher than 3%.
Figure A1 shows also that the concentration of species such as NO, CO and O 2 depends mainly on the equivalence ratio, while the concentration of NO 2 , H 2 O and CO 2 depends on both the equivalence ratio and the level of benzene doping.The NO concentration drops when the fuel/air mixture goes from lean to rich, while the opposite happens for the CO concentration.
Both the H 2 O and the CO 2 concentration peaks at φ = 1.0, and it increases with the benzene doping.
Finally, the O 2 concentration depends only on the equivalence ratio.In the lean region, the oxygen concentration drops until it reaches the minimum value at stoichiometry.Then, it increases with the equivalence ration in the rich region.
In general, the surface response is nonlinear for all scalar quantities, which justifies the use of a nonlinear regression technique such as the GPR.
The constant mean function was selected to train the GPR model.The kernel function was chosen as the sum of the linear kernel and the radial basis function kernel: where v and Θ are the variance and length scale parameters.
The observed and predicted spatial distribution of the temperature and chemiluminescence signal for case 2 is reported in Figure 5, while the observed and predicted emissions are reported in Table 3.  H 2 O and O 2 .The prediction is slightly worse for NO 2 , which is also associated with the highest relative experimental uncertainty on such a species.This is due to the low NO 2 concentrations detected in the exhaust gases with respect to the total instrument error for this species detection (2 ppmv).
If we compare the experimental uncertainty reported in Table 1 with the model's uncertainty, we can observe that the model's uncertainty is higher than the experimental uncertainty for measurements such as temperature, NO, CO, H 2 O, while it is lower for NO 2 , CO 2 and O 2 .This suggests that, for those measurements in which the model's uncertainty is higher than the experimental uncertainty, the number of samples collected was not enough to converge to the experimental uncertainty.To check the quality of the model's training, a 6-fold cross-validation was performed on the complete dataset.The 36 samples were randomly divided into six groups (folds), each containing six samples.A GPR model is built for each fold.This model is tested on the data contained in the corresponding fold, while the training data are comprised of all the samples, minus the testing data.The average value of the R 2 for each fold is reported in Table 4, along with the standard deviation.
The chemiluminescence signal, OH * in particular, displays the biggest variance in the R 2 .This is because the chemiluminescence signal has a highly nonlinear behavior as a function of the equivalence ratio and benzene doping, as shown in Figure A1.In addition, the nonlinear behavior is more pronounced on the edge of the operating conditions' envelope, for C 6 H 6 = 5%.This means that the predictions are accurate when the model is interpolating, i.e., the testing conditions are inside the operating conditions' envelope.However, when the testing conditions are on the edge of the envelope, the model performs worse because it lacks important information on the behaviour of the chemiluminescence signal as a function of the benzene doping and equivalence ratio.

Discussion
The objective of this work was to develop a GPR-based DT capable of predicting the temperature, chemiluminescence and species concentration of a semi-industrial furnace fed with an H 2 -rich fuel mixture.
To integrate datasets with a vastly different spatial resolution, the POD was applied to the temperature and chemiluminescence signals.This was carried out to decouple the spatial information from the parameter-dependent information.
The GPR was selected as a regression model for its properties, mainly its nonlinearity and the ability to handle sparse datasets.
The predictions show a good level of accuracy, with a R 2 > 0.9 for all the signals except for the NO 2 concentration.However, even in this case, the maximum relative error is around 11%.
A 6-fold cross-validation was performed to judge the generalizability of the model.The results show that the prediction of temperature and species concentration is accurate for all combinations of testing conditions.For the prediction of the chemiluminescence signal, the model performs very well in interpolation, while it produces less accurate results in extrapolation.This means that care should be taken in the selection of samples for the model's training.
In conclusion, the results show that the DT can operate as a real-time surrogate of the corresponding physical object.This opens the possibility of employing the DT for redundant control, state and failure detection or data assimilation if coupled with lowfidelity measurements.

Figure 1 .
Figure 1.Vertical cross-section of the ULB semi-industrial furnace.The dashed lines represent a projection of the quartz window on the vertical cross-section plane.

Figure 2 .
Figure 2. Schematic P & I of the ULB experimental test bench.

Figure 3 .
Figure 3. Position of the sensors relative to the outline of the ULB furnace.

Figure 4 .
Figure 4. Diagram depicting the methodology used to build the Digital Twin.

Figure 5 .
Figure 5.Comparison between the predicted and observed temperature field (a); CH * field (b); and OH * field (c) for case 2.

Figure 6 .
Figure 6.Parity plot for the temperature (a); CH * signal (b); and OH * signal (c), for the six testing cases.The uncertainty represents the model's uncertainty in the prediction.The R 2 for the OH * and CH * signals has been calculated by excluding the points with a count lower than 100.

Figure 7 .
Figure 7. Parity plot for the emissions of NO (a); NO 2 (b); CO (c); H 2 O (d); CO 2 (e); and O 2 (f) for the six testing cases.The uncertainty represents the model's uncertainty in the prediction.

Figure A1 .
Figure A1.Maximum temperature (a); OH * normalized sum (b); CH * normalized sum (c); emissions of NO (d); NO 2 (e); CO (f); H 2 O (g); CO 2 (h) and O 2 (i), as a function of the input parameters.The OH * and CH * values have been obtained by summing all the counts for each condition, and they have been normalized by the maximum value.The blue dots indicate the samples used for training while the red dots represent the samples used for testing.

Table 2 .
Operating conditions of the experimental investigation of H 2 /CH 4 /CO fuel mixtures, with different levels of C 6 H 6 (%v/v) doping and equivalence ratios.

Table 3 .
Observed and predicted species emissions for case 2.

Table 4 .
Average and standard deviation of the R 2 for the ensemble of folds in the cross-validation algorithm.