Laminar Burning Velocity Model Based on Deep Neural Network for Hydrogen and Propane with Air

The aim of the study was to develop deep neural network models for laminar burning velocity (LBV) calculations. The present study resulted in models for hydrogen-air and propane-air mixtures. An original data-preparation and data-generation algorithm was also developed in order to obtain datasets sufficient in quality and quantity for model training. The discussion of current analytical models highlighted issues with both the experimental data and the methodology used to create those models. It was concluded that there is a need for models that can easily capture data from multiple experimental techniques and automate the model design and training process. We present a fully machine-learning-based approach that fulfills these requirements. Not only model development, but also data preparation is described in detail, as it is crucial in obtaining good results. The resulting models' calculations were compared with popular analytical models and experimental data gathered from the literature. The comparison showed that the developed models were characterized by the smallest error with regard to the experiments and behaved equally well for variable pressure, temperature, and equivalence ratio. The source code of ready-to-use models has been provided and can be easily integrated into, for example, CFD software.


Introduction and Motivation
Laminar burning velocity (LBV) of a fuel is one of its most fundamental and important properties. Due to this, a lot of research has been conducted with the LBV as the main focus of interest. This fuel property is known to be directly influenced by exothermicity, reactivity, and diffusivity of the fuel [1][2][3][4]. Additionally, LBV is frequently used in describing fuel combustion phenomena, like turbulent flame structure and speed or flame extinction and stabilization [5,6]. Moreover, measurements of LBV are important and commonly applied in validation of chemical kinetic models [7]. Another branch of LBV usages is CFD. Combustion models often depend on flame speeds, which are divided into laminar and turbulent. In most models, turbulent flame speed is directly dependent on LBV (and proportional to flow fluctuations), thus accurate values of LBV are essential [8].
Machine learning (ML) is a group of algorithms that allow computers to learn, recognize patterns in data, and predict outcomes in previously unseen conditions. Recently, ML has been employed by a remarkable number of researchers across the world. The popularity of ML stems from multiple factors that have converged in recent years, as ML and its most advanced branch, artificial neural networks (ANN), were conceived many years ago but were not widely used in the beginning. The first concepts of a "perceptron" (the basic building block of an ANN, now called a "neuron") were proposed as early as 1963 by White and Rosenblatt [9], but developments in ANNs were hampered by the lack of efficient algorithms to train the models and the shortage of the required computational power. Classical analytical correlations, in turn, were extended to make them applicable to various mixtures, but this also rendered them increasingly complicated to use, change, and implement. Furthermore, the sole process of fitting such complicated equations caused numerous difficulties.
Recently, machine learning approaches have been applied to LBV calculation. For instance, Jach [27], coauthored by the authors of this work, tackled the development of an ANN model for predicting the LBV of methane-air mixtures. She created a multivariate regression model, a Support Vector Machine (SVM) model, and an artificial neural network (ANN), then compared the results with experimental values and calculations from the San Diego reaction mechanism. The inputs to the models were pressure, temperature, and equivalence ratio, while the output was the value of LBV. The work also included a description of the data preprocessing procedure and the overall methodology of model preparation. The results obtained from the ANN model were very promising. Jach et al. built on that knowledge in their next article, where they created multiple models for single-fuel mixtures of normal hydrocarbons C1-C7 with air [28]. Once again, they created multivariate regression models, SVMs, and ANNs, and compared the results with experimental values and multiple reaction mechanisms. The accuracy of the models was satisfactory. Moreover, they noted that calculations based on reaction mechanisms were not only less accurate than the ANN, but their computational time also exceeded 5 hours for all the samples investigated, whereas the ANN calculated all the samples in a matter of seconds. This shows the potential of using ANNs in CFD software, where it is much desired to shorten the simulation time as much as possible while keeping high accuracy. Mehra et al. [29] performed experimental investigations of hydrogen- and carbon-monoxide-enriched natural gas mixtures at room temperature and atmospheric pressure. They analyzed three different equivalence ratios, three different hydrogen blends, and five different CO blends. In addition to obtaining the experimental results, they created an ANN for predicting LBV under such conditions.
The inputs to their model were the equivalence ratio, hydrogen fraction, methane fraction, and carbon monoxide fraction. They evaluated multiple ANN architectures, changing the training algorithms and the number of neurons. In the end, the obtained model showed high accuracy in predicting LBV for the multiple conditions measured experimentally in their work. These examples show that machine learning is used in LBV applications with increasing frequency. They also show that the methodology and technology are mature enough for such applications, as the results obtained in those works demonstrate that machine learning approaches are distinguished by high accuracy and performance.
Another reason why the authors of this work claim that ANNs are a suitable fit for LBV calculation is connected to the known problems with experimental measurements, not only of LBV. Examples of why this is the case may be found in a recent article by Walter et al. [30], in which the authors explained that experimental LBV data are characterized by a large scatter in measurements. This scatter is not introduced solely by the use of various experimental methods; it is also observed when the same method is employed but with different approaches or in different laboratories. What is more, researchers are often reluctant to provide a detailed analysis of the uncertainties in their experiments. Often, one needs to search for them in nested references, sometimes even outdated ones. Walter et al. assessed the four most common measurement methods: the Heat Flux Method, the Bunsen Flame Method, the Spherical Flame Method, and the Counterflow Method. They determined the approximate contributions of individual factors, unique to each method, to the uncertainty of the conducted experiments. The conclusion from their work is that from method to method, and even from laboratory to laboratory, experiments produce different results. Errors common to all methods were within 5-6%; in addition, method-specific errors, varying from 1.5% to 4%, need to be included. Based on the above, it is very challenging to determine which measurement is satisfactory, which is inadequate, and what the true, quantitative difference from reality is. In order to have acceptable models of LBV, we need a methodology and algorithms that are able to capture a variety of experimental data and can account for the (sometimes significant) differences between multiple measurements for the same mixture and conditions. The authors of this article propose that an ANN may serve as such a model.
Neural networks are known for being able to learn from a variety of different data, recognize hidden patterns, and adapt accordingly. The Universal Approximation Theorem [31,32] explains this in a formal way.
In this article, a new approach for the data preparation and machine-learning-based calculation of laminar flame speeds is proposed. The authors argue that this approach can better capture the deviations and irregularities introduced by the experimental measurements of LBV mentioned in the previous paragraphs and provide a useful model to implement in CFD codes.

Experimental Data Sources
Experimental data from multiple publications were collected for use during model training and model performance evaluation. The authors focused on gathering experiments for hydrogen-air and propane-air mixtures. Experiments needed to be clearly described and to cover a wide array of conditions (variable equivalence ratio, pressure, and temperature). An additional advantage was when a publication itself gathered multiple other data sources; such data were digitized and included in this work. Discrepancies between experiments were in fact desired, as they introduce variability into the data and make the model more robust. That is why experiments conducted using different methods were included.

Hydrogen-Air
Experimental data of LBV measurements for hydrogen-air mixtures were taken from four different publications and their references. Dahoe [33] conducted experiments measuring LBV using pressure variations in a windowless vessel. In his work, he investigated equivalence ratios of the mixture between 0.5 and 3.0. The initial temperature and pressure were kept constant. Additionally, he referenced multiple other publications, including one that considered changes of initial pressure and temperature. Those referenced publications were also taken into account while gathering the data for this work. The next batch of experimental data was likewise prepared based on a changing equivalence ratio of the hydrogen-air mixture. Results were taken from multiple publications: works by Tse et al. [34], Dowdy et al. [35], Egolfopoulos and Law [36], Aung et al. [37], and Kwon and Faeth [38]. Kuznetsov et al. [39] investigated LBV values of hydrogen-air mixtures at subatmospheric pressures (1 bar-200 mbar) and elevated temperatures (up to 300°C). The experiments were performed in a spherical explosion bomb equipped with quartz windows. They also referenced other experimental works, the results of which were included in the prepared dataset. The last source of hydrogen-air data was the publication by Pareja et al. [40]. They conducted experiments using a particle-tracking velocimetry approach combined with Schlieren photography, investigating equivalence ratios from 0.8 to 3.0. Their work contained multiple other references, which were included as well.

Propane-Air
For propane-air mixtures, three main publications were taken into consideration. As with the hydrogen-air mixture, the data from the references of these works were also included. Ebaid et al. [41] performed a series of LBV measurements on an experimental setup consisting of K-type thermocouples in a cylindrical vessel. They investigated three different initial pressures (0.5, 1, and 1.5 bar) and temperatures (300, 325, and 350 K) over a variable equivalence ratio (0.6-1.5). In their work, they referenced several other publications, the data from which were included in the dataset of this article. A further batch of experimental data was obtained from two articles by Vagelopoulos et al. [42,43], in which LBV was measured for a wide array of equivalence ratios. Finally, the work of [18], which gathered multiple data sources for multiple mixtures of fuels with air, including propane, was used in the present article to digitize all the data referenced therein. Their aim was to construct LBV correlations, but apart from that, the study proved very useful in collecting data from multiple articles in one place.

Data Preprocessing
An essential part of every ML model preparation process is the preprocessing of data. It is known that ML models are only as good as the data used to train them [44]. There are two main requirements for data to be useful for training ML models: quantity and quality.

ML models need large datasets in order to be trained well. A small training dataset will very often result in a model that is not capable of generalizing when fed new data for predictions. To avoid this, new data need to be gathered or simulated. This is especially true for ANNs, which require significantly more data for proper training than simpler algorithms, such as Support Vector Machines or Random Forests, owing to the complex nature of ANNs and the number of trainable parameters.

The quality of the data can be checked and ensured in many different ways and is a challenge in itself. Fundamental methods include checking the data for outliers and missing values, then imputing or removing these from the dataset. Simple imputation algorithms fill missing entries with the median value, the mean value, or the last known value, or interpolate between the two nearest points. More sophisticated approaches include algorithms such as Singular Value Decomposition [45], K-Nearest Neighbors [46], or Deep Neural Networks (DNN) [47], where a DNN is simply a neural network with more than one hidden layer.

Taking the above into consideration, the present work put emphasis on data preprocessing. An original data generation algorithm was developed, based on the experimental data and the standard deviation of the measurements.
The algorithm steps that were followed to obtain a proper and useful dataset to train the models on are listed below:
• Removal of outliers and filling those values using linear interpolation;
• Rounding of the data (precision of 10 K for temperature, 0.01 bar for pressure, and 0.01 for equivalence ratio);
• Approximation of the experimental data (over each variable separately);
• Approximation of the standard deviation of the experimental data (over each variable separately);
• Data simulation (over each variable separately).
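The first two steps of the algorithm can be sketched as follows. This is a minimal illustration rather than the original implementation: it assumes simple three-sigma outlier detection and the rounding precisions listed above, and the function names are hypothetical.

```python
from statistics import mean, stdev

def clean_series(values, n_sigma=3.0):
    """Replace entries farther than n_sigma standard deviations from the
    mean with the average of their two nearest accepted neighbours
    (a simple stand-in for linear interpolation on a uniform grid)."""
    mu, sigma = mean(values), stdev(values)
    ok = [abs(v - mu) <= n_sigma * sigma for v in values]
    cleaned = list(values)
    for i, good in enumerate(ok):
        if good:
            continue
        left = next((values[j] for j in range(i - 1, -1, -1) if ok[j]), None)
        right = next((values[j] for j in range(i + 1, len(values)) if ok[j]), None)
        neighbours = [v for v in (left, right) if v is not None]
        cleaned[i] = sum(neighbours) / len(neighbours)
    return cleaned

def round_conditions(T, p, phi):
    """Round conditions to the common grid: 10 K, 0.01 bar, 0.01 phi."""
    return round(T / 10) * 10, round(p, 2), round(phi, 2)
```

For example, `round_conditions(293.2, 0.2497, 1.004)` maps a digitized point onto the (290 K, 0.25 bar, 1.00) grid cell, so measurements that were read off plots with spuriously high precision collapse onto common conditions.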
Firstly, a standard outlier-detection algorithm was used, which deleted entries that differed by more than three standard deviations from the mean value. Those values were replaced by a linear interpolation of the two nearest values in the dataset. Then, the data needed to be rounded, because a common basis for the variability of the different parameters was needed. For example, if the temperature was 290, 293, and 300 K, and the pressure 0.250 and 0.300 bar, usually no more than two or three measurements at different equivalence ratios were available for these conditions. Moreover, the values needed rounding because of the way they were digitized from the articles (by clicking on the plots and automatically reading the clicked values with high precision), which, for example, resulted in multiple values very close to 293 K, as opposed to several entries at exactly this temperature.

During the analysis of the experimental data from the referenced articles, we found that the number of samples from each separate source was usually no larger than 20-30 measurements. Even summed over multiple articles, the overall number of samples was definitely insufficient for use as a training dataset for a neural network. It was therefore necessary to conduct data simulation in order to increase the number of samples. The procedure was as follows. In the preprocessed experimental data (as described above), find all unique pairs of two variables (unique pairs of p and T, of p and φ, and of φ and T). For each unique pair, find the unique values of the third variable. For example, with p = const and T = const, the result is LBV = f(φ). For each of these relations, a polynomial regression model was fitted to represent the mean values in this collection of measurements. The polynomial degrees were as follows: p: 2, T: 1, φ: 2. Additional models were fitted for the standard deviation (SD) of the measurements.
This information was usually not available, but digitizing the data meant that multiple values for the same parameter were often present. Based on that, we made the assumption that SD could be approximated using formula (1), commonly known as the range rule of thumb [48]:

SD ≈ (max − min) / 4, (1)

where max and min are the largest and smallest measured values in the given collection.
Consequently, the resulting polynomial approximations were used to perform the data simulation. In one simulation step, it was assumed that LBV followed the normal distribution for every given changing variable, while keeping the other two constant. The LBV probability density function (PDF) is described by the following formula (2):

f(x) = 1 / (σ√(2π)) · exp(−(x − µ)² / (2σ²)), (2)

where µ (mean) and σ (standard deviation) were calculated using the polynomial approximations of the mean value of the experiments and the SD values, respectively. A total of 7300 points for the hydrogen-air mixture and 5200 points for the propane-air mixture were randomly sampled from the individual PDFs for every distinct value of LBV. An exemplary visualization of the generated data is shown in Figures 1 and 2.
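A compact sketch of the range rule and the sampling step, under the assumption that the fitted mean and SD polynomials are supplied as coefficient lists (highest power first); the names are illustrative, not from the original code:

```python
import random

def range_rule_sd(values):
    """Range rule of thumb, formula (1): SD ~ (max - min) / 4."""
    return (max(values) - min(values)) / 4.0

def polyval(coeffs, x):
    """Evaluate a polynomial given coefficients, highest power first."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc

def simulate_lbv(mean_poly, sd_poly, xs, n_per_point, seed=0):
    """Draw n_per_point synthetic LBV samples from N(mu(x), sigma(x))
    for each value x of the varied parameter, as in formula (2)."""
    rng = random.Random(seed)
    samples = []
    for x in xs:
        mu = polyval(mean_poly, x)
        sigma = max(polyval(sd_poly, x), 1e-6)  # keep sigma positive
        samples += [(x, rng.gauss(mu, sigma)) for _ in range(n_per_point)]
    return samples
```

With many draws per condition, the sample mean converges to the fitted polynomial mean, so the synthetic dataset reproduces both the trend and the experimental scatter.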
The generated data were used as the training dataset for the models, while the raw experimental data served as the test set.

Model Construction, Tuning, and Training
The two models were built with the following assumptions about their architectures:
• Initial conditions: 1 atm, 300 K;
• Three inputs: pressure (bar), temperature (K), and equivalence ratio (fractional form);
• One output: LBV (m/s);
• Hidden layer activation function: hyperbolic tangent;
• Output layer activation function: identity.
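The assumed architecture (tanh hidden layers, identity output) amounts to the following forward pass. This is an illustrative, stdlib-only sketch, not the FCNN4R code used in the study:

```python
import math

def mlp_forward(x, layers):
    """Forward pass of a fully connected network: tanh on hidden layers,
    identity on the output layer. `layers` is a list of (weights, biases)
    pairs, with one weight row per neuron in the layer."""
    a = list(x)
    for i, (W, b) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, a)) + bi
             for row, bi in zip(W, b)]
        # identity activation on the last (output) layer only
        a = z if i == len(layers) - 1 else [math.tanh(v) for v in z]
    return a
```

For the models in this work, `x` would be the scaled (pressure, temperature, equivalence ratio) triple and the single output the scaled LBV value.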
The exact model structure, including the number of hidden layers, the number of nodes in each layer, and the L2 regularization parameter [49], was derived by means of a hyperparameter grid search [50] using the following constraints:
• The number of hidden layers ranging from 1 to 4;
• The number of nodes (equal in every hidden layer) between 2 and 5;
• The regularization parameter: 10^-3, 10^-4, 10^-5, 10^-6, 10^-7, or 0.
To develop the models, we used the R programming language and the FCNN4R package [51]. Using 5-fold cross-validation [52], all combinations of the described hyperparameters were evaluated on the training set. Cross-validation is a method of repeatedly dividing (folding) the data into two parts, training the algorithm on one part and evaluating its performance on the other. The overall performance is usually the mean value calculated over all the folds. This technique helps to prevent overfitting, which occurs when an algorithm produces good results on the data it was trained on (or very similar data) but does not generalize and performs poorly on data not included in training [44]. We used the Rprop [53] algorithm to optimize the ANN weights, set to minimize the mean squared error (MSE) between the training dataset and the predictions. Scaling of the data to the range [−1, 1] was applied before model tuning and training in order to reduce the training time and help the algorithm converge. The model tuning phase resulted in the following optimal hyperparameters: 3 hidden layers with 4 nodes each and an L2 regularization coefficient equal to 10^-7. Since the optimal architecture of both models turned out to be the same, it can be assumed that, when applying such modeling to different mixtures, the architecture considered in this work would be a good starting choice. The final model visualization is presented in Figure 3.
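The tuning procedure, with min-max scaling to [−1, 1], the hyperparameter grid, and fold construction for 5-fold CV, can be sketched as below. The sketch is library-agnostic: `train_and_score` is a hypothetical placeholder for any routine that trains a network with the given hyperparameters on four folds and returns the MSE on the fifth; it is not part of FCNN4R.

```python
import itertools

def scale_to_unit(values):
    """Min-max scale a sequence to [-1, 1], as applied before training."""
    lo, hi = min(values), max(values)
    return [2 * (v - lo) / (hi - lo) - 1 for v in values]

def kfold_indices(n, k=5):
    """Split indices 0..n-1 into k near-equal contiguous folds."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

# Grid from the paper: 1-4 hidden layers, 2-5 nodes (equal in every
# hidden layer), six L2 regularization settings.
GRID = list(itertools.product(
    range(1, 5), range(2, 6), [1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 0.0]))

def grid_search(train_and_score, k=5):
    """Return the (layers, nodes, l2) tuple with the lowest mean k-fold
    CV error; train_and_score(params, fold) must return a fold's MSE."""
    def cv_error(params):
        return sum(train_and_score(params, f) for f in range(k)) / k
    return min(GRID, key=cv_error)
```

The grid contains 4 × 4 × 6 = 96 candidate architectures, each evaluated five times, which is small enough for an exhaustive search.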

Model Validation
Subsequent to model tuning and training, each model's predictions were compared with the raw experimental data for the corresponding mixture. Table 1 shows the metrics that outline the performance of the models. R-squared (R²) is an indicator of how much of the variability in the data is captured by the fit; the closer to 1, the better [54].

In addition to the models' performance, analytical formulas were tested for reference metrics. The formulas were chosen from those found in ddtFoam [55], an open-source deflagration-to-detonation solver for the OpenFOAM software [56], as the aim is to implement the developed LBV models in a modified solver. For the hydrogen-air mixture, we used Ardey's [57] formula (3) as a reference:

s_L = s_L0 · (T / T_ref)^α · (p / p_ref)^β, (3)

where s_L0 is the LBV of the hydrogen-air mixture at reference conditions (a function of y_H2, the mass fraction of hydrogen in the mixture); s_L is the LBV at the given conditions; T_ref = 298 K is the reference temperature; p_ref = 1.013 · 10^5 Pa is the reference pressure; p (Pa) and T (K) are the pressure and temperature the LBV is calculated for; and α = 1.75 and β = −0.2 are mixture-specific constants. For propane-air, we used Gülder's [20] formula (4):

s_L = s_L0 · (T / T_ref)^α · (p / p_ref)^β, (4)

where s_L0 is the LBV of the propane-air mixture at reference conditions (a function of the equivalence ratio φ), the remaining symbols are as in formula (3), and the mixture-specific constants are α = 1.624 and β = −0.301.

Table 1 shows that the ANN models performed well and were clearly better at calculating LBV than the exemplary analytical models. R-squared for both mixtures was above 0.9 (0.9192 and 0.9755 for hydrogen-air and propane-air, respectively), which indicates an adequate fit and capturing of the variability in the data.
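Both reference correlations scale a base velocity s_L0 (itself a function of the mixture composition, whose exact form is not reproduced here) by temperature and pressure ratios. Assuming the standard power-law form used in such correlations, a sketch:

```python
def lbv_power_law(s_l0, T, p, alpha, beta, T_ref=298.0, p_ref=1.013e5):
    """s_L = s_L0 * (T / T_ref)**alpha * (p / p_ref)**beta, with T in K
    and p in Pa; alpha and beta are the mixture-specific exponents."""
    return s_l0 * (T / T_ref) ** alpha * (p / p_ref) ** beta

# Mixture-specific exponents quoted in the text
HYDROGEN_AIR = dict(alpha=1.75, beta=-0.2)
PROPANE_AIR = dict(alpha=1.624, beta=-0.301)
```

At reference conditions, both scaling factors reduce to 1 and the function returns s_L0 unchanged; the negative β makes LBV decrease with pressure for both mixtures.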
When investigating the mean absolute percentage error (MAPE), one can notice that the values for both models are very similar (15.69 and 15.89). This, along with the high R-squared values, indicates that the models work equally well for both mixtures and are not dependent on the fuel. A MAPE of the order of 15% looks like a good place to improve the models, but one needs to bear in mind the previously mentioned high variability and spread of the experimental data; given these, such errors are expected.

Looking at the performance of the analytical models, it may be noticed that the MAPE of the propane-air formula is actually lower than that of the ANN model, and other metrics such as R-squared and mean absolute error also look very good for this formula. However, it needs to be noted that the lower MAPE of the formula came with a lower (worse) R-squared. This indicates that there is always a trade-off in how models fit the data, and that above a certain threshold, it is hard to continue improving one metric without sacrificing others. The ANNs performed well in optimizing the fit and making it adequate in every case.

The superiority of the ANN models may be clearly seen in Figures 4-7, which show comparisons of experimental vs. calculated values for the hydrogen-air analytical formula and ANN, and the propane-air formula and ANN, respectively. The closer the points are to the diagonal (the dotted line), the better the fit. Especially in Figure 6, it can be observed that the fit of the propane-air formula is acceptable for small velocities; however, for larger ones (above approximately 1 m/s), the formula strongly underestimates the LBV values. Such spread and inconsistency are not observable in Figure 7, which represents the fit of the ANN model for propane-air; here, the errors are evenly distributed across the values of LBV. For hydrogen, Figure 4 shows a high error and an inadequate fit of the formula, which confirms the numbers found in Table 1.
The formula strongly overestimates the LBV for this mixture, while Figure 5 clearly shows a much better fit. Obviously, some spread can also be seen, but as mentioned before, this spread is to be associated with the high variability, uncertainty, and nature of the experimental measurements of LBV. Analogous observations can be made in Figures 11-13, which present the corresponding plots for the propane-air mixture. Importantly, as shown by Table 1 and the previous paragraphs, here the analytical formula performs noticeably better in comparison with the experimental data than its counterpart for the hydrogen-air mixture.

[Figure: Exemplary comparison of model predictions, analytical formula calculations, and experiments for variable temperature, a constant pressure of 1 bar, and a constant equivalence ratio of 1; C3H8-air mixture.]

Conclusions
In this work, multiple deep neural network models for predicting LBV values for hydrogen-air and propane-air mixtures were developed and validated. Additionally, the data preparation process was described in detail, as the authors believe it was crucial in achieving the good performance of the models considered in this work. This process can be further enhanced or even adapted for use in other modeling tasks that heavily depend on experimental data. The models' inputs were pressure (bar), equivalence ratio, and temperature (K). Based on the provided values, the models were able to calculate LBV (m/s) with high accuracy for a wide range of input values. The main metrics, R-squared and MAPE, showed an adequate fit and a small error, taking into consideration the issues connected with experimental values of LBV mentioned in the "Introduction and Motivation" section. Moreover, the results of the models were compared with exemplary analytical formulas for calculating LBV that are widely used in CFD codes. The developed models outperformed the analytical formulas in how well they fit the data. The analytical formula for propane-air LBV values also performed well, achieving a smaller MAPE than the ANN model; however, as could be seen in the plots, it sacrificed the fit to the data, especially for larger LBV values. This proved that ANNs can assess these kinds of situations and adapt accordingly, resulting in an optimal fit. The successful development of both models demonstrated that ANNs can be used to model the LBV of many mixtures, not only those considered in this work, as the cited publications also confirm. The only real requirement for developing such models is the availability of data.
Summing up, the prediction phase of the ANNs is fast enough for them to be integrated into CFD code and used instead of analytical formulas, achieving higher accuracy of LBV-based models without sacrificing a noticeable amount of simulation time.
Models developed in this work can be accessed via the link provided in the supplementary materials.