Acceleration of Chemical Kinetics Computation with the Learned Intelligent Tabulation (LIT) Method

: In this work, a data-driven methodology for modeling combustion kinetics, Learned Intelligent Tabulation (LIT), is presented. LIT aims to accelerate the tabulation of combustion mechanisms via machine learning algorithms such as Deep Neural Networks (DNNs). The high-dimensional composition space is sampled from high-ﬁdelity simulations covering a wide range of initial conditions to train these DNNs. The input data are clustered into subspaces, while each subspace is trained with a DNN regression model targeted to a particular part of the high-dimensional composition space. This localized approach has proven to be more tractable than having a global ANN regression model, which fails to generalize across various composition spaces. The clustering is performed using an unsupervised method, Self-Organizing Map (SOM), which automatically subdivides the space. A dense network comprised of fully connected layers is considered for the regression model, while the network hyper parameters are optimized using Bayesian optimization. A nonlinear transformation of the parameters is used to improve sensitivity to minor species and enhance the prediction of ignition delay. The LIT method is employed to model the chemistry kinetics of zero-dimensional H 2 –O 2 and CH 4 -air combustion. The data-driven method achieves good agreement with the benchmark method while being cheaper in terms of computational cost. LIT is naturally extensible to different combustion models such as ﬂamelet and PDF transport models.


Introduction
Although the simulation of high Reynolds number combusting flow includes several computationally expensive elements, including turbulent feature prediction, the modeling of chemical kinetics is the main bottleneck. Numerical integration of reaction mechanisms mathematically must contend with an extremely stiff system of ordinary differential equations (ODEs). The stiffness of the equations is represented by the range of eigenvalues, spanning nine orders of magnitude. The difficulties caused by the stiffness are compounded by the size of the system. The number of differential equations scales with the number of chemical species, which can reach into the hundreds for complex hydrocarbon fuels. The chemical kinetics of real applications, such as n-heptane combustion, comprises more than 100 species and 1000 reactions. Furthermore, the chemical time-scale O(10 −9 ) is much smaller than the smallest flow time scale for most engines, O(10 −5 ), which makes the computation very expensive.
There have been efforts to reduce the cost of kinetics by developing reduced mechanisms and chemistry tabulation techniques. As a detailed chemical kinetic mechanism with a large size and chemical stiffness restricts the computation of real combustion applications, it is inevitable to reduce the detailed mechanism to a smaller number of reactions. There are two levels of mechanism reduction: The first level is the elimination of insignificant reactions and species (skeletal reduction) using methods such as Directed Relation Graph (DRG) [1], Rate-Controlled Constrained Equilibrium (RCCE) [2,3], and Path Flux Analysis (PFA) [4]. The second level is the consideration of the impact of timescales on the whole reaction system (global reduction) methods such as Computational Singular Perturbation (CSP) [5,6], and Intrinsic Low Dimensional Manifold (ILDM) [7]. However, reduced mechanisms result in a less reliable description of chemistry, and the level of reduction is a large challenge [8]. The tabulated chemistry technique computes a chemical mechanism with an affordable computational cost. In this approach, the chemical kinetics are solved with the interpolation of precomputed and stored values [9], which results in the reduction of real-time chemistry computation. In the in Situ Adaptive Tabulation (ISAT), the thermochemical solution can be tabulated in real-time simulation and reused in later sequences of the simulation [10,11]. The performance of the tabulation techniques strongly depends on the application as the accuracy and efficiency drop for complex turbulent combustion.
Recently, machine learning has shown promise for assisting combustion simulation. For instance, Yap et al. used Artificial Neural Networks (ANN) to control and optimize exhaust emissions [12]. Zamaniyan and his colleagues applied an ANN to model an industrial hydrogen plant [13]. There have been many efforts to use machine learning models, such as deep neural networks [14], convolutional neural networks [15], and generative adversarial networks [16] to predict sub-scale combustion features in turbulent combustion simulations. Owoyele et al. [17] studied the effect of bifurcating turbulent combustion inputs to improve regression tasks among specialized artificial neural networks. In addition, combustion researchers have introduced several data-driven tabulation methods to accelerate chemical kinetics computation. Christo et al. [18], for their first attempt, employed a deep neural network to model a 4-step H 2 -CO 2 chemical mechanism in a simulation of turbulent flame. Subsequently, Blasco et al. [19,20] applied an ANN to a reduced methane combustion mechanism including four steps and eight species. Later, they observed that it is extremely difficult to have one network modeling a complex multidimensional combustion mechanism, and expanded the method by clustering the composition space [21]. In this way, the thermochemical space is divided into multiple subspaces with a Self-Organizing Map (SOM) to have a dedicated regression network for each subspace. They applied the SOM-MLP (multilayer perceptron) method to a reduced methane combustion application with five steps. They reported good agreement with the benchmark and savings in computational time and memory usage. In the recent decade, Chatzopoulos and Rigopoulos [22] introduced an ANN model for a RCCE-reduced CH 4 mechanism trained with abstract non-premixed flamelets. Furthermore, Franke et al. [23] proposed an ANN model coupled with RCCE and CSP reduction methods to simulate the LES-PDF Sydney flame. They used an abstract problem to generate the ground-truth data. Most recently, An et al. [24] applied a similar method, called SOM-BPNN (Back-Propagation Neural Network), to model the combustion chemistry of hydrogen/hydrocarbon-fueled supersonic engines. Also, Owoyele and Pal [25] introduced a different data-driven tabulation method using a neural ODE approach, in which they have one network for each species of a H 2 -O 2 combustion.
In this work, the goal is to replace the integration of stiff chemical ODEs with a relatively fast machine learning scheme called Learned Intelligent Tabulation (LIT). We use a family of networks that self-selects during the progression through reactions. The selection process is based on unsupervised clustering as a preprocessing step using a map that is produced during training. Each network is trained using a subset of ground truth data appropriate to the local temperature and composition regime. Because chemical species concentrations span at least nine orders of magnitude and are not equally important, a nonlinear transformation of the primitive variables is used in training the network and calculating the loss function. The final algorithm is tested in a multi-dimensional CFD code. The results include the construction, accuracy, and speed-up assessments of this approach.
While there have been works in the literature to model hydrogen combustion, the LIT method captures the full hydrogen mechanism without any reduction. All species are modeled with the networks. Validation tests include the 0D kinetics test which directly evaluates the performance of the kinetics solver, without the consideration of flow dynamics such as convection and diffusion. There have been studies such as Franke et al. and Chatzopoulos et al. who applied a similar method to tabulate kinetics of CH4 combustion along with rate-controlled constrained equilibrium to reduce the full mechanism. However, for such extensive mechanisms LIT allows regression with a subset of species chosen at the user's convenience and adds heat release to the output vector to correct for the effect of neglected species. This novel approach makes LIT capable of matching the full mechanism in terms of energy solution prediction. We report on a method of decoupling the time step of the time integration with the output of the ground truth data to allow arbitrary augmentation of the training data without interpolation. Furthermore, LIT utilizes nonlinear transformation of species concentration and Bayesian optimization techniques, which substantially improve the accuracy of the method to capture the chemistry dynamics.
This article is organized as follows. The Section 2 includes the detailed description of the LIT method. The workflow and challenges are explained, which involves data generation, input clustering, DNN regression, and the integration of networks into a CFD solver. Next, Section 3 presents the validation cases, including hydrogen-oxygen and methane-air combustion. Results of LIT method are compared with several benchmarks.

Methodology
A reacting flow solver includes the computation of chemical species transport equations added to the fluid flow equations such as mass, momentum, and energy conservation. The conservation of species mass fraction Y i is formulated as where t represents time, ρ is mixture density, v v v is velocity, p is pressure, Y i is the mass fraction of species i, V V V i is the diffusion velocity of species i, andẇ i is the mass reaction rate of species i. The CFD solver computes the discretized transport equations using the finite volume method, for which the chemical source terms are required. It is noteworthy that the calculation of the reaction rates could consume up to 50% of the total computation in a reacting flow depending on the complexity of the reaction mechanism [10]. The reaction rate of species i = 1, 2, ..., N s from all chemical reactions j = 1, 2, ..., N r is formulated aṡ whereĊ ij is the rate of change of species i by the jth reaction in mole concentration and M i is the molecular weight of the ith species. The rates of concentration change are described , which is calculated over the flow time-step after solving the system of differential equations representing the chemical reactions. In general, chemical reactions are described by a system of first-order ODEs, which can be represented as where ϕ is the thermochemical composition vector including temperature T, pressure P and the vector C i of species mole fractions, and S represents the chemical reactions based on the Law of Mass Action [26]. The CFD solver loops over all the computational cells to construct and integrate the reaction ODE system. Different reactions work at different rates and time scales, which makes the system mathematically stiff and computationally expensive. Our approach is to regress this stiff system using a more expedient DNN. One of the most common applications of DNN is regression of nonlinear problems. Given that a well-generated data set is available for learning, a DNN can fit a function to model the hidden physics [27]. This capability of the DNN is utilized in this work to model the chemical kinetics and replace the ODE solver in the simulation workflow. To achieve this goal, we developed the LIT method, that is described in the following sections. The workflow of LIT is comprised of four steps, as illustrated in Figure 1.

1.
LIT starts with ground-truth generation from the reference ODE solver. Performance of the trained DNNs strongly depends on the quality of the training data and similarity of the data to the objective problem. To augment the data, we have used smaller time steps in sampling the data. As the solution time step is fixed in our validation case, the generated samples are paired according to the required solution time step. The last step in data preprocessing is data scaling to make inputs more distinguishable to the networks.

2.
Clustering is the second step after generation and preprocessing the training data.
As kinetics are such a multidimensional and multiscale problem, it is very difficult for one DNN to cover the entire combustion space. It requires a very deep and wide network to approximate the entire dynamics and leads to higher computational costs. As a result, dividing the effort among multiple DNNs improves the accuracy and lowers the cost as well. After clustering the samples into self-organizing map (SOM) bins [28], the data are mapped to a subspace grid. Each subspace includes several SOM bins.

3.
Functional mapping from input features to target vector is implemented utilizing regression DNNs for each subspace. In this work, fully connected (Dense) neural networks are used to find the functional mapping between the target to the input features. In addition, Bayesian Optimization [29] is employed to find the optimum hyper-parameters.

4.
After training the SOM network and DNNs for each subspace, C++ codes are automatically created for all networks, which are integrated with the CFD solver through an inference code. The inference is designed to be scaled to as many DNNs as required using object-oriented programming. Details of each step are described in the following sections.

Ground-Truth Data Generation
Generation of training data is a critical step in the learning of combustion kinetics. The data should be structured based on the problem definition Y i = f (X i ). For a chemical reaction, the function to approximate is the kinetics ODE system in Equation (3). Therefore, the input vector is the thermochemical state at time t, X i = ϕ(t) = [T(t), P(t), C 1 (t), C 2 (t), ..., C n (t)], and the output vector is the thermochemical composition at t + ∆t without temperature and pressure which are not direct outputs of the chemical reactions: Y i = ϕ(t + ∆t) = [C 1 (t + ∆t), C 2 (t + ∆t), ..., C n (t + ∆t)]. Note that the time step size can be added to the input vector as been done in the literature [20], however it is not considered in this work since all validation cases are simulated with a fixed time step.
We used the chemFoam solver from the OpenFOAM 5.0 framework [30], which is an open-source CFD software package implemented in C++. OpenFOAM has an extensive range of computational applications including chemical reactions and combustion. chem-Foam is a zero-dimensional solver for chemical reaction and is utilized in this study to generate the ground-truth data. Samples are created for a variety of initial conditions and simulation parameters to produce comprehensive training data.
Supervised data-driven models are being successfully used for a wide range of complex applications. However, these models are known to be very data-hungry, and their performance relies strongly on the size and quality of the training data. Therefore, it is critical to create a large dataset where the solution dt is decoupled from the dt actually used in the ground truth generation. In this work, we augment the data by running the groundtruth simulation with relatively smaller time steps and pairing the data samples according to a coarser solution time step. The actual ground truth dt controls how many data points are produced, and the interval for pairing controls the solution dt used for training. The apparent dt corresponds the to the time step anticipated in the CFD calculation. In this way, we can generate as much data as our training needs. For instance, the solution time step of the H 2 -O 2 combustion is dt s = 10 −4 s, while the data were generated with dt gt = 10 −7 . The generated data are then preprocessed to match the solution time step. In more detail, each input vector X i is paired with the correct output Y i according to the solution time step.
Primitive species concentrations represent a very challenging quantity for training neural networks, because the change in minor species concentrations is happening over very small values. The network cannot react to changes of species concentration of such small values as they are practically noise compared to the major species. Therefore, we used a 6th root transformation that makes the small changes more visible for a neural network. This helps the network to capture those critical changes. As is illustrated for a major species H 2 and minor species H in Figure 2, primitive values are unable to show small changes of concentration, which are crucial for the network to learn the ignition delay. However, the changes throughout the combustion process are more detectable with the 6th root species concentrations. Different transformations were explored and the 6th root offered the best trade-off between minor species sensitivity and robust representation of major species. Log transformations were found to be too heavily biased in favor of minor species fidelity at the expense of major species. The last step in data preprocessing is the normalization of input and output parameters. Rarely, neural networks are applied directly to the raw data without normalization. Generally, we need preparation to help the network optimization process and maximize the probability of obtaining good results. Normalizing the input and output vectors consists of mapping the data to the vector norms. Typically, data are linearly transformed to

Composition Space Clustering
The chemical system presents a complex multidimensional problem, and covering the whole dynamics with a single neural network would be quite challenging. Therefore, it is pragmatic to divide the task among several networks. To achieve this subdivision, a clustering step is added to the LIT method. In addition, the data clustering means that a smaller network with less computational cost can be used for each individual cell. As the cost of querying clustering networks is very small compared to the inference cost of DNNs, the overall computational cost drops significantly.
A self-organizing map [28] is employed to cluster the combustion composition set. SOM is an unsupervised learning technique that identifies groups of similar data. The groups are organized in a one or two-dimensional map. Different map topologies are available such as a grid, hexagonal, or random topology to arrange the clusters. One neuron is assigned to each cluster, and the Euclidian distance in the composition space is the clustering criterion. New samples are clustered into the most similar neuron with the shortest Euclidean distance [28]. In this way, SOM learns both the distribution and topology of the input composition space. The self-organizing feature map (SOFM) from the Matlab learning toolbox [31] is employed in this work for clustering.
After the SOM training is completed, the dataset is clustered into the SOM map, which would be the reference to divide the thermochemical samples. One could train one regression DNN for each SOM neuron, however, some SOM neurons do not contain enough data to train a DNN. Therefore, the SOM map is projected into a subspace map, and a DNN is trained for each subspace. An example of H 2 -O 2 evolution clustered into the SOM map and converted into a subspace map is depicted in Figure 3 (left). Furthermore, the temperature plot of the subspace data is shown in Figure 3 (right). As is shown, the unsupervised composition space clustering divides the combustion dynamics into different phases. As a consequence, we can train one DNN for each phase of the solution.

Regression Network
DNNs are capable of modeling complex nonlinear systems. The depth and width of the network architecture can be adjusted depending on the difficulty of the problem. The first layer is the input layer with the same number of neurons as the size of input vector. The output layer also has the same size as the output vector, which is the concentration of species. Several fully connected layers (hidden layers) with many neurons are considered between the input and output layers. All neurons in each layer are connected to the neurons of the neighboring layers. Each neuron calculates the weighted summation of inputs from previous layer and applies a non-linear activation function as where Y j is the jth output from fully connected layer, σ is nonlinear activation function, X i is the ith input from previous layer, W is weight, and b is bias of the neuron. In the DNN training, weight and bias values are learned and stored. We have used the Matlab Deep Learning toolbox [31] to perform the training. While the neural network implementation is fairly straightforward, the a-priori estimation of the best network hyperparameters, and the architecture itself is nontrivial due to the multiscale, non-local, and nonlinear nature of the dataset. To discover the best performing settings for our dataset, an Automatic Machine Learning (autoML) strategy is used. Among the many available optimization methods in the literature [32], we employed the Bayesian Optimization (BayesOpt) approach, which has shown good performance for other data-driven tasks ( [29,33]). The learning process for a ML algorithm such as neural networks is often stochastic in nature. This is due to the nature of the optimization solver and the loss manifold. The loss landscape/manifold for these multivariate time series problems such as in chemical kinetics are non-convex, and high-dimensional. That implies the choice of the hyperparameters used in the learning process and the design/architecture of the network has a leading order impact on the expected error and associated uncertainties of the model. In this study, we use an automatic Machine Learning (autoML) paradigm to choose optimum network hyperparameters and automate the network design process, previously introduced in [34]. The autoML uses Bayesian Optimization (BayesOpt) to navigate the range of parameters to find optimal solutions. In this study, the learning algorithm is modeled as a sample from Gaussian Process (GP). The posterior distribution induced by the GP leads to efficient use of information gathered by previous experiments, enabling optimal choices for what parameters to try next. Table 1 shows the optimization matrix, the range of each parameter and the corresponding interpolation strategies. The autoML was run for up to 100 function evaluations and the loss function was defined as the performance of the trained network on the validation dataset. Each evaluation included a network training of up to 40 epochs. At the end of training, the loss on validation dataset was stored and only the top five performing networks were retained to save disk space.
In addition, the network architecture itself is optimized. This is done by using the network width (number of layers) and depth (number of neurons in each layer) a parameter to optimize using the BayesOpt method. The depth and width are implemented as a series of repeating network blocks: fully connected layer->activation function. The solver used in all these studies is the ADAM optimizer [35].
To save time and reduce complexity of integrating different sized networks, this autoML step was carried out once for the entire dataset and the best design was chosen and re-implemented subsequently for each of the sub-spaces. The assumption was that the subspace loss manifolds are represented in the overall loss landspace and therefore a global optimization, might hold for locally smaller subspaces.

Network Model Integration
After completing the training of the SOM network and DNNs, the networks must be integrated with the CFD solver. As we are using OpenFOAM, which is implemented in C++, the trained networks are converted into C++ code for integration. To achieve this, we used a Matlab Coder tool to generate C++ code of the DNNs and SOM network along with the weight and bias binary files.
Networks are integrated into the CFD solver as a C++ library. The base class contains one SOM network and a large number of DNNs. It initializes the networks with its constructor where the weight and bias values of all layers are read from the disk. For each input, the class calls the appropriate DNN based on the SOM's response and calculates the output. The inference code receives species concentration C i (t), temperature T, and pressure P, and calculates the outputs C i (t + ∆t). Pseudocode of the inference is illustrated in Algorithm 1.

Results and Discussion
To validate LIT, we start with a zero-dimensional H 2 -O 2 reactor at constant pressure. For the hydrogen kinetics, all species are regressed using LIT without any simplification. Next, a methane-oxygen reactor is modeled, where the GRI 3.0 mechanism with 53 species is utilized for ground truth generation.
With 53 species, the methane state space is very highly dimensional, and regressing all species is very challenging. Therefore, we employed a reduced parameter space that includes five species. The selection of these five species is inspired by a one-step methaneoxygen global reaction. For both validation tests, LIT's prediction is compared with OpenFOAM ODE solver, and the gained speedup is reported.

Hydrogen-Oxygen Combustion
The first test for LIT is a zero-dimensional H 2 -O 2 reactor, which has no convection or diffusion, and the chemical kinetics is the only factor in species evolution. Therefore, this problem is an appropriate first test to evaluate the performance of LIT method. In this work, a hydrogen-oxygen reaction is considered at various initial temperatures T i and equivalence ratios φ i , while the pressure is fixed at P i = 1atm. The ground truth comes from a detailed mechanism with 10 species, namely, [H 2 , H, O 2 , O, OH, H 2 O, HO 2 , H 2 O 2 , AR, N 2 ] and 27 reactions. The training data generated for this problem include the simulation results of the reactor from initial conditions to a post-reaction final state. The initial condition is altered to vary the equivalence ratio φ i = 0.5-1.5 and initial temperature in the range of T i = 1000-1200 K. Data are generated with a smaller time step dt d = 10 −8 s to augment the training data as was explained in Section 2, while the objective simulation time step is dt s = 10 −5 s. The input vector involves the temperature and concentration of all species ϕ i (t) = [T, C i (t)], and the output of the network is the concentration of all species ϕ i (t + ∆t) = [C i (t + ∆t)]. All species including major and minor species are modeled in this work. Temperature is not used as an output because species enthalpies can be used for calculating the energy release.
The LIT configuration for this problem involves a 8 × 8 SOM map, which is projected to a 4 × 4 subspace map. Therefore, H 2 -O 2 combustion is divided into 16 regression DNNs to model. The dimensions of the SOM map and the required number of subspaces can be different for different problems. Franke et al. [23] and An et al. [24] studied the effect of SOM map configuration on the accuracy of the model. For DNN regression, we have a fully connected network with three dense layers having 25, 50, and 25 hidden neurons, respectively. Figure 4 compares the solution from the LIT method to the result from the OpenFOAM ODE solver, which solves the complete chemistry mechanism. The figure shows the results for two equivalence ratios of φ i = 0.7 and φ = 1.4, while initial temperature is fixed T i = 1100 K. The plots on the left show normalized temperature and reactants' mass fraction. The middle plots and plots on the right are the normalized mass fraction of O, OH, and H, and of H 2 O, HO 2 , and H 2 O 2 . In all plots, the ODE solver solution is plotted with lines, while LIT results are represented by symbols. According to Figure 4, LIT correctly predicts the combustion behavior for both lean and rich conditions. Prediction of temperature and reactants is in good agreement with the benchmark. Comparing different species, LIT's prediction of major species with monotonic growth including H 2 , O 2 , and H 2 O is more accurate than the minor species such as O, H, and OH.  Figure 5 plots the temporal evolution of species mass fraction at various initial temperatures T i = 1050 K and1150 K, while the equivalence ratio is set at φ = 1.1. Overall, LIT shows good accuracy in the prediction of chemical evolution for both cases. Higher temperature with faster reaction appears to be more challenging for the method. Minor species with non-monotonic dynamics such as H 2 O 2 and HO 2 proved to be relatively more difficult to capture for the DNN. Machine learning methods are known to perform well for points from the middle of the training space because of data interpolation. Consequently, a more difficult test of LIT will occur at the boundaries of the training space, where Ti = 1000 K and 1200 K, and φ = 0.5 and 1.5. Figure 6 shows the results for these edge cases, where the line plots of temporal evolution of chemical species and temperature are presented. LIT performs well at the edge of equivalence ratio range, though the results show slightly more error at the extreme temperatures. As expected, minor species tend to show more error compared to the middle of the training space, while the tendency of major species at T i = 1000 K is to slightly lag the ground truth. Figure 7 shows the estimation of ignition delay time versus equivalence ratio. Results of the full mechanism are represented by a line, and LIT results are depicted by square symbols. Ignition delay time is the time needed for a mixture of fuel and oxidizer to react and reach to the maximum rate of temperature rise. This parameter provides a benchmark of the overall behavior of a combustion. Equivalence ratio varies from lean φ = 0.5 to rich φ = 1.5, while initial temperature varies from T i = 1000 K to T i = 1150 K. Overall, LIT shows good agreement with the OpenFOAM solution.

Methane-Oxygen Combustion
The ultimate goal is to extend the LIT method to more complex fuels. Methane, a very simple hydrocarbon fuel, has a much more complicated parameter space. If we use the size and complexity of GRI 3.0 as an indicator, the methane combustion mechanism is represented by 53 species and 325 reactions [36]. The number of reactions is only relevant for generating the ground truth data, but the number of species raises an important question: how many should be represented in the LIT parameter space? To use all 53 would incur the "curse of dimensionality", where the large number of dimensions demands an impossible amount of training data. Various sizes of parameter vectors were considered, but ultimately a small size of five species worked best including CH 4 , O 2 , H 2 O, CO 2 , and N 2 . These species were chosen largely based on their appearance in the one step global mechanism CH 4 + 2O 2 => CO 2 + 2H 2 O.
The ground truth data are generated from GRI 3.0, which provides the evolution of all 53 species concentrations, though only the subset of the five species and the total generated heat from the reactionsQ are stored for training. The initial temperature is T i = 1000 K, and equivalence ratio is ϕ = 1 for this training and test. The input vector includes temperature and concentration of the five species ϕ i (t) = [T, C i (t)], and the output vector includes concentrations of species and total heat source ϕ i (t + ∆t) = [C i (t + ∆t),Q]. The heat source is required as the mechanism is simplified.  The LIT configuration for CH 4 combustion involves a 10 × 10 SOM map, which is projected to a 5 × 5 subspace map. As a result, the combustion dynamics are divided into 25 regression DNNs. For DNN regression, we have a fully connected network with 5 dense layers having 25, 50, 50, 50, and 25 hidden neurons, respectively. After the training, the accuracy of the networks is evaluated with the test data, which is 15% of the ground truth data. The mean and standard deviation of absolute error of normalized values are plotted in Figure 8. The networks deliver good accuracy in prediction of output quantities. Figure 9 compares the solution from the LIT method to the result from the ODE integrator, which solves the complete GRI 3.0 mechanism. The plot on the left shows the temperature variation. The middle plots are the mass fraction of CH 4 and O 2 , and plots on the right are the mass fraction of H 2 O and CO 2 . In all plots, the ODE solver solution is plotted with lines, while LIT results are represented by symbols. According to the results, LIT correctly predicts the temperature and concentrations compared to the benchmark. Overall, LIT shows good agreement with the ODE solution.

Speedup
Ultimately, the motivation for using the machine-learning surrogate instead of the ODEs is to provide faster results. To assess the value of the present approach, we measured the speedup between using the system of ODEs and LIT. The speedup was defined as the ratio of the ODE computational cost divided by the LIT computational cost. The assessment was performed for both hydrogen and methane reactions and is shown in Figure 10.
The results show tests with both a single computational cell and with a set of onethousand cells. The computational cost of LIT does not include time for reading the networks from the disk and allocating memory for each network, which occurs once before the solution loop starts. When applied to a single computational cell for hydrogen, the speedup is less than unity, indicating that LIT is slower than using an ODE. The main reason is that hydrogen is such a cheap mechanism with small number of reactions. Because of the extent of the GRI 3.0 mechanism, the single computational cell still benefits from the use of LIT with a roughly twenty-fold speedup.
Once the networks are instantiated, they can be used repeatedly for inference within a CFD code. The right data pair in Figure 10 shows a more realistic scenario where the LIT scheme is applied to a batch of one thousand cells. The start-up cost of LIT is now amortized over one thousand executions, providing a much more efficient deployment. The speedup with the relatively simple hydrogen chemistry is roughly eight while the speedup with the methane chemistry is over one-thousand. Because CFD simulations are expected to have on the order of one million cells, the speedup with one thousand cells probably underestimates the speedup expected in actual use.

Conclusions
In this work, we introduced a machine learning method to tabulate chemical kinetics. It was found that a two-step process provided an expedient method for regressing hydrogen/oxygen combustion. An examination of the self-organizing map showed that individual nodes of the SOM could be data-poor, and that a grouping process provided a balance between the data requirements and the fidelity gain offered by using numerous networks.
An essential part of the process was the use of a nonlinear transform so that the lower concentrations of species were still visible to the networks despite their small magnitude. Various transformations were investigated and a one-sixth root was found to give a good balance of low-concentration response and high-concentration fidelity. This transformation proved essential for accurate prediction of ignition delay.
When applied to a relatively simple reaction system, such as hydrogen/oxygen combustion, the parameter space used in the regression could include the full complement of species. However, it was found that with a much more complex system, methane, a reduced set of species offered good fidelity at low cost. Speedup for this system was found to exceed one-thousand. Natural next steps for this work would be further testing over a wide range of conditions for methane and extension to complex hydrocarbons, such as n-heptane. The present results for methane represent only a single initial condition. If these extensions are found to be successful, the LIT approach will dramatically accelerate the CFD simulation of combusting flows.