Reduced models in chemical kinetics via nonlinear data-mining

The adoption of detailed mechanisms for chemical kinetics often poses two types of severe challenges: first, the number of degrees of freedom is large; and second, the dynamics is characterized by widely disparate time scales. As a result, reactive flow solvers with detailed chemistry often become intractable even on large clusters of CPUs, especially when dealing with direct numerical simulation (DNS) of turbulent combustion problems. This has motivated the development of several techniques for reducing the complexity of such kinetics models, in which eventually only a few variables appear in the simplified model. Unfortunately, no generally applicable a priori recipe for selecting suitable parameterizations of the reduced model is available, and the choice of slow variables often relies upon intuition and experience. We present an automated approach to this task, consisting of three main steps. First, the low-dimensional manifold of slow motions is (approximately) sampled by brief simulations of the detailed model, starting from a rich enough ensemble of admissible initial conditions. Second, a global parameterization of the manifold is obtained through the Diffusion Map (DMAP) approach, which has recently emerged as a powerful tool in data analysis/machine learning. Finally, a simplified model is constructed and solved on the fly in terms of the above reduced (slow) variables. Clearly, closing this latter model requires nontrivial interpolation calculations, enabling restriction (mapping from the full ambient space to the reduced one) and lifting (mapping from the reduced space to the ambient one). This is a key step in our approach, and a variety of interpolation schemes are reported and compared. The scope of the proposed procedure is presented and discussed by means of an illustrative combustion example.


Introduction
The solution of detailed models for chemical kinetics is often dramatically time consuming.

The Diffusion Map (DMAP) approach has emerged as a powerful tool in data analysis and dimension reduction [7][8][9]. In effect, it can be thought of as a nonlinear counterpart of Principal Component Analysis (PCA) [10] that can be used to search for a low-dimensional embedding of a high-dimensional point set {y_1, ..., y_M}, if such an embedding exists. For completeness, we present a simple description of the DMAP process. The points y_i could lie in some n-dimensional Cartesian space (as they do in our combustion example) or they could be more abstract objects, such as images. What is important is that there exists a dissimilarity function d_ij = d_ji between any pair of points y_i and y_j, such that the dissimilarity is zero only if the points are identical (in those aspects that are important to the study) and grows the more dissimilar they are. Although, for points in ℝ^n, an obvious choice for d_ij is the standard Euclidean distance, this is not necessarily the best option. For instance, a weighted Euclidean norm may be preferable when different coordinates are characterized by disparate orders of magnitude. As discussed below, this is indeed the case in many combustion problems, where the data are composition vectors in concentration space and major species (i.e. reactants and products) are characterized by much higher concentrations than minor species (i.e. radicals). From d_ij a pairwise affinity w_ij = w(d_ij) is computed, where w(0) = 1 and w(d) is monotonically decreasing and non-negative for d > 0. A popular option is the heat kernel

w_ij = exp(−d_ij² / ε²).    (1)

The model parameter ε specifies the level below which points are considered similar, whereas points more distant than a small multiple of ε are, effectively, not linked directly.
For this presentation we will assume that d is a distance measure in (suitably scaled) Cartesian coordinates, so that each point y_i is specified by its coordinates y_{i,α}, α = 1, ..., n, in n-dimensional space.
In the DMAP approach, starting from the M × M (not n × n, as in PCA) symmetric matrix W = {w_ij}, a Markov matrix K is constructed through row normalization,

K = D⁻¹ W,    (2)

with the diagonal matrix D collecting the row sums of W. Owing to its similarity with the symmetric matrix D^{−1/2} W D^{−1/2}, K has a complete set of real eigenvectors {φ_i} and eigenvalues {λ_i}. Moreover, a projection of the high-dimensional points {y_1, ..., y_M} into an m-dimensional space (hopefully m ≪ n) can be established through the components of m appropriately selected eigenvectors (not necessarily the m leading ones, as in PCA). Specifically, let the eigenvalues be sorted in decreasing order: 1 = λ_1 ≥ |λ_2| ≥ ... ≥ |λ_M|. The diffusion map Ψ_t is defined, based on the right eigenvectors of K, Kφ_l = λ_l φ_l with φ_l = (φ_{1,l}, ..., φ_{M,l}), for t > 0, as

Ψ_t(y_i) = (λ_1^t φ_{i,1}, λ_2^t φ_{i,2}, ..., λ_M^t φ_{i,M}),    (3)

and it assigns a vector of M new coordinates to each data point y_i. Notice that all points have the same first coordinate in (3), since φ_1 is proportional to the all-ones vector (with eigenvalue 1). Notice also that the diffusion map coordinates are time-dependent; using longer times in the diffusion process damps high-frequency components, so that fewer coordinates suffice for an approximation of a given accuracy. However, in order to achieve a drastic dimension reduction, for a fixed threshold 0 < δ < 1, it is convenient to define a truncated diffusion map

Ψ_t^δ(y_i) = (λ_1^t φ_{i,1}, ..., λ_{m+1}^t φ_{i,m+1}),    (4)

where m + 1 is the largest integer for which |λ_{m+1}|^t > δ. Below we will consider only the eigenvector entries (i.e. take t = 0), and will separately discuss using the eigenvalues (and their powers) to ignore noise.
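The construction above can be sketched in a few lines of Python. This is a minimal illustration (dense linear algebra, heat-kernel affinities), not the implementation used in this work; function and variable names are our own:

```python
import numpy as np

def diffusion_map(Y, eps, n_coords=2):
    """DMAP embedding of the rows of Y (shape M x n).

    Affinities use the heat kernel w_ij = exp(-||y_i - y_j||^2 / eps^2);
    K = D^{-1} W is the Markov matrix, and its leading non-trivial right
    eigenvectors supply the embedding coordinates.
    """
    d2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # squared distances
    W = np.exp(-d2 / eps**2)
    D = W.sum(axis=1)
    # Work with the symmetric conjugate D^{-1/2} W D^{-1/2}: it shares
    # eigenvalues with K and has orthogonal eigenvectors.
    S = W / np.sqrt(np.outer(D, D))
    lam, V = np.linalg.eigh(S)
    order = np.argsort(-np.abs(lam))
    lam, V = lam[order], V[:, order]
    Phi = V / np.sqrt(D)[:, None]   # right eigenvectors of K
    # Skip the trivial constant eigenvector (eigenvalue 1).
    return lam[1:n_coords + 1], Phi[:, 1:n_coords + 1]
```

For points sampled along a one-dimensional curve, the first returned coordinate is expected to be one-to-one with arclength, in line with the discussion below.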

If the initial data points {y_1, ..., y_M} are located on a (possibly nonlinear) low-dimensional manifold of dimension m, one might expect (by analogy to PCA) that a procedure exists to systematically select m diffusion map eigenvectors for embedding the data.

If the points are fairly evenly distributed across the low-dimensional manifold, it is known that the principal directions of the manifold are spanned by some of the leading eigenvectors (i.e., those corresponding to larger eigenvalues) of the DMAP operator, and the corresponding eigenvalues are approximately

λ_{k,α} ≈ 1 − δ (kπd/L_α)²,    (5)

where δ ≈ exp(−d²/ε²), d is the typical spacing between neighbors, and L_α is the length of the α-th principal direction. Here k = 1, 2, ... indexes the successive harmonics of the eigenvectors.
(This approximation can be obtained by considering the regularly-spaced data case, assuming that ε is comparable to d, and that δ is small enough that higher powers can be ignored.) Section 2.1 below discusses how to ignore eigenvectors that are harmonics of previous ones by checking for dependence.

88
Eq. (5) provides a tool for deciding when to ignore the smaller eigenvalues. Suppose, for example, that we know that our data accuracy is approximately a fraction γ of the range of the data. This range roughly corresponds to the longest principal direction, say L_1. There is little point in considering manifold directions of length of order γL_1, since they are of the order of the errors in the data. Hence, by applying (5), we should ignore any eigendirection whose eigenvalue falls below the value that (5) assigns to a direction of length γL_1.

As an illustration, consider a two-dimensional data set stretched and wrapped around three fourths of a cylinder of radius 1 and length 2 (see Fig. 1(a)). The one-dimensional nature of the corresponding plot confirms that ψ_2 parametrizes this principal geometric direction.

However, a plot of ψ_3, the eigenvector corresponding to the next leading eigenvalue, against ψ_2 clearly shows a strong correlation: ψ_3 is not representative of a new, independent direction on the data manifold. In Fig. 1(d), the two-dimensional scatter of the plot of the entries of the fourth eigenvector versus the entries of the second one indicates independence between ψ_2 and ψ_4; ψ_4 does represent a new, independent direction along the data manifold, and becomes our second embedding coordinate.
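Such dependence checks can also be automated. The sketch below is a hypothetical helper (not taken from this work) that flags an eigenvector as redundant when it is well fitted by a polynomial of a previously accepted coordinate:

```python
import numpy as np

def dependence_score(psi_a, psi_b, deg=5):
    """R^2 of a polynomial fit psi_b ~ poly(psi_a).

    A score near 1 flags psi_b as a (near-)function of psi_a, i.e. a
    harmonic encoding no new manifold direction; a low score suggests
    an independent coordinate.
    """
    coef = np.polyfit(psi_a, psi_b, deg)
    resid = psi_b - np.polyval(coef, psi_a)
    return 1.0 - resid.var() / psi_b.var()

# Synthetic illustration: psi3 is a harmonic of psi2, psi4 is independent.
rng = np.random.default_rng(0)
psi2 = np.sort(rng.uniform(-1.0, 1.0, 400))
psi3 = 2.0 * psi2**2 - 1.0          # "second harmonic": dependent on psi2
psi4 = rng.uniform(-1.0, 1.0, 400)  # new direction: independent of psi2
```

A simple polynomial fit is only one possible test; local linear regression over neighborhoods is a more robust alternative for strongly curved dependencies.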

Visually testing independence between two DMAP eigenvectors is relatively easy; automated techniques for detecting such dependencies can also find use here [11,12].

Irregularity of the sample points can easily lead to problems in this simple example.

While the first non-trivial eigenvector ψ_2 always characterizes the principal direction on the manifold, no general recipe can be formulated for an a priori identification of the subsequent uncorrelated eigenvectors parameterizing the other dimensions. We have already seen that eigenvectors in (3) are often dependent; this implies that they do not encode new directions along the data manifold and, in this sense, they are redundant for our embedding. In order to gain more insight into eigenvector dependency (in other words, into how diffusion is linked with manifold parametrization), consider, as our domain of interest, a narrow two-dimensional strip, or, in our case, data points densely sampled from it. Fig. 4(a) reports the solution of the eigenvalue problem ∇²φ = λφ with Neumann boundary conditions, discretized through the finite element method (FEM). The first non-trivial eigenfunction is analytically given by cos(x̄), where x̄ denotes the horizontal space direction, and is very well approximated by the FEM numerics; the point to notice is that cos(x̄) is one-to-one with x̄ along the length of the strip, so the first non-trivial diffusion eigenvector parameterizes one manifold direction (the x̄ direction). Several subsequent eigenfunctions still correlate with the x̄ direction: they are simply higher harmonics (cos(2x̄), cos(3x̄), ...). We have to go as high as the seventh eigenfunction (which analytically is cos(ȳ)) to find something that is one-to-one with the second, independent, vertical direction ȳ (see Fig. 4(b), where the first non-trivial eigenfunction is plotted against both the fourth and the seventh eigenfunction at scattered locations). A more complex two-dimensional geometry is considered in Fig. 4(c).
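The harmonic structure described above is easy to reproduce numerically. The following sketch replaces the FEM computation with a graph-Laplacian surrogate on an assumed 40 × 6 grid (our own choice of geometry): the leading non-trivial modes vary only along the long direction, and several of its harmonics precede the first mode that varies along the short direction:

```python
import numpy as np

def path_laplacian(n):
    """Graph Laplacian of an n-node path (discrete Neumann Laplacian)."""
    L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    L[0, 0] = L[-1, -1] = 1.0
    return L

nx, ny = 40, 6  # narrow strip: long in x, short in y
L = (np.kron(path_laplacian(nx), np.eye(ny))
     + np.kron(np.eye(nx), path_laplacian(ny)))
lam, V = np.linalg.eigh(L)               # ascending; lam[0] ~ 0 (constant mode)
modes = V[:, 1:].T.reshape(-1, nx, ny)   # non-trivial modes as (nx, ny) fields

def varies_in_y(mode, tol=1e-8):
    # Variance across the y-index, maximized over rows.
    return mode.var(axis=1).max() > tol

# Index (among non-trivial modes) of the first mode that varies in y.
first_y = next(i for i, m in enumerate(modes) if varies_in_y(m))
```

With this aspect ratio, `modes[0]` depends on x only, and `first_y` is reached only after several x-harmonics, mirroring the "seventh eigenfunction" effect discussed in the text.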
Similarly to the above example, the first non-trivial eigenfunction parameterizes one of the manifold "principal dimensions" (the angular coordinate), while the next (seventh) uncorrelated eigenfunction can be used to parameterize the other relevant (radial) coordinate (it is just an accident that we had to go to the seventh eigenfunction in both cases). In practical applications, only a discrete set of sample points on the manifold in question is available as an input. Starting from those points, Diffusion Maps create a graph, where the points are the graph nodes and the edges are weighted on the basis of point distances, as described above. Noticing that the (negative semi-definite) normalized graph Laplacian L is given by [13]

L = K − I,    (6)

with I being the M × M identity matrix, we immediately recognize the link between the eigenvalue problems for L and for the Markov matrix K.

3. The proposed approach

We demonstrate the feasibility of constructing reduced kinetics models for combustion applications by extracting the slow dynamics on a manifold globally parameterized by a truncated diffusion map. We focus on spatially homogeneous reactive mixtures of ideal gases under fixed total enthalpy H and pressure P. Such a set-up is relevant for building up tables to be used in reactive flow solvers in the low Mach number regime. In such systems, a complex reaction occurs among n chemical species {A_1, ..., A_n} and d elements, involved in a (typically large) number r of elementary steps:

Σ_p α_sp A_p ⇌ Σ_p β_sp A_p,  s = 1, ..., r,    (7)

where α_sp and β_sp represent the stoichiometric coefficients of the p-th species in the s-th step. The time evolution of the chemical species can be modeled by a system of ordinary differential equations (ODEs) cast in the general form

dy/dt = f(y).    (8)

3.1. Data collection

To start the procedure, we need an ensemble of representative data points on (or close to) the manifold we wish to parametrize. To ensure good sampling, our ensemble of points comes from integrating Eqs.
(8) starting from a (rich enough) set of random states within the admissible phase space (a convex polytope defined by elemental conservation constraints and concentration positivity). After sufficient time for trajectories to approach a neighborhood of the manifold, samples are collected from each such trajectory. As a result, a set of points {y_i, i = 1, ..., M} in ℝ^n (hopefully dense enough within the region of interest) becomes available for defining the manifold. To construct the required initial conditions, we first search for all vertices of the convex polytope defined by the following set of equalities and inequalities:

Σ_α (c_αβ / W̄_α) y_α = const_β,  β = 1, ..., d;   y_α ≥ 0,  α = 1, ..., n,    (9)

where c_αβ and W̄_α denote the number of atoms of the β-th element in species α and the molecular weight of species α, respectively, while the state vector y = (y_1, ..., y_n) expresses species concentrations in terms of mass fractions. Random initial conditions are generated as convex linear combinations of the v polytope vertices {y_i^pol}:

y_0 = Σ_{i=1}^v w̄_i y_i^pol,    (10)

with {w̄_i} being a set of v random weights such that Σ_{i=1}^v w̄_i = 1. Clearly, owing to convexity, equation (10) always provides states within the admissible space. In combustion applications, the phase-space region of interest extends from the fresh mixture conditions to the thermodynamic equilibrium y^eq; hence, in eq. (10) we consider a subset of the polytope vertices {y_i^pol}, selected on the basis of their vicinity (in the sense specified in the Appendix) to the mean point of the mixing line connecting the fresh mixture point to equilibrium. It is worth noticing that, upon choosing v random numbers {z_i, i = 1, ..., v} uniformly distributed over the range 0 ≤ z_i ≤ 1, weights could be straightforwardly obtained by normalization:

w̄_i = z_i / Σ_{j=1}^v z_j.    (11)

However, such an approach leads to poor sampling in the vicinity of the polytope edges and, at the same time, to oversampling of its interior.
Therefore, in order to achieve a more uniform sampling of the whole phase-space region of interest, the weights are chosen as follows:

w̄_i = z_i^p / Σ_{j=1}^v z_j^p,    (12)

with z_i representing random values uniformly distributed within the interval 0 ≤ z_i ≤ 1, and 1 ≤ p ≤ 2.

¹ A better choice for ε is to make it a multiple of what we will call the critical diffusion distance: the maximum edge length such that, if all edges of at least that length are deleted in the distance graph, the graph becomes disconnected. The reason this distance is important is that, if ε is much smaller than this, the diffusion map will find disjoint sets.
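The sampling strategy above can be sketched as follows. This is a minimal illustration assuming powered-uniform weights (vertex enumeration of the polytope is not shown, and the function name is our own):

```python
import numpy as np

def sample_states(vertices, n_samples, p=1.5, rng=None):
    """Random convex combinations of polytope vertices.

    Raising the uniform variates z_i to a power p > 1 before normalizing
    increases the disparity of the weights, pushing samples toward the
    polytope faces and countering the oversampling of the interior that
    plain normalized uniforms produce.
    """
    rng = np.random.default_rng(rng)
    v = len(vertices)
    z = rng.uniform(0.0, 1.0, size=(n_samples, v)) ** p
    w = z / z.sum(axis=1, keepdims=True)   # rows of weights summing to one
    return w @ np.asarray(vertices)        # convex combinations: admissible states
```

By convexity, every sampled state lies inside the polytope, so all generated initial conditions are admissible.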
We will also assume that we can construct a mapping in the other direction, u = ψ(y), with u_i = ψ(y_i) for all i. Finally, in the third step, we need to conceptually recast (8), which has the form dy/dt = f(y), into the reduced space as

du/dt = g(u).    (13)

In other words, given a value of u we need a computational method to evaluate g(u), yet all we have available is a method to compute f(y). To do this we have to execute the following three substeps:

Since Ω_c is only an approximation to Ω, it is highly unlikely that dy/dt lies in the tangent plane of Ω_c at the point y. (If it did, the problem of computing an equivalent du/dt would be straightforward.) Two possible solutions to this dilemma are (i) to project dy/dt onto the tangent plane, or (ii) to extend the mapping u = ψ(y) to a neighborhood of Ω_c (a many-to-one map). If we do the latter, we can write

g(u) = (∂ψ/∂y)|_{Θ(u)} f(Θ(u)).    (14)

These two approaches are really the same, since a local extension of ψ to a neighborhood of Ω_c implies a local foliation, and (14) is simply a projection along that foliation. If an orthogonal projection is used, we simply write g(u) = J⁺ f(Θ(u)), where J = ∂Θ/∂u and J⁺ is its pseudoinverse; ψ(y) is then possibly needed only for initializing (13), in case initial conditions are available in the ambient space.

An established procedure for obtaining the α-th DMAP coordinate ψ_α at an arbitrary state y ∈ ℝ^n is the popular Nyström extension [20]:

ψ_α(y) = (1/λ_α) Σ_{i=1}^M [w(d_i) / Σ_{j=1}^M w(d_j)] ψ_{i,α},    (15)

where λ_α and (ψ_{1,α}, ..., ψ_{M,α}) are the α-th eigenvalue and eigenvector of the Markov matrix K, respectively. For the combustion case below, the d_i denote the Euclidean distances between rescaled points, as discussed in the Appendix (y → Ry, y_i → Ry_i). The Jacobian matrix appearing in the reduced-space right-hand side (13) can be obtained by differentiation of (15), as in (16) [18,21], where, in case of point rescaling, r_ββ is computed as indicated in the Appendix; otherwise r_ββ = 1 for all β.
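The projection of the full vector field through an extended restriction map can be sketched generically. In this illustration (our own helper, not the paper's implementation) the Jacobian ∂ψ/∂y is approximated by central finite differences rather than by analytical differentiation of the Nyström formula:

```python
import numpy as np

def reduced_rhs(f, psi, theta, u, h=1e-6):
    """Evaluate du/dt = g(u) = J_psi(Theta(u)) f(Theta(u)).

    f     : full-space vector field, y -> dy/dt
    psi   : restriction (extended to a neighborhood of the manifold)
    theta : lifting, u -> y
    The Jacobian of psi is approximated by central finite differences,
    so dy/dt is projected along the local foliation induced by psi.
    """
    y = theta(u)
    dydt = f(y)
    m, n = len(u), len(y)
    J = np.empty((m, n))
    for j in range(n):
        e = np.zeros(n); e[j] = h
        J[:, j] = (psi(y + e) - psi(y - e)) / (2.0 * h)
    return J @ dydt
```

As a consistency check on a toy manifold y = (u, u²) with f tangent to it, the reduced right-hand side reproduces the exact slow dynamics du/dt = −u.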

The Nyström extension can be utilized for implementing the restriction operator, as well as for computing its Jacobian matrix.
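A minimal sketch of the Nyström extension for the restriction operator, assuming the same heat kernel used to build the Markov matrix (rescaling of the ambient points is omitted):

```python
import numpy as np

def nystrom(y_new, Y, phi, lam, eps):
    """Nystrom extension of one DMAP eigenvector to a new ambient point.

    psi_alpha(y) = (1/lam_alpha) * sum_i [w(d_i)/sum_j w(d_j)] * phi_i,
    with the heat kernel w(d) = exp(-d^2/eps^2) row-normalized at y_new.
    """
    d2 = ((Y - y_new) ** 2).sum(axis=1)
    w = np.exp(-d2 / eps**2)
    k = w / w.sum()              # normalized kernel row at the new point
    return (k @ phi) / lam
```

A useful property (exploited in the test) is that extending at one of the sample points reproduces the corresponding eigenvector entry exactly, since Kφ = λφ.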

Both the lifting and restriction operators may also be obtained by local interpolation through radial basis functions (RBF). Let u be a new state in the reduced space; the corresponding point in the full space, y = Θ(u), can be generally expressed by the summation

Θ_β(u) = Σ_{i=1}^{nn} α_{i,β} φ̄(‖u − u_i‖),    (17)

over the nn nearest neighbors of u, with the radial function φ̄(·) depending only on a distance ‖·‖. In this work, we focus on the special form of (17) with

φ̄(r) = r^p,    (18)

where p is an odd integer, while ‖·‖ denotes the usual Euclidean distance in the reduced space. The coefficients α_{i,β} are computed by imposing the interpolation conditions (19) at the nn nearest sample points. Similarly, restriction can be expressed in the form (20), where data in the full space have possibly been rescaled as discussed in the Appendix (y → Ry, y_i → Ry_i); the Jacobian matrix appearing in the reduced-space right-hand side (13) can then be obtained by differentiation of (20), as in (21).

An alternative is provided by Laplacian Pyramids (LP), a multi-scale scheme in which the differences d are updated at each level l. The functions k_l in (22) are Gaussian kernels,

k_l(y, y_i) = exp(−‖y − y_i‖² / σ_l²),    (24)

where the parameter σ_l = σ_0/2^l decreases with the level l, σ_0 is the fixed coarsest scale, and y_i and y denote the rescaled states specified in the Appendix (y → Ry, y_i → Ry_i). A maximum admissible error can be set a priori for the residuals in (22) and (23), while Euclidean distances in the reduced space are adopted for the kernel in (24). Based on the resemblance of Eqs. (22) to the Nyström extension (15), the Jacobian appearing in the reduced-space right-hand side (13) again follows by direct differentiation. Similarly to RBF, LP can be applied to a subset of the sample points.

Geometric harmonics offer yet another multi-scale scheme for extending functions available only at M scattered locations, inspired by the Nyström method and making use of a kernel w [25]. Let W be the symmetric M × M matrix whose generic element is W_ij = w(z_i, z_j), with {φ_α, α = 1, ..., M} its full set of orthonormal eigenvectors, sorted according to descending eigenvalues {λ_α, α = 1, ..., M}.
For δ > 0, let us define the set of indices S_δ = {α : λ_α ≥ δ λ_0}. The extension of a function f, defined only at the sample points in Z ⊂ Z̄, to an arbitrary new point of Z̄ (depending on the purpose, Z̄ can be either the ambient space of y or the reduced space of u) is accomplished by the projection step

P_δ f = Σ_{α∈S_δ} ⟨f, φ_α⟩ φ_α,

and the subsequent extension of P_δ f,

(E f)(z) = Σ_{α∈S_δ} ⟨f, φ_α⟩ Ψ_α(z),

where ⟨·, ·⟩ denotes the inner product, while Ψ_α reads

Ψ_α(z) = (1/λ_α) Σ_{i=1}^M w(z, z_i) φ_{i,α}.

The above is only the first step of a multi-scale scheme, in which the function f is first projected at a coarse scale; the scale is then refined until a prescribed accuracy is met.

Table 1. Comparison of reduced and detailed solution trajectories (with initial condition u = [0, 0] and 0 ≤ t ≤ t̄ = 1 × 10⁻⁴ [s]) corresponding to several schemes implementing the lifting and restriction operators (see text). δψ indicates the mean deviation between the reduced and detailed solution trajectories (in the reduced space): δψ = t̄⁻¹ ∫₀^t̄ ‖ψ^det − ψ^red‖ dt, with ψ^det, ψ^red and ‖·‖ denoting the restricted detailed solution, the reduced solution and the Euclidean norm, respectively. Similarly, |δy_α| is the mean deviation for species α (in the ambient space): |δy_α| = t̄⁻¹ ∫₀^t̄ |y_α^det − y_α^red| dt.

For completeness, in Fig. 13 we report the time series of the diffusion map variables as obtained by methods 2 and 3 in Table 1, as well as the restriction of the corresponding detailed solution.

Moreover, in Figs. 14 and 15 a comparison of the time series in the detailed space is reported, as obtained by reconstruction of the states in ℝ⁹ from the reduced solutions in Fig. 13.

… 1 [bar]). Two-dimensional DMAP parameterization of 3810 points in terms of the two non-trivial leading eigenvectors ψ_2 and ψ_3 of the Markov matrix K. Colors represent mass fractions, while the black filled circle and the black diamond represent the fresh mixture condition and the equilibrium state, respectively.

Figure 11. Top: a sample detailed transient solution is shown in the ψ_2 − ψ_3 plane. Restriction is done by the Nyström method, while colors refer to the temperature (Kelvin) of the gas mixture. Bottom: time evolution of the absolute deviation between detailed and reduced solution trajectories (in the reduced space), ‖ψ^red − ψ^det‖. Numbers in the legend correspond to the first six methods in Table 1.

Figure 12. Illustrating a possible pathology. Samples (circles) are uniformly chosen in X, with Y = sin(X). Laplacian Pyramids are adopted for interpolation between samples, with σ_0 = 10. Estimated values with finest level l = 5, l = 9 and l = 13 are denoted by LP05, LP09 and LP13, respectively. At the latter level, the estimates of the derivatives are no longer accurate.

Figure 13. Time series of the diffusion map variables for the case of Fig. 11, as obtained by Method 2 (top) and by Method 3 (bottom) (see Table 1). The initial condition in the diffusion map space, [0, 0], is first lifted into ℝ⁹ and then relaxed towards the equilibrium point by the detailed kinetics (8), using the readily available Matlab solver ode45. This time series is afterwards restricted to the diffusion map space and reported with a continuous line. Symbols denote the corresponding solution obtained directly in the reduced space by solving the system (13) with the same Matlab solver ode45.

Figure 14.
The initial condition in the diffusion map space, [0, 0], is first lifted into ℝ⁹ and then relaxed towards the equilibrium point by the detailed kinetics (8), using the readily available Matlab solver ode45 (continuous line). Symbols report the corresponding time series as obtained by lifting the reduced solution at the top of Fig. 13 (i.e. method 2).

Figure 15. The initial condition in the diffusion map space, [0, 0], is first lifted into ℝ⁹ and then relaxed towards the equilibrium point by the detailed kinetics (8), using the readily available Matlab solver ode45 (continuous line). Symbols report the corresponding time series as obtained by lifting the reduced solution at the bottom of Fig. 13 (i.e. method 3).