Benchmarking First-Principles Reaction Equilibrium Composition Prediction

The availability of thermochemical properties allows for the prediction of the equilibrium compositions of chemical reactions. The accurate prediction of these can be crucial for the design of new chemical synthesis routes. However, for new processes, these data are generally not completely available. A solution is the use of thermochemistry calculated from first-principles methods such as Density Functional Theory (DFT). Before this can be used reliably, it needs to be systematically benchmarked. Although various studies have examined the accuracy of DFT from an energetic point of view, few studies have considered its accuracy in predicting the temperature-dependent equilibrium composition. In this work, we collected 117 molecules for which experimental thermochemical data were available. From these, we constructed 2648 reactions. These experimentally constructed reactions were then benchmarked against DFT for 6 exchange–correlation functionals and 3 quality of basis sets. We show that, in reactions that do not show temperature dependence in the equilibrium composition below 1000 K, over 90% are predicted correctly. Temperature-dependent equilibrium compositions typically demonstrate correct qualitative behavior. Lastly, we show that the errors are equally caused by errors in the vibrational spectrum and the DFT electronic ground state energy.


Introduction
Determining the thermodynamic properties of chemical reactions is of extreme importance in chemical process design and heavily relies on our ability to measure or compute Gibbs free energies [1,2]. Whilst molecular Gibbs free energies, which enable equilibrium composition predictions, may be found in literature sources, such as the NIST Chemistry WebBook [3], for simple systems, even for many relatively standard molecules, these data are not available. For new processes, such as (Area Selective) Atomic Layer Deposition ((AS)ALD) [4,5] and Chemical Vapor Deposition (CVD) [6][7][8], with experimentally difficultto-measure and new, hypothetical molecules, the challenge is even bigger. In these cases, Density Functional Theory (DFT) [9,10] may provide an alternative source of information. Since DFT can give both the ground state electronic energy as well as the vibrational spectrum [11][12][13], it can be combined with statistical thermodynamics to yield all relevant and necessary thermodynamic quantities [14,15]. A quantity such as the equilibrium composition, which is of paramount importance in process design, is, however, rather complex; it combines many ingredients. Determining a systematic error bar is, hence, difficult, and a thorough benchmark is the only way out.
In the last 10-20 years, the number of studies that have benchmarked a specific property calculated from DFT or used a more advanced computational method has been increasing [13,[16][17][18][19][20][21][22][23][24][25][26][27][28][29]. However, very few of them have focused on the complex issue of predicting the equilibrium composition systematically. Many studies have focused on the enthalpies and energies of reactions. In such studies, the accuracy of the prediction of the enthalpies and energies of reactions typically lies at around 2-10 kcal mol −1 [30][31][32][33][34][35][36][37][38][39][40][41] for small to medium sized organic molecules and up to 40 kcal mol −1 [42,43] for inorganic ones. Although enthalpies are a key component of Gibbs free energy, the effect that errors in their prediction have on the equilibrium composition has not been directly tackled. Moreover, the few studies that have explicitly dealt with equilibrium compositions have tended to focus on specific systems and have used a narrow scope [44,45]. Thus, a general and broad study quantifying the accuracy of DFT-predicted reaction equilibria is needed to broadly enable the use of DFT to guide the optimization of process conditions in many domains of chemistry.
To develop a benchmark of DFT for predicting temperature-and pressure-dependent reaction equilibria, we first need a reliable experimental reference. In this work, we collected all molecules cataloged in the Computational Chemistry Comparison and Benchmark DataBase (CCCBDB) [46] with experimental Gibbs free energy curves and elements from the first two rows of the periodic table excluding noble gases, lithium, and magnesium and including bromine. In total, this selection encompassed 117 molecules. We then calculated the Gibbs free energy of these molecules with DFT using six different exchange-correlation functionals and three basis sets of increasing size. Using these data, we calculated the equilibrium compositions for all possible independent reactions with less than 5 reagents, amounting to a total of 2648 reactions. We calculated the equilibrium temperature-dependent composition for the DFT and experimental Gibbs energies individually and directly compared the results to assess how the errors from the DFT thermodynamics affect the equilibrium composition.

Errors for Constant Equilibrium Compositions
In our reaction set, most reactions have a fixed equilibrium composition (2164 reactions). As stated above, the signs of ∆G at the edges of the temperature range are enough to determine whether a correct composition is predicted. We present the results for this analysis in Table 1 with respect to experimental reports for the different functional and basis set combinations. Except for the calculations of the LDA/SVP combination, DFT was able to capture the correct compositions of the reactions in more than 90% of cases. Additionally, both the functional choice as well as the basis set choice had relatively small effects on this percentage, providing marginal improvements of 4% and 8%, respectively. As expected, we observed that the performance of TZVP and QZVPP was indistinguishable. More unexpectedly, we see that hardly any improvement beyond the PBE results was made.
The errors in the molecular Gibbs free energies calculated from DFT have two origins. One is the temperature-independent part originating from the electronic total energy and the zero point vibrational energy and the other is the temperature-dependent part originating from the occupation of vibrational, translational, and rotational modes.
An important part of the error in the temperature-dependent part is associated with the harmonic approximation used to describe the vibrational modes in the enthalpy and entropy terms of the Gibbs free energy. This approximation is only valid at sufficiently low temperatures when the asymmetry of the potential energy landscape for the vibrational energy states can be neglected. To quantify the magnitude of the errors introduced by the harmonic approximation, Table 2 presents the minimal, maximal, and mean errors of the temperature-dependent part of the Gibbs free energies of all molecules for all functionals and basis sets for two temperatures, 400 and 1500 K. The temperature-dependent term of the molecules arising from the vibrational contribution differs from its experimental counterpart by −312.72 to 297.78 kJ mol −1 at 1500 K, while the range is only −45.53 to 60.53 kJ mol −1 at 400 K. Although these numbers seem huge, ultimately, we are only interested in energy differences between products and reactions. The variation over the different basis sets and functionals is minimal. When carrying out the sign analysis approach only at lower-temperature windows [39] (300-400 K), we found an improvement of 2-4 percent points in terms of the percentage of correct predictions made. Considering only Gibbs energies between 300-400 K, the results are presented in Table 3. Table 3. Percentage of all reactions with a constant equilibrium composition correctly described by DFT in the range of 300 K-400 K. The remainder of the incorrect predictions for all reactions with temperatureindependent reactions from DFT are caused by ground state energy errors. We found that the set of reactions with an incorrect Gibbs reaction energy sign at a low temperature was associated with a ∆H 0K,reaction term of around 60-150 kJ mol −1 , which is about 10 times smaller than the average ∆H 0K,reaction , found to be above 600 kJ mol −1 . The exception to this was the PW-LDA functional, for which no relevant threshold value was found.

Errors for Temperature-Dependent Equilibrium Compositions
Next, we considered the reactions that showed a change of sign in the experimental ∆G within the 0-2000 K interval. This set made up 18% of all reactions. An analysis of the sign of ∆G, was performed for these reactions as well (see Tables 4 and 5). Again, we classified a reaction as being predicted correctly if the sign of ∆G was correctly predicted by the DFT results at the upper and lower temperature limits of the interval. In all cases, a significantly lower percentage of reactions was predicted correctly as compared to the fixed composition reactions. We observed an improvement of up to 18% if we only considered the lowtemperature case in the analysis (see Table 3). Once again, the errors at 1500 K were attributed mainly to the harmonic approximation. Table 4. Percentage of all reactions with the temperature-dependent equilibrium composition correctly described by DFT in the temperature range of 300 K-1500 K. As previously mentioned, an erroneous sign of ∆G alone does not provide a full quantitative picture of the accuracy of the DFT predictions for reactions with changing compositions. The results for the integrated extent of reaction errors are displayed in Figure 1. It is important to note that PBE0 had the lowest integrated error on average; however, specific molecules may display better performance levels with different functionals. High absolute values indicate a high degree of prediction inaccuracy. The sign reflects changes at lower (negative) or higher (positive) temperatures as compared to the experimental cases. Interestingly, all combinations of functionals and basis sets had mean values close to 0 K, meaning that the errors had little to no preference for over-or underestimating the transition temperature. Additionally, the exchange-correlation functionals had similar behaviors in terms of spread. This is in agreement with thermodynamic values reported for some solids [47]. The standard deviations of the integrated errors for all functionals were within the range of 355 to 488 K.

Functional
In most reactions, we found that prediction errors mostly consisted of the extent of reaction change taking place at a different temperature to the experimental one. We observed that cases with a high degree of error may be qualitatively correct, i.e., with the extent of reaction evolving in the proper direction with temperature, although the inflection point, corresponding to the change in sign of ∆G, occurs at a very different temperature to the experimental one. In such cases, if the predicted extent of reaction were to be shifted by the integrated error but in the opposite direction, predictions would become far more accurate. Whilst this was the most common behavior, some reactions displayed an incorrect extent of reaction, even at low temperatures. The deviation improved as the temperature increased (see Figure 2). In these cases, DFT predicted a fixed composition over the temperature range, although a change in composition took place experimentally. Separating the cases in which this happens is a rather difficult task, as the average enthalpy (which for the case of temperature-independent equilibrium compositions was very different, and distinguishing accurate and inaccurate predictions no longer allows such distinctions) lies very close to the value needed to properly describe the reactions.

Error Analysis
Understanding the origin of the errors is crucial if improvements are to be attempted. In order to obtain insight into the origin of the error, we selected the case of the PBE0 functional with the QZVPP basis set and studied the error sources in more detail. We found that 2 main sources of error can be isolated, namely 1.
Errors made on ∆H, whose dominant contribution comes from the DFT ground state energy.

2.
Errors in the transition temperature for the Gibbs free energy, which mostly come from harmonic approximation inaccuracies.
Replacing one term at a time with experimental values allows the contribution of each term's error to the integrated error to be determined. The results for this analysis are displayed in Figure 3. Interestingly, we observed similar improvements in the interquartile range of the data in both corrections, suggesting that the two sources of errors identified are of the same importance. Another observation is that most outliers in the DFT calculations dropped down to the behavior of the remaining points when the ground state Gibbs free energy was taken from experimental values.

Molecule Set Collection and Reaction Generation
To build a systematic benchmark, we used the following approach. First, we conducted a search of all gas phase molecules in the Computational Chemistry Comparison and Benchmark DataBase (CCCBD) [46] containing less than 9 atoms in total and excluded all radicals and ions as well as all carbon-based cyclic compounds. In this work, we limited ourselves to small molecules; however, the inclusion of larger molecules can be done. It is known that larger molecules (>10 atoms) typically have larger errors in enthalpy [42,43], which can very significantly alter the equilibrium composition. Second, we gathered the corresponding Gibbs free energies from the NIST Chemistry WebBook [3] and discarded all molecules for which no experimental Gibbs Energy values between 300 and 1500 K were available. This selection resulted in a set of 117 molecules. From this set, we constructed a list of stoichiometrically balanced reactions containing up to four different chemical species.
The reaction set contains 2648 reactions, and the distribution of the elemental composition of the reactions is shown in Figure 4. The full list of results and reactions can be found in the supplementary information.

Molecular Gibbs Free Energy Calculation
We constructed the atomic coordinates for every molecule in our reference set and carried out geometry optimization using Density Functional Theory (DFT) with a number of functionals and basis sets for all molecules. All DFT calculations were carried out using the Turbomole [49] package with the Exchange Correlation functionals PWLDA [50], B3-LYP [51,52], PBE [53], PBE0 [54], M06 [55], and TPSS [56], with the basis sets SVP, TZVP, and QZVPP from the def2 basis set library in Turbomole [57][58][59][60]. Whilst many Exchange Correlation (XC) functionals are available for this purpose [36,61,62], we focused on a small range of functionals without dispersion-correction (since we focused on small molecules) that spanned a reasonable part of the spectrum of computationally cheap DFT methods. We used an energy change convergence criterion of 1 × 10 −6 Ha for selfconsistent field convergence, an m5 grid size for the exchange-correlation integration [63], and a 1 × 10 −5 Ha convergence for the convergence of the ground state. We converged the SCF calculations as well as the geometry optimization with a convergence criterion of 1 × 10 −6 Ha. The geometry optimization scheme was used with a maximum of 0.3 Å coordinate changes per iteration. Further tightening of these criteria showed no effect on the equilibrium compositions in the sampled reactions. We stress that computationally cheap is of the essence, since chemical systems of reactions can quickly grow into many species whose properties need to be considered.
In the next step, we calculated the vibrational frequencies of all molecules in the harmonic approximation. From these, we obtained the vibrational partition function q v . The rotational and translational partition functions, q r and q t , were obtained in the ideal gas approximation [14,15]. Finally, the electronic contribution q e originating from the spin multiplicity was included. In full detail, the molecular Gibbs free energy G molecular is given by where E 0 is the electronic ground state energy of the molecule, E ZPE is the vibrational zero point energy of the molecule, P is the reference pressure (1 bar), V ideal is the ideal volume of a single molecule, T is the temperature, and Q is the full molecular partition function: The vibrational contribution q v is given by where k B is the Boltzmann constant, and ω j are the wave number of mode j calculated from the harmonic approximation. The translational part becomes with m being the mass of the molecule, h being the Plank constant, and V being the molecular volume, which can be expressed as follows under the ideal gas approximation: where P is the reference pressure of the system (chosen as 1 atmosphere). The rotational part becomes where Θ A , Θ B , and Θ C are rotational constants given by the following formula: where I i is the moment of inertia in coordinate i Finally, the electronic part is given by where g j is the degeneracy of the electronic mode j and ω j is the energy of mode j calculated from DFT.
From the complete molecular partition function Q, we calculated the molecular Gibbs free energy G molecular according to Equation (1) at equidistant temperature points in the range of 0 K-2000 K and fit a 6th order polynomial. Storage of the polynomial coefficient allowed us to readily calculate Gibbs free energies for any temperature within that range. In all cases, the fitting error was recorded and had a Root Mean Square Error (RMSE) value of 0.048 kJ mol −1 , and the highest deviation computed was 0.28 kJ mol −1 .

Calculation of the Equilibrium Composition
The equilibrium composition for any given reaction system is reached when the Gibbs free energy reaches a minimum. In order to obtain it, we used the Sequential Least-Squares Quadratic Programming (SLSQP) [64] algorithm to evolve the composition at a fixed pressure and temperature from a stoichiometric amount of reactants until a convergence criterion was reached (1 × 10 −10 % of the Gibbs energy of the initial composition). We then repeated this calculation in equidistant steps spanning 300-1500 K at 1 atmosphere with entropy mixing conducted using ideal mixing laws for gases, as shown in Equations (9) and (10)).
where i is the index for all species in the reaction system, x i is the molar fraction of the component, N is the total amount of moles in the reaction system, and R is the gas constant. Minimization was performed under particle number conservation conditions.

Error Calculation
Finally, we needed a clear way to quantify the accuracy of the equilibrium composition prediction. Among the studied reactions we observed two distinct patterns, reactions with one fixed composition over the entire temperature range and those that included a transition. Due to mixing entropy, this distinction is not completely black and white; a partial transition can also occur. To make the distinction strict, we considered the sign of the Gibbs free energy of the reaction at 0 and 2000 K. If it was the same, we classified the reaction as a fixed composition type and if it changed, we classified it as a changing type. For the first class, the DFT-based prediction was considered either correct or incorrect. For the second class, we needed a method to quantify the error. In total, we obtained 580 reactions that included a change in the considered temperature interval.
A change in equilibrium means that the Gibbs free energy of the reaction changes signs at some point in the temperature range. The error in the calculated value of this temperature could be taken as a quantitative measure of the quality of the prediction. An example of a reaction in which this occurs is the BCl + BCl 3 → B 2 Cl 4 reaction, as shown in Figure 5. For this system, the DFT results predicted the transition temperature at 1300 K, whereas the experimental data placed it at 1600 K. Despite the 300 K difference in temperature, the qualitative comparison does not seem too different. To provide an alternative measure, we also considered the integrated difference in the extent of reaction (see Figure 6). This analysis was performed for all 580 reactions that showed a composition change.

Conclusions
In this work, we showed that thermodynamic data obtained from DFT calculations can be used to correctly predict equilibrium compositions from temperatures ranging from 300 to 1500 K in a broad variety of chemical reactions. Predictions for which no change in equilibrium composition is present are particularly accurate, whilst predictions that involve an equilibrium composition change are more difficult to capture but still provide a qualitative picture. This trend can still hold true for complex reaction systems (i.e., systems with multiple extents of reactions) since the qualitative nature of the predictions is likely to hold as long as ∆H 0K,reaction is above the threshold of 50-150 kJ mol −1 . Whether these trends still hold for molecules beyond 9 atoms requires further study.
In the set of reactions with a fixed equilibrium composition, the functionals TPSS and PBE0 at the QZVPP basis set performed the best, achieving correct predictions in almost all cases (92%), with a significant portion of the incorrect results occurring at high temperatures at which deviations from the harmonic approximation are known to dominate. Importantly, we observed that the difference in accuracy between the TZVP and QZVPP results was very small. The significant additional computational burden of the quadruple zeta basis set is, hence, hardly justifiable. Similarly, we observed a significant difference between the results obtained with the LDA functional and all others, but no real improvement was made beyond this. The incorrect reactions were all found to have a Gibbs free energy of around 50 kJ mol −1 , i.e., 6 times lower than the average ∆G, which could be a useful threshold to consider when carrying out similar calculations.
In the set of reactions with temperature-dependent compositions, the initial error calculations showed that the sign of ∆G for this system can be wrong at the limits of the selected temperature ranges in 35-40% of all cases. In up to half of the reactions for which an incorrect sign was predicted, ∆G was incorrect at high temperatures. Despite this, qualitative information may still be extracted. Further analysis by measuring the integrated extent of reaction error showed that the composition error consists of a shift in the reaction of the extent with an RMSE of 200-300 K. These deviations were found to arise from a combination of errors in the ground state energy of the species in the reaction as well as errors in vibrational frequencies. It was shown that whilst both corrections are significant, both must be corrected in order to significantly improve the results. Data Availability Statement: All the data produced for this work can be found in the supplementary information.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: