Thermodynamics of Potential CHO Metabolites in a Reducing Environment

How did metabolism arise and evolve? What chemical compounds might be suitable to support and sustain a proto-metabolism before the advent of more complex co-factors? We explore these questions by using first-principles quantum chemistry to calculate the free energies of CHO compounds in aqueous solution, allowing us to probe the thermodynamics of core extant cycles and their closely related chemical cousins. By framing our analysis in terms of the simplest feasible cycle and its permutations, we analyze potentially favorable thermodynamic cycles for CO2 fixation with H2 as a reductant. We find that paying attention to redox states illuminates which reactions are endergonic or exergonic. Our results highlight the role of acetate in proto-metabolic cycles, and its connection to other prebiotic molecules such as glyoxylate, glycolaldehyde, and glycolic acid.


Introduction
The extant metabolism of living systems is complex. However, at its core [1], represented by the tricarboxylic acid (TCA) cycle and its chemical cousins, the metabolites consist of only a small subset of molecules containing the elements carbon, hydrogen, and oxygen. Present metabolism is highly regulated. Specific biochemical reactions are catalyzed by specialized enzymes with help from a variety of other co-factor molecules. However, at the dawn of life's origin, before complex biochemical machinery evolved to support and sustain metabolism, what might a proto-metabolism have looked like?
As pointed out by Pross [2] in this Special Issue, "all material systems are driven towards more persistent forms." In the context of which specific molecules will persist (in equilibrium or at steady state) and thereby contribute to the small subset utilized by proto-metabolism, both thermodynamics and kinetics will play a role. Our present study will only focus quantitatively on the thermodynamics, while making qualitative reference to kinetics. This is partly due to limitations in our present methodology (see Methods section), partly to keep the problem at hand tractable, but also partly because we think a survey of the thermodynamics is an interesting tale in its own right.
Autocatalysis lies at the heart of persistence. If a molecule can catalyze its own production faster than its competitors, it will persist (maintaining a non-zero and possibly significant concentration as a function of time) until its "food" runs out or a parasitic reaction shunts it out of the main network. For a broad overview of autocatalytic networks and their role at life's origin, we recommend the review by Hordijk and Steel [3] in this journal. Our present study focuses on just one class of reaction cycles: those that produce a net two-carbon molecule from one-carbon substrates. Extant life on our planet ultimately depends on fixing CO2 as its carbon source to build biomass. While proto-life on the early Earth may have had a variety of carbon sources, the ability to incorporate C1 molecules into larger structures remains fundamental for a self-sustaining system to emerge [4]. Thus, incorporating C1 + C1 → C2 into a sustaining cycle is fundamental.
In the absence of complex biomolecular enzymes, high-energy activation barriers are unlikely to be traversed at temperatures conducive to life. We can narrow down potential cycles by looking at thermodynamics: feasible chemistry will exclude overly endergonic reactions, while an overly exergonic reaction mid-cycle would require a subsequent endergonic step, thereby narrowing the possibilities further. We first examine the core TCA cycle and its chemical cousins, building on work by Braakman and Smith [1], focusing on the reverse (i.e., reductive) direction in the absence of any enzymes or co-factors. (For recent reviews of non-enzymatic metabolic reactions including the potential role of a reverse TCA cycle at the origin of life, see Muchowska et al. [5] and Tran et al. [6]; for a review on the energetics of biomolecular synthesis, see Amend et al. [7].) We then expand the scope of potential metabolites beyond those used by extant life to a wider range of CHO compounds. We are not the first to examine this wider scope. Morowitz et al. suggested a set of 153 compounds ranging from C1 to C6 (70 in the C1 to C4 range) based on simple rules of thumb [8]. Meringer and Cleaves extended and refined this set using structure-generation methods from existing molecular databases to examine if the metabolites of the reverse TCA cycle are "optimal" [9]. (Their conclusion: maybe not.) Zubarev et al. examined the thermodynamics of a potential reverse TCA "supernetwork" (175 molecules, 444 reactions); they use computationally faster (but less accurate) semi-empirical methods to calculate the Gibbs free energies; and they conclude that there exist families of TCA-like reactions with similar energetic profiles [10].
Our present research complements these earlier "bird's eye view" studies by focusing on the details of the smaller C 1 to C 4 species. We calculate the (aqueous) Gibbs free energy of each species using first-principles quantum chemistry. Our method is more computationally-intensive (but not terribly so) and matches well with experimental data where available. To map the thermodynamic landscape, our set of 211 C 1 to C 4 compounds is much wider than those generated by the "Morowitz rules" [8] because we include highly reduced compounds that are unlikely to be metabolites. Our present study also limits the oxidizing and reducing agents to CO 2 and H 2 , respectively. We do this to establish baseline data for future work.
While our data allow for a wide range of analyses, this paper will pick out a few key examples to illustrate how one might build a proto-metabolic cycle, why paying attention to redox states is important, and why particular molecules may be keystone species, thermodynamically-speaking. In particular, we will (1) explain why the C 1 + C 1 → C 2 reaction must be incorporated into a cycle, (2) examine the close relationship between carbonyl compounds as potential metabolites, (3) suggest reasons for the centrality of acetate in core metabolic cycles, and (4) discuss specific reactions and molecules that could participate in proto-metabolic cycles in the absence of highly specific catalysts. In no way do we discount the importance of other compounds beyond the limited set we have calculated (compounds containing nitrogen, sulfur and phosphate; metal ions/clusters) that can act as potential catalysts and cofactors; nor will we discuss the important question of flows in non-equilibrium thermodynamic systems. Ongoing research in our laboratory explores these extensions but they are beyond the scope of the present work.

Materials and Methods
To construct our thermodynamic maps, we have established a protocol to calculate the relative aqueous free energy of each molecule using quantum chemistry. This protocol has been described in detail in our previous papers (most recently in [11]) and shows good agreement with available experimental results for CHO systems [12][13][14]. Herein, we briefly summarize the protocol and point out some of its limitations, reproducing some portions of text in our previous work [11] for clarity and reading ease.
The structure of each molecule is optimized and its electronic energy calculated at the B3LYP [15][16][17][18] flavor of density functional theory with the 6-311G** basis set. To maximize the probability of finding global minima, multiple conformers are generated using molecular mechanics (OPLS force field with water as a solvent) [19]. The optimized structures are embedded in a Poisson-Boltzmann continuum to calculate the aqueous solvation contribution to the free energy. While this does not provide a specific concentration, it assumes a dilute solution such that the electrostatic field generated by a neighboring solute molecule is effectively screened by the water solvent. One can consider all solutes to have the same relative concentrations in our calculations.
Zero-point energy corrections are included based on the calculated analytical Hessian. We apply the standard temperature-dependent enthalpy correction term (for 298.15 K) from statistical mechanics by assuming translational and rotational corrections are a constant times kT, and that low-frequency vibrational modes generally cancel out when calculating enthalpy differences. So far, this is standard fare.
Entropic corrections in aqueous solution are more problematic [20][21][22]. Changes in free energy terms for translation and rotation are poorly defined in solution due to restricted complex motion, particularly as the size of the molecule increases (thus increasing its conformational entropy). Free energy corrections come from two different sources: thermal corrections and implicit solvent. Neither of these parameters is easily separable, nor do they constitute all the required parts of the free energy. We follow the approach of Deubel and Lau [23], assigning the solvation entropy of each species as half its gas-phase entropy (calculated using standard statistical mechanics approximations similar to the enthalpy calculations described above), based on proposals by Wertz [24] and Abraham [25] that upon dissolving in water, molecules lose a constant fraction (~0.5) of their entropy.
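The bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' actual scripts: the function name, the argument names, and the numbers in the usage line are hypothetical, and the only element taken directly from the text is the factor of 0.5 applied to the gas-phase entropy at 298.15 K.

```python
T_KELVIN = 298.15  # temperature used for the thermal corrections in the text


def aqueous_free_energy(e_elec, zpe, h_thermal, g_solv, s_gas,
                        entropy_fraction=0.5, temperature=T_KELVIN):
    """Assemble an aqueous free energy (kcal/mol) from its components.

    e_elec, zpe, h_thermal and g_solv are in kcal/mol; s_gas is the
    gas-phase entropy in kcal/(mol K). entropy_fraction=0.5 encodes the
    "half the gas-phase entropy" rule; every other name is an assumption.
    """
    enthalpy_like = e_elec + zpe + h_thermal + g_solv
    return enthalpy_like - temperature * entropy_fraction * s_gas


# Made-up inputs, purely to show the call signature.
example = aqueous_free_energy(-100.0, 20.0, 2.0, -5.0, 0.06)
```

Keeping the entropy fraction as a parameter makes it easy to see how sensitive a given ∆G is to this empirical choice.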
When put to the test by first calculating the equilibrium concentrations in a self-oligomerizing solution of 1 M glycolaldehyde at 298 K, our protocol fared very well compared to subsequent NMR measurements [14]. That being said, our protocol does show systematic errors when calculating barriers, but since quantitative kinetics is not part of the present study, this is not an issue. When extended to include nitrogen-containing species, our protocol still does well but systematic errors do crop up for specific functional groups [26]. Going to a higher level of theory and/or including dispersion corrections does not improve the results (see Table S1); this may seem surprising but quantum chemistry is about error cancellation, and our protocol (with its foibles) seems to work well, at least for the compounds in this study.
The relative aqueous free energies, designated Gr, are calculated with respect to three reference molecules: H2, H2O, and H2CO3. We use H2CO3 instead of CO2 because it allows a direct comparison with experimentally-derived thermodynamic data from Alberty [27]. Thus, if G is the free energy calculated from quantum chemistry, then the formation reaction of glycolaldehyde is 2 H2CO3 + 4 H2 → C2H4O2 + 4 H2O, and Gr(glycolaldehyde) = G(C2H4O2) + 4 G(H2O) − 2 G(H2CO3) − 4 G(H2). In this study, free energy corrections for concentration differentials among species (to obtain chemical potentials) are not included.
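The formation reaction of any neutral CHO species from the three reference molecules follows from element balance alone. The sketch below (function names are ours, chosen for illustration) balances x H2CO3 + n H2 → CxHyOz + m H2O and then combines absolute free energies into Gr. For glycolaldehyde, C2H4O2, it returns 2 H2CO3, 4 H2 and 4 H2O.

```python
def formation_stoichiometry(n_c, n_h, n_o):
    """Balance x H2CO3 + n H2 -> CxHyOz + m H2O for a neutral CHO molecule.

    Returns (x, n, m): equivalents of H2CO3 consumed, H2 consumed and
    H2O released. A negative n or m means that species sits on the other side.
    """
    if n_h % 2:
        raise ValueError("expected an even number of hydrogen atoms")
    n_h2co3 = n_c                    # carbon balance
    n_h2o = 3 * n_c - n_o            # oxygen balance
    n_h2 = n_h // 2 + 2 * n_c - n_o  # hydrogen balance
    return n_h2co3, n_h2, n_h2o


def relative_free_energy(g_molecule, n_c, n_h, n_o, g_h2co3, g_h2, g_h2o):
    """Gr = G(products) - G(reactants) for the formation reaction above."""
    x, n, m = formation_stoichiometry(n_c, n_h, n_o)
    return g_molecule + m * g_h2o - x * g_h2co3 - n * g_h2


print(formation_stoichiometry(2, 4, 2))  # glycolaldehyde: (2, 4, 4)
print(formation_stoichiometry(1, 0, 1))  # carbon monoxide: (1, 1, 2)
```

By construction, each reference molecule has Gr = 0: for H2CO3 the function returns (1, 0, 0), so the formation reaction is the identity.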
We have calculated all compounds in their neutral form. Thus, acetic acid is calculated rather than acetate, because this gives much better results in comparison to experimental data as discussed in the Results section, where we compare our calculations of both the neutral and anionic forms for the TCA cycle. (Additionally, our protocol was established and validated only for neutral compounds.) Additional comparisons of neutral versus anions for other cycles are provided in Tables S2-S4. The protocol used for calculating anions is similar to the neutral, except for the addition of diffuse functions to the basis set of anions, a standard procedure in quantum chemical calculations.
Our set of calculated compounds includes all CHO molecules from C1 to C4 with carbon-carbon backbones (except alkynes). Select compounds with oxygen in the backbone (ethers, esters, anhydrides) are included, although non-exhaustively (they are less stable than their carbon-backbone isomers). For each unique molecular species, we report only the free energy of the most stable conformer. In a small number of cases, this may be the enol or the hydrate. For compounds with one chiral center, the biologically relevant one was calculated; for two chiral centers, the most stable diastereomer is reported. The list of compounds explicitly mentioned in the Results section and their corresponding Gr values can be found in Appendix A. The full list is provided in Table S5. A graphical comparison of our calculated Gr values to experimentally-derived ones is also shown in Appendix A.
Assigning H2, H2O, and H2CO3 as reference states allows us to quickly and easily visualize a map of the energy landscape for the myriad reactions that can take place. For a chemical reaction, the difference in free energies will be designated ∆G, calculated as Gr(products) − Gr(reactants). For example, ∆G for the conversion of acetic acid into pyruvic acid is simply Gr(pyruvic acid) − Gr(acetic acid), since the reference state molecules (H2, H2O, H2CO3) by definition have Gr values of zero.
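As a small illustration of this convention, the sketch below computes ∆G from a lookup table of Gr values. The four numbers are those quoted for the neutral acids in the Results section (abbreviations as used there); the function and variable names are ours and purely illustrative.

```python
# Gr values (kcal/mol) quoted in the Results section for the neutral acids.
G_R = {"OXA": -33.06, "MAL": -48.14, "ACE": -45.03, "PYR": -37.55}

# Reference molecules have Gr = 0 by definition.
REFERENCE = {"H2": 0.0, "H2O": 0.0, "H2CO3": 0.0}


def delta_g(reactants, products, table=G_R):
    """Return Gr(products) - Gr(reactants); each side maps species -> coefficient."""
    lookup = {**REFERENCE, **table}

    def total(side):
        return sum(coef * lookup[name] for name, coef in side.items())

    return total(products) - total(reactants)


# OXA + H2 -> MAL (reduction of oxaloacetic acid to malic acid)
print(round(delta_g({"OXA": 1, "H2": 1}, {"MAL": 1}), 2))  # -15.08
# ACE + H2CO3 + H2 -> PYR + 2 H2O
print(round(delta_g({"ACE": 1, "H2CO3": 1, "H2": 1}, {"PYR": 1, "H2O": 2}), 2))  # 7.48
```

Because the reference molecules contribute zero, listing them in a reaction only documents the stoichiometry; it does not change ∆G.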

Results
Our results and discussion are organized as follows: (1) First, we provide further validation of our methods by comparing our calculated ∆G values against experimental data for the TCA cycle. (2) Next, we examine the free energy profiles of four core CO 2 -fixation cycles in Braakman and Smith [1]. (3) Then, we discuss why embedding C 1 + C 1 → C 2 in a cycle is fundamental using the example of formaldehyde oligomerization. (4) This leads naturally to our exploring a wider scope of reactions and cycles that might have played a role in proto-metabolic systems. (5) Finally, in the Discussion section, we take a bird's eye view of the overall thermodynamic picture with an eye on redox states.

TCA Cycle: Comparison to Experimentally-Derived Data
Using experimentally-measured equilibrium constants, Alberty derived standard free energies of formation for a range of biochemical compounds [27]. Not surprisingly, there is a relatively good match when comparing the Gibbs free energies of reaction from Alberty's Table 4.10 (pH 7, 298.15 K, 0.25 M ionic strength, 1 M solute) to data provided from a standard biochemistry textbook [28], for the TCA cycle as shown in Table 1. The main discrepancy is Step 1, where Alberty has a more exergonic reaction by 3.2 kcal/mol. As a result, Alberty's net cycle is overall 3 kcal/mol more exergonic than the textbook value. Note that Alberty assigns ∆G = 0.0 in Step 6 to match the experimental data when attempts to derive free energies for the enzyme-bound FAD proved unsatisfactory. How do our quantum calculations compare? We do not explicitly calculate the cofactors CoA, NAD + /NADH and ADP/ATP. Therefore, to correct for their inclusion, we recalculated Alberty's ∆G values for each reaction in the absence of these co-factors to find empirical corrections, which are: −7 kcal/mol for AcCoA → CoA + acetate; −9 kcal/mol for NAD + → NADH; and +7 kcal/mol for ADP + P i → ATP. Both Alberty and the textbook split Step 5 into two separate reactions, but since we do not explicitly calculate succinyl-CoA as separate from succinate, we combined these into a single step.
Our results show some differences compared to the experimental data. We calculate larger energy changes for the hydration/dehydration sequence in Steps 2 and 3. For the NAD+/NADH couple, our Step 8 is slightly more endergonic, while our Steps 4 and 5 are more exergonic. Opposite to Alberty, our Step 1 ∆G value and net cycle are both less exergonic than the textbook data. However, we are more interested in the free energy profiles in the absence of these co-factors, and to that data we now turn. Table 2 shows ∆G calculated from five sources of data: Alberty, eQuilibrator, our quantum calculations on the neutral molecules, our quantum calculations on the anions, and the group additivity method of Jankowski et al. The reaction cycle has nine steps because we explicitly include oxalosuccinate as a distinct species. For the metabolite names, we use the abbreviations of Braakman and Smith in their depiction of the four core cycles (to be discussed in the next section); name-abbreviation connections are found in Appendix A. H2 is the reductant in this system.
Not surprisingly, the Alberty and eQuilibrator data are very similar with the exception of OXS, where their standard free energies of formation (∆fG°) differ significantly. This leads to the 7-8 kcal/mol uphill and downhill differences in Steps 4 and 5. However, the overall net cycles are similar at +44 kcal/mol. (With no co-factors the oxidative TCA cycle is rather endergonic!) The most endergonic step is the oxidation of SUC to FUM, and in the absence of the enzyme-bound-FAD complex, the raw cost of breaking two C-H bonds to form a π bond between the carbons and one H-H bond is expected to be approximately +25 kcal/mol.
Our quantum calculations for the neutral molecules are not too different (they match Alberty slightly better than eQuilibrator) and the overall cycle is +45 kcal/mol. This gives us confidence that our calculated ∆G values match reasonably well with experimentally-derived data. In contrast, if we calculate all these compounds as anions (carboxylates) instead of the neutral COOH groups, there are significant differences in more than half the steps, although the overall cycle is not too different at +50 kcal/mol. (Using a higher level of theory does not improve the result; see Table S1.) Experimentally-derived data are limited to compounds of biochemical interest. However, our goal is to expand the set of compounds to explore closely related compounds not utilized by extant biochemistry. Hence, we also used the group additivity method of Jankowski et al. [31] to calculate the reactions in this cycle. (Note that eQuilibrator also utilizes some group additivity in its calculations [30].) In this case, the Jankowski scheme does poorly when compared to Alberty. The overall cycle is +15 kcal/mol primarily due to underestimating the endergonicity of the oxidation steps (4, 7 and 9).
For the rest of the Results section, we will not be discussing the quantum calculations using anions since their ∆G values do not match well with experiment. (These, along with additional Jankowski and eQuilibrator data are included in Supplementary Materials for the interested reader.) Additionally, because the Alberty dataset is limited, we will compare our quantum calculations to eQuilibrator data where available.

Thermodynamics of Core Cycles for CO 2 Fixation
Since the oxidative TCA cycle is net endergonic, this provides an opportunity for exergonic CO2 fixation by running the cycle in reverse. Figure 1 shows the four core cycles: TCA (in black), dicarboxylate/4-hydroxybutyrate (DC/4HB, in red), 3-hydroxypropionate bicycle (3HP, in blue), and 3HP/4HB (in green).
As described in the Methods section, all Gr values are with respect to the reference molecules H2, H2O and H2CO3. This allows us to quickly calculate ∆G for any reaction of interest in these cycles. For example, oxaloacetic acid (OXA) in the reductive TCA cycle is first reduced to malic acid (MAL). (We use the "acid" names rather than the anion names because our calculations are for the neutral form.) ∆G for this reaction is −48.14 + 33.06 = −15.08 kcal/mol. This reduction reaction is the exact opposite of its oxidative counterpart (Step 9 in Table 2). The overall cycle is exergonic with a net change of −45.03 kcal/mol (the exact opposite of +45.03 kcal/mol in Table 2). This value is also equal to ∆G of the net chemical reaction of the cycle, which converts two equivalents of CO2 into acetic acid.
2 H2CO3 + 4 H2 → ACE + 4 H2O (1)
Equation (1) is also the net reaction for the DC/4HB and 3HP/4HB cycles, and is a key example of incorporating the C1 + C1 → C2 reaction into a cycle.
In Figure 1, the Gr value for ACE is −45.03 kcal/mol since Equation (1) represents the formation of ACE with all other molecules being reference states. For eQuilibrator data, the equivalent Gr value for acetate is −44.63 kcal/mol (see Appendix A), which is exactly what you would expect given the net cycle in the oxidative direction from Table 2 is +44.63 kcal/mol.

The Reductive TCA Cycle
Let us examine the free energy changes for each step along the reductive TCA cycle by following the black lines in Figure 2. The reaction sequence is as follows: 1.
Life 2021, 11, x FOR PEER REVIEW 7 of 24
The two mildly endergonic steps (2 and 7) are dehydration reactions. The two main endergonic steps (4 and 5) both involve oxidation because CO2 (with a +4 oxidation state of carbon) is added. The first of these also adds an equivalent of the reductant H2, i.e., a net change in oxidation of 2 units from SUC (+2) to AKG (+4). The second does not include H2 and changes the oxidation state by 4 units: AKG (+4) to OXS (+8). Each step is 8 kcal/mol uphill, and both steps occur in succession. Note that the first addition involves carboxylation at the terminus, while the second adds a branch to the backbone.
In contrast, the reduction steps (1, 3 and 6) are significantly downhill (∆G of −15, −17 and −27 kcal/mol for OXA → MAL, OXS → ISC and FUM → SUC, respectively), and are the reason why the net cycle is overall exergonic. The final step splitting CIT (+6) to produce ACE (0) and regenerate OXA (+6) is mildly exergonic; OXA has the same overall oxidation state as CIT.
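The overall oxidation states quoted in parentheses can be reproduced from molecular formulas alone. The sketch below assumes the usual convention (H = +1, O = −2, neutral molecule), so the carbon oxidation states sum to 2·n(O) − n(H). The formulas are the standard ones for the neutral acids, which we assume correspond to the abbreviations used here; the function name is ours.

```python
def overall_oxidation_state(n_h, n_o):
    """Sum of carbon oxidation states in a neutral CHO molecule (H = +1, O = -2)."""
    return 2 * n_o - n_h


# (C, H, O) counts for the neutral acids; an assumption based on standard formulas.
FORMULAS = {
    "ACE": (2, 4, 2),  # acetic acid
    "SUC": (4, 6, 4),  # succinic acid
    "AKG": (5, 6, 5),  # alpha-ketoglutaric acid
    "OXS": (6, 6, 7),  # oxalosuccinic acid
    "CIT": (6, 8, 7),  # citric acid
    "OXA": (4, 4, 5),  # oxaloacetic acid
}

STATES = {name: overall_oxidation_state(h, o) for name, (_, h, o) in FORMULAS.items()}
print(STATES)  # {'ACE': 0, 'SUC': 2, 'AKG': 4, 'OXS': 8, 'CIT': 6, 'OXA': 6}
```

In this bookkeeping, adding CO2 raises the total by 4, adding H2 lowers it by 2, and adding or removing H2O leaves it unchanged, which is why SUC → AKG ("CO2 + H2") is a change of +2 while AKG → OXS (CO2 only) is +4.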
Could the reductive TCA cycle play a key role in prebiotic proto-metabolic systems? It is unclear. Having two successive endergonic reactions would require an oxidizing co-factor strong enough to compensate for the unfavorable uphill ~16 kcal/mol. The question of whether OXA could play the key role as the "recycled" molecule also poses problems given its kinetic instability to β-decarboxylation (and forming PYR). It is also unclear that the specific larger C5 and C6 compounds of the reductive TCA would be present in sufficient quantities to participate in a sustaining primordial cycle.

The 3HP/4HB Cycle
The 3HP/4HB cycle (green lines in Figures 1 and 2) mirrors the TCA cycle but only involves the smaller C 2 to C 4 metabolites. The net reaction is Equation (1) and the free energy change of the overall cycle is −45.03 kcal/mol. The "recycled" molecule is ACE itself. Let us consider the reaction steps in detail.
The first CO 2 addition takes place in the very first step at the α-carbon, converting ACE to MLN. This makes sense; highly oxidized CO 2 has a very positive carbon so we expect it to add to the more negative (or least positive) carbon on ACE, in this case the methyl group. For this oxidation step, ∆G = +7 kcal/mol. Note, however, that in the absence of catalysts, the addition of CO 2 will have a high barrier.
The second step is a reduction of carboxylic acid to aldehyde to form MSA, and unlike reductions we have seen in the TCA cycle, it is energetically neutral. We will expand on this feature later, but for now, note that it involves a condensation (releasing H 2 O) in concert with the addition of H 2 . In contrast, the reactions we examined in the TCA cycle only involved H 2 addition (ketone to alcohol, or the very exergonic alkene to alkane).
The next several steps MSA → 3HP → ACR → PRP are similar to the OXA → MAL → FUM → SUC sequence. However, when PRP is oxidized by an equivalent of CO 2 , branched MEM is produced rather than linear SUC. Once again, it makes sense to add CO 2 to the least positive carbon, in this case the α-carbon of PRP (in terms of alternant "plus" and "minus" carbons in the chain of a polar compound). Interestingly, this reaction is only mildly endergonic (∆G = +3 kcal/mol), noticeably less than previous CO 2 incorporations we have considered thus far. We will see this again in several cases when CO 2 is added to a carbon that is not bonded to oxygen (e.g., a methyl or methylene group).
Although exergonic, the isomerization of MEM to SUC, whereby the methyl branch insinuates itself into the chain, is unlikely to occur in a prebiotic milieu: uncatalyzed, it likely has a high barrier (not a problem for the evolved enzyme methylmalonyl-CoA mutase). Moving along, SUC → SSA → 4HB mirrors MLN → MSA → 3HP, although SUC → SSA is mildly exergonic instead of being energy neutral. You might expect dehydration of 4HB to be mildly endergonic, and it would be indeed if but-3-enoic acid were formed (see Section 3.4), but the formation of CRT also includes a double bond shift to the more stable isomer. Rehydration to 3HB then becomes mildly endergonic.
However, because 4HB/CRT/3HB have oxidation state −2, the cycle now requires an alcohol to ketone oxidation, uphill ~8 kcal/mol. Once zero-oxidation state AcACE is formed, it can split apart to two ACE molecules in a reaction resembling a reverse Claisen condensation; this reaction that recycles ACE is downhill (∆G = −13 kcal/mol).
How does the 3HP/4HB cycle compare to TCA? It splits up the endergonic steps involving oxidative CO2 addition, the second being only mildly endergonic, which might be advantageous. However, the cycle over-reduces, thus requiring a third endergonic oxidation step before splitting and recycling ACE. Furthermore, converting branched MEM to SUC, although thermodynamically favorable, is likely to be kinetically inaccessible without highly specific enzymes or co-factors.
Compared to eQuilibrator, our G r values (in Figure 1) . For 4HB, our calculated G r is −83.49, a difference of over 5 kcal/mol. For 3HB, our calculated G r is −86.46 which is less negative than eQuilibrator.

The DC/4HB Cycle
The DC/4HB cycle (red lines in Figures 1 and 2) utilizes the first half of the TCA cycle (OXA to SUC) and the second half of the 3HP/4HB cycle (SUC to AcACE). It also only involves the smaller C 2 to C 4 metabolites, and once again the net reaction is the same as Equation (1) and the overall cycle is −45.03 kcal/mol. However, it begins with ACE (and not OXA), and thus the recycled molecule is ACE itself.
The first two steps frontload the uphill oxidative incorporation of CO 2 equivalents. ACE → PYR is analogous to SUC → AKG, in that "CO 2 + H 2 " are added to the terminal carboxylic acid, and the oxidation state changes by +2. The second step adds CO 2 to the least positive carbon, in this case the methyl of PYR (+2), to form OXA (+6) with an oxidation state change of +4 mirroring the AKG → OXS reaction. However, unlike SUC → AKG → OXS which is 16 kcal/mol uphill, ACE → PYR → OXA is only 12 kcal/mol uphill because the second step is less endergonic (∆G = +5 kcal/mol). While this seems more feasible, the question is whether the first carboxylation of ACE to PYR is more or less favorable than to MLN. Thermodynamically, there is little difference (MLN is a mere 0.6 kcal/mol more stable), but kinetically things could be quite different depending on what primitive catalysts or co-factors might be present and whether H 2 can easily be incorporated as a reductant in this first step. Answering this question is beyond the scope of this article (we will tackle the kinetics in future work). As previously mentioned, CO 2 addition will have high barriers in the absence of catalysts, although our present work focuses quantitatively only on the thermodynamics.
The rest of the cycle was discussed earlier since it overlaps with the TCA and 3HP/4HB cycles. Once OXA is formed, it is mostly downhill all the way to 3HB except that overreduction requires an uphill oxidation to AcACE prior to splitting apart into two ACE. Our calculated G r for PYR (−37.55) is marginally less negative than eQuilibrator's value of −38.87 kcal/mol.
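The uphill start of this cycle can be checked directly from the Gr values quoted in the text. The sketch below (function and variable names are ours) computes the stepwise and cumulative free energy changes along ACE → PYR → OXA; the co-reactants CO2 (as H2CO3), H2 and H2O are reference molecules and contribute zero.

```python
from itertools import accumulate

# Gr values (kcal/mol) quoted in the text for the neutral acids.
G_R = {"ACE": -45.03, "PYR": -37.55, "OXA": -33.06}


def step_changes(path, table):
    """Free energy change of each consecutive step along a path of metabolites."""
    return [table[b] - table[a] for a, b in zip(path, path[1:])]


path = ["ACE", "PYR", "OXA"]
steps = step_changes(path, G_R)
profile = list(accumulate(steps))  # running total relative to the first species

print([round(s, 2) for s in steps])  # [7.48, 4.49]
print(round(profile[-1], 2))         # 11.97
```

The running total of about +12 kcal/mol matches the figure given above for ACE → PYR → OXA.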

The 3HP Bicycle
Unlike the previous three cycles, the 3HP bicycle (blue lines in Figures 1 and 2) does not have the same net reaction. The first half of the cycle from ACE to PRP overlaps with the 3HP/4HB cycle, but then the path splits into two: (1) PRP can add GLX to access C5 compounds (MML, MSC, CTM) before splitting into ACE and PYR. (2) PRP can add CO2 to access the C4 compounds (MEM, SUC, FUM, MAL) before splitting into ACE and recovering GLX. Thus, the net reaction, as shown in Equation (2), converts three equivalents of CO2 into pyruvic acid:
3 H2CO3 + 5 H2 → PYR + 6 H2O (2)
GLX is relatively high in free energy (Gr = +0.22 kcal/mol). This means that when employed as a reactant, it can drive a downhill reaction to form thermodynamically more stable products. However, recycling GLX comes at a cost. The killer step, in this case, is the oxidation of SUC to FUM (∆G = +27.12 kcal/mol), part of the sequence PRP → MEM → SUC → FUM → MAL → ACE + GLX, which is oxidative and endergonic by +27.65 kcal/mol. This is balanced by the downhill sequence starting from ACE and involving the C5 compounds (∆G = −37.77 kcal/mol), recycling ACE while converting GLX (+4) to PYR (+2), a net reductive reaction.
In the left half of Figure 2, there is a small gap separating the blue bars from the other bars, signifying the presence of GLX contributing G r = + 0.22 kcal/mol to the relative free energy. We will revisit the potential wider role of GLX in Section 3.4.
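As a consistency check on these numbers, the free energy change of the downhill branch can be recovered from the Gr values alone. In that branch ACE is recycled, so it cancels, and the added CO2, H2 and H2O are reference molecules with Gr = 0. What remains is the conversion of GLX to PYR, using Gr(PYR) = −37.55 and Gr(GLX) = +0.22 kcal/mol as quoted in the text:

```latex
\begin{align*}
\Delta G(\mathrm{GLX} \rightarrow \mathrm{PYR})
  &= G_r(\mathrm{PYR}) - G_r(\mathrm{GLX}) \\
  &= -37.55 - (+0.22) \\
  &= -37.77~\text{kcal/mol}
\end{align*}
```

This reproduces the −37.77 kcal/mol quoted for the sequence involving the C5 compounds.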

Back to Basics: Incorporating C 1 + C 1 → C 2 into a Cycle
While Stanley Miller's landmark experiment [32] demonstrated that the synthesis of amino acids was possible in a highly reducing atmosphere [33], the present scientific majority view as summarized by Kasting [34] is that CO2 rather than CH4 was the dominant C1 building block available. Whatever the prebiotic milieu may have been, even if a range of carbon-containing compounds was available, the growth and sustenance of a proto-metabolic cycle requires "food" molecules, most likely C1: growth would deplete specific larger molecules in the cycle (via hydrolysis or other parasitic side-reactions) unless they are replenished by a C1 source.
The problem, however, is that the direct C 1 + C 1 → C 2 , even if thermodynamically feasible, is kinetically challenging. In CO 2 (or HCO 2 H, CH 2 O, CH 3 OH), carbon carries a partial positive charge because of the more electronegative oxygen. Making a new C-C bond between two such entities requires traversing high activation barriers. In the absence of co-factors or catalysts that could significantly lower the barrier (or using prohibitively high reaction temperatures), regular and constant production of C 2 (or larger) molecules would be very slow. While the carbon of CH 4 carries a small partial negative charge, CH 4 is generally unreactive to forming new C-C bonds (in the absence of metal-containing catalysts).
CO is an interesting case; its net dipole is close to zero although its carbon carries a small negative partial charge; but CO is much less stable (thermodynamically) than CO 2 and unlikely to exist in large quantities over a long period of time. It could, however, play an important role in situ, generated transiently from CO 2 in the presence of H 2 . Using our reference states, this reaction as shown in Equation (3) is: From our quantum calculations, G r (CO) = +3.33 kcal/mol, and thus this reaction is endergonic. (In contrast, formation of the other C 1 compounds from CO 2 and H 2 is exergonic, see the Discussion section.) This reaction could play a role in CO 2 -fixation steps, as we have seen in the previous section, where several reactions involved the formal addition of "CO 2 + H 2 " such as ACE (0) → PYR (+2).
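To give a sense of scale for the endergonic value quoted above, the following minimal sketch converts a standard free energy change into an equilibrium constant via K = exp(−ΔG/RT). It assumes the CO-forming reaction is written against the reference states as H2CO3 + H2 → CO + 2 H2O (a mass-balanced reading made here, not a reproduction of Equation (3)) and that its ΔG equals the quoted relative free energy of CO, +3.33 kcal/mol. The function name and the gas-constant value are illustrative choices.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def equilibrium_constant(delta_g_kcal, temperature_k=298.0):
    """Return K = exp(-dG / RT) for a standard free energy change in kcal/mol."""
    return math.exp(-delta_g_kcal / (R_KCAL * temperature_k))


# Assumed reaction: H2CO3 + H2 -> CO + 2 H2O, with dG taken as G_r(CO) from the text.
k_co = equilibrium_constant(3.33)
print(f"K for CO formation at 298 K: {k_co:.2e}")
```

Under these assumptions K comes out on the order of 10^-3, which is in line with CO being a minor, transiently generated species rather than one that accumulates.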
The cycles we examined in the previous section all involve adding CO2 to a species that is C2 or larger. Why is this kinetically more feasible? A polar organic molecule in which one of the carbons carries a (significant) partial positive charge usually also contains neighboring carbons that are less positive or even have a (slight) partial negative charge. Addition of CO2 can be "directed" towards the less positive carbon of this larger molecule with a lower barrier, an umpolung-like "strategy" if you will. This provides a feasible way to incorporate CO2 without the cost of direct C1 + C1 addition.
However, how is the cycle created? At some point, a larger molecule would have to split into smaller entities (but not C 1 as this would simply be a reverse addition). Hence, the simplest cycle that can be generated involves a C 2 , C 3 , and a C 4 species, with the C 4 able to split into two C 2 species, as shown in Figure 3A.
We illustrate this with a formaldehyde oligomerization cycle, part of the formose reaction [35], as shown in Figure 3B. In previous work [13], we calculated the free energy profiles for CH2O oligomerization and explored both the thermodynamic and kinetic features of this cycle, which we now summarize briefly. Experimentally, the reaction starting with solely CH2O has an induction period because the direct dimerization of CH2O to form glycolaldehyde (GA) has a very high barrier (recall our earlier discussion of why direct C1 + C1 → C2 is kinetically challenging). Once the C2 species is formed, however, aldol addition of CH2O proceeds rapidly (under appropriate experimental conditions) to generate a wide range of CHO-containing molecules, bypassing the direct C1 + C1 step.
The cycle has several interesting features. Aldol additions of CH 2 O (hydrate) are exergonic (−6 kcal/mol, on average). The C 2 + C 1 → C 3 kinetically favors glyceraldehyde (GLA), which may subsequently isomerize (via an enol intermediate) to thermodynamically more stable dihydroxyacetone (DHA). The kinetically favored C 3 + C 1 → C 4 converts GLA into the ketone (erythrulose), which is thermodynamically more stable than its aldehyde isomer (erythrose). Erythrose can potentially cyclize, add further CH 2 O units, or undergo a reverse aldol (C 4 → C 2 + C 2 ) to regenerate GA. The splitting reaction is marginally endergonic (+2 kcal/mol). The overall cycle is exergonic by −7 kcal/mol. We note three other interesting things about this system: (1) Active species in the cycle are in equilibrium with "off-cycle" compounds, which together form a "pool" of interconnected compounds-a feature that could stabilize the cycle and provide a point of regulatory control [36]. (2) Because the C 2 "recycling" species (GA) and the C 1 "food" species (CH 2 O) are both at oxidation state zero, additional redox reactions are not required, thereby simplifying the situation. (3) Cannizzaro reactions (such as 2 CH 2 O + H 2 O → HCO 2 H + CH 3 OH) can be parasitic on the cycle, but also provide access to compounds with a wider range of oxidation states; and experimentally the formose reaction generates a range of carboxylic acids [37], including those found in extant biochemistry.
This sets the stage for us to explore alternative cycles that are thermodynamically favorable, but possibly more feasible kinetically than the extant cycles we have analyzed in the previous section.

Alternative Cycles: A Brief Exploration
In extant biochemical cycles, ACE (overall zero oxidation state) is the key recycled C 2 species. The main goal of this section is to explore if this role can instead be played by one of its chemical cousins: ethanal (−4), glycolaldehyde (−2), glycolic acid (+2), or GLX (+4). We will also allude to CH 2 O as a potential crossover C 1 food species, and we will see that intersecting cycles allow for a variety of possibilities. Before we embark, some words of caution. We remind the reader that our quantum calculations only provide thermodynamic data of the CHO metabolites; they do not take into account kinetics or even the presence of co-factors (such as sulfur-containing compounds) to tune the thermodynamics. We also chose to highlight a very limited set of specific examples that we think are interesting out of many possibilities. One should not over-interpret our results and fall into the trap of believing Kipling-esque Just So stories.

Minor Modifications to the 3HP/4HB Cycle
Before jumping into alternative C2 species, one drawback to the extant 3HP/4HB cycle as discussed in Section 3.2.2 is over-reducing the carbon moiety, thereby requiring an endergonic step to generate AcACE before it splits into two ACE. A straightforward way to avoid this: If SSA is not reduced to 4HB but instead undergoes a thermodynamically favorable aldehyde-to-ketone isomerization (via the enol), AcACE can be formed directly from SSA, as shown in the right half of Figure 4 (long green line). We would expect this direct SSA → AcACE conversion in the absence of any reducing agents. However, the preceding reduction step, SUC → SSA, requires a reducing agent, and it may not be feasible to halt reduction of the carboxylic acid at the aldehyde depending on the reducing agent and the reaction conditions. If 4HB is formed, we would expect dehydration to result first in but-3-enoic acid (see Figure 4) followed by rehydration to 3HB, but now the endergonic oxidation must be carried out to form AcACE.
Considering the pool of compounds in equilibrium, certainly CRT is in the mix as an isomer of but-3-enoic acid (G r = −80.19); an isomer of 3HB and 4HB is 2-hydroxybutanoic acid (G r = −82.79); and 2-oxobutanoic acid (G r = −68.75) is an isomer of SSA and AcACE. The extant 3HP/4HB cycle utilizes the thermodynamically favorable isomers, but this may be less kinetically feasible for a proto-metabolic cycle that avoids or minimizes uphill steps.

Glycolic Acid as the Recycling C 2
Glycolic acid is produced in measurable quantities in formose reactions [37] and Miller spark-discharge experiments [38], and it plays a potential role as a scaffold in oligopeptide-forming chemistry [39]. Could it play a role as the recycling C2 compound instead of ACE? In Figure 4, we plot the energy profile for glycolic acid (gold lines) relative to ACE (green lines) assuming the same reaction steps as the 3HP/4HB cycle. The net reaction is still similar to Equation (1) and the cycle is exergonic overall at −45.03 kcal/mol.
The first CO 2 addition forming tartronic acid is endergonic (6 kcal/mol), marginally less (by 1 kcal/mol) compared to the 3HP/4HB cycle. Three exergonic reactions follow: Two reductions lead to glyceric acid, which when dehydrated, converts into the enol of PYR. This provides a connection to the DC/4HB cycle (red line in Figure 2), providing an alternative starting point to having two early successive endergonic reactions. However, a switch to DC/4HB would mean glycolic acid is depleted rather than recycled; the net reaction as shown in Equation (4) would then be: with an overall free energy of −45.03 × 2 + 18.08 = −71.98 kcal/mol. (There is no cycle!) However, let us keep following the gold line. Reduction of PYR leads to lactic acid where it may potentially accumulate as a thermodynamic sink. The second addition of CO 2 followed by a methyl shift leads to MAL. The analogous CO 2 addition in the green line was only 3 kcal/mol endergonic (see Section 3.2.2), but for the gold line this is back to the higher 7-8 kcal/mol range. From MAL, reduction of a carboxylic acid to an aldehyde is marginally endergonic and an aldehyde-to-ketone isomerization (long gold line) analogous to SSA → AcACE avoids the over-reduction and formation of the 2-oxobutanoic acid enol. The final split produces ACE and regenerates glycolic acid.
However, if 2-oxobutanoic acid was produced, recall in the previous section that it is a less stable structural isomer of AcACE and part of the same "pool". A keto-enol shift will convert it to AcACE which can split into two ACE. This would not recycle glycolic acid but provides a route to the 3HP/4HB cycle (green line in Figure 2), and illustrates one way such cycles crisscross and interconnect-a possible situation one might expect in a messier proto-metabolic milieu.
The kinetically challenging methyl shift, however, remains problematic. Lactic acid could provide a way out. Unlike 3HP, ACR, or PRP, where the central carbon is the least positive and would be the site of CO2 addition (forming branched C4), in lactic acid the terminal methyl is the least positive carbon. If CO2 were to be added there, it could provide a route to unbranched MAL (Gr = −48.14) and the DC/4HB cycle. This CO2 addition would be endergonic by only 4 kcal/mol, given that Gr(lactic acid) is −52.03 kcal/mol; however, it might still be kinetically challenging.
In fact, 3HP (G r = −51.71) could convert to lactic acid through a dehydration-rehydration via ACR-and this provides a tantalizing cycle which modifies the 3HP/4HB cycle by patching in a small part of the DC/4HB: Glycolic acid is not needed in this case.
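The step energies quoted in the last two paragraphs follow from differences of relative free energies. The sketch below reproduces that arithmetic using only the values given in the text. The reaction stoichiometries (lactic acid + H2CO3 → MAL + H2O, and the net 3HP → lactic acid isomerization) are mass-balanced forms assumed here for illustration, and the helper name is hypothetical.

```python
# Relative free energies G_r (kcal/mol) quoted in the text; reference molecules are zero.
G_R = {
    "lactic acid": -52.03,
    "MAL": -48.14,
    "3HP": -51.71,
    "H2CO3": 0.0,
    "H2O": 0.0,
    "H2": 0.0,
}


def reaction_free_energy(reactants, products, g_r=G_R):
    """Sum G_r over products minus reactants; repeat a name to express stoichiometry."""
    return sum(g_r[p] for p in products) - sum(g_r[r] for r in reactants)


# Assumed carboxylation at the terminal methyl: lactic acid + H2CO3 -> MAL + H2O
dg_carboxylation = reaction_free_energy(["lactic acid", "H2CO3"], ["MAL", "H2O"])

# Dehydration-rehydration via ACR, written as the net isomerization 3HP -> lactic acid
dg_isomerization = reaction_free_energy(["3HP"], ["lactic acid"])

print(f"lactic acid -> MAL: {dg_carboxylation:+.2f} kcal/mol")
print(f"3HP -> lactic acid: {dg_isomerization:+.2f} kcal/mol")
```

Listing species by name, with repeats for stoichiometric coefficients, keeps the bookkeeping close to how the reactions are written in the text; it says nothing about barriers, which the calculations do not address.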

Ethanal or Glycolaldehyde (GA) as the Recycling C 2
Having investigated glycolic acid as an oxidized cousin of ACE, we now examine the free energy profile starting from its reduced counterparts. The purple lines in Figure 4 are for ethanal as the recycling C2 species assuming the same reaction steps as the 3HP/4HB cycle; the net reaction is still Equation (1).
The first CO2 addition converts ethanal to MSA (a connection to the 3HP route!) and this oxidation is mildly endergonic (∆G = +3 kcal/mol). Reduction of MSA to malonaldehyde is energy-neutral; reduction to 3-hydroxypropanal is exergonic as expected; dehydration to acrolein is energy-neutral; and then reduction to propanal is highly exergonic (analogous to ACR → PRP). The second CO2 addition to a branched C4 species, followed by the prebiotically questionable methyl shift, leads to SSA (in the 4HB route!). Isomerization to acetoaldehyde followed by splitting produces ACE and regenerates ethanal. The net reaction is Equation (1) and the net free energy change of the cycle is −45.03 kcal/mol. If instead there is a switch to the 3HP/4HB cycle, ethanal is depleted rather than recycled, and the net reaction is as shown in Equation (5):
ethanal + 2 H2CO3 + 3 H2 → 2 ACE + 3 H2O (5)
with an overall free energy of −45.03 × 2 + 41.62 = −48.44 kcal/mol, but this is not a cycle. One might also imagine (not shown) a rehydration of acrolein to form lactaldehyde, which perhaps might lead directly to SSA via CO2 addition to the terminal methyl. This might be possible if no reducing agent was present, but that is a challenge given the prior reduction steps.
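Net reactions such as Equation (5) can be checked for element balance before any energies are attached to them. The following is a minimal sketch of such a check. It assumes ACE is the neutral acetic acid molecule (C2H4O2), in keeping with the neutral-molecule convention stated in the Discussion; the dictionary layout and function names are illustrative only.

```python
from collections import Counter

# Molecular formulas for neutral species (ACE assumed to be acetic acid, C2H4O2).
FORMULAS = {
    "ethanal": {"C": 2, "H": 4, "O": 1},
    "H2CO3": {"C": 1, "H": 2, "O": 3},
    "H2": {"H": 2},
    "ACE": {"C": 2, "H": 4, "O": 2},
    "H2O": {"H": 2, "O": 1},
}


def atom_totals(side):
    """Total atoms of each element for a mapping of {species: coefficient}."""
    totals = Counter()
    for species, coeff in side.items():
        for element, count in FORMULAS[species].items():
            totals[element] += coeff * count
    return totals


def is_balanced(reactants, products):
    """True when both sides contain the same number of each element."""
    return atom_totals(reactants) == atom_totals(products)


# Equation (5): ethanal + 2 H2CO3 + 3 H2 -> 2 ACE + 3 H2O
eq5_reactants = {"ethanal": 1, "H2CO3": 2, "H2": 3}
eq5_products = {"ACE": 2, "H2O": 3}
print(is_balanced(eq5_reactants, eq5_products))
```

The same check can be pointed at any other candidate net reaction by adding its species to `FORMULAS`; it verifies stoichiometry only and carries no thermodynamic information.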
What if we start from GA? The first addition of CO2 leads to 2-OH-3-oxopropanoic acid (∆G = +4 kcal/mol). Two reduction steps lead to glucic acid and glyceraldehyde (GLA), respectively. Dehydration leads to the enol form of methylglyoxal. If one were to add the second CO2 to methylglyoxal, kinetically it would be most feasible to form 3,4-dioxobutyric acid (∆G = +4 kcal/mol), thereby avoiding branched C4 species (for the same reason as PYR → OXA). Reduction of the aldehyde and ketone to alcohols leads to 2-deoxythreonic acid, and then splitting produces ACE and regenerates GA. This last step is marginally endergonic (∆G = +0.12 kcal/mol) but all other steps are exergonic except for the two CO2 additions. This free energy profile is shown in Figure 5.
This system has connections to CH2O oligomerization. GA is the recycling C2 species in the formose reaction, and GLA is the C3 species. One could imagine coupled cycles where GLA in the formose cycle can undergo dehydration to methylglyoxal, thereby accessing the cycle in Figure 5. We could also imagine including CH2O as an additional C1 food source, which would allow aldol addition to ACE to form 3HP, skipping MLN and MSA in the (green-line) 3HP/4HB cycle; or better yet transformations that utilize an aldol addition to a C3 to avoid forming branched C4 species. Changing the food source would also change the net cycle reaction and thermodynamics.
However, before we get too excited at the tantalizing possibilities, keep in mind that our speculations of potential cycles are based on just the thermodynamics; we have not quantitatively accounted for the kinetics, or reducing agents appearing at the appropriate time, or hydrating/dehydrating conditions, just to name a few issues.

The Challenge of GLX
Earlier, we discussed the role of GLX as a high-energy intermediate, recycled in the 3HP bicycle pathway (Section 3.2.4). Could GLX play a role as the recycled C 2 species in a simpler cycle such as the ones we have just examined? One problem: Adding the first equivalent of CO 2 is significantly more endothermic, whether it produces oxomalonic acid (∆G = +13 kcal/mol) with a +4 increase in oxidation state, or dioxopropanoic acid (∆G = +14 kcal/mol) with concurrent addition of H 2 and an overall +2 increase in oxidation state. If instead GLX is reduced in an earlier step before addition of CO 2 , this would be similar to the cases we have already examined.
Formation of GLX (G r = +0.22) is marginally endergonic from the reference molecules. Alternatives include the reaction of CO 2 and H 2 CO, the two "food" molecules we have considered, which is +5 kcal/mol uphill; or HCO 2 H and CO which is +2 kcal/mol uphill (although CO would have to be generated prior). A prebiotic mixture might contain small amounts of GLX, but under reducing conditions, it would favorably convert into glycolic acid, ACE, or more reduced compounds.
However, as a high-energy reactant, GLX could drive thermodynamically favorable cycles. We previously saw (Section 3.2.4) that the downhill stretch of the 3HP bicycle has a free energy change of −37.77 kcal/mol. Stubbs et al. have experimentally designed a reverse TCA cycle analog that proceeds favorably in the absence of enzymes or metals as catalysts [40]. As shown in Table 3, our calculations show why this is thermodynamically favorable. The net chemical reaction as shown in Equation (6) is: The overall free energy change is −45.47 kcal/mol. Interestingly, the Jankowski group-additivity method [31] yields a similar overall value for this cycle, although there are differences in the individual steps. (eQuilibrator does not have Δ_fG_i^0 values for most of these "formates" so we are unable to make that comparison.) This cycle does not fix carbon overall nor is it meant to. Instead, it essentially reduces GLX to ACE while releasing units of CO2. GLX acts as an oxidizing agent, and this is its likely role in a proto-metabolic setting. With a Gr similar to the reference compounds, it is the C2 equivalent of an "oxidizing" CO2 unit.
As a second example, we examine a non-enzymatic cycle designed experimentally by Muchowska et al. [41]. This cycle marries parts of the oxidative TCA and glyoxalate cycles, but by utilizing GLX as the oxidizing agent, the net cycle is now marginally exergonic (−0.22 kcal/mol), instead of being +45.03 kcal/mol endergonic as we saw in Section 3.1. The net reaction as shown in Equation (7) is: Individual steps in the cycle are shown in Table 4. The first half utilizes several compounds and reactions in common with Stubbs et al. [40], and we use the same names (from Stubbs et al.) in Table 3. The cycle begins with PYR rather than OXA; Steps 2-4 are the same as Table 3; and Steps 7-9 are also found in the TCA cycle. While the main recycled compound is PYR, one equivalent of GLX is also recycled. (The two equivalents of GLX enter in Steps 1 and 4.) From the quantum chemistry data, the very exergonic reductive Step 3 is balanced by the very endergonic oxidative Step 7. The Jankowski group additivity approach does not do as well here for the same reasons we saw in the TCA cycle. Where eQuilibrator values are available, our quantum calculations match up quite well. As in the previous example, GLX essentially acts as a "high-energy" oxidizing agent allowing for exergonic steps.

Discussion
We are now ready to take a broader look at the overall thermodynamics of C 1 to C 4 species containing only carbon, hydrogen, and oxygen. We have arranged this map to emphasize redox states, inspired by Smith and Morowitz in their magisterial tome The Origin and Nature of Life on Earth [42]. We do not capture the intricacies they have investigated given the limited system size in our work, although our present analysis is influenced by Chapter 4 of their book.
Our overall thermodynamic map is displayed in Figure 6. Recall that the reference molecules (with G r values of zero) are H 2 , H 2 O, H 2 CO 3 , and that our quantum chemical values are for neutral molecules in aqueous solution (zero ionic strength) under standard conditions at 298 K. The carbon "source" (CO 2 , or H 2 CO 3 in aqueous solution) has +4 oxidation state; and our present map represents a reducing environment with H 2 acting as the reductant. What features can we pick out under these conditions? For C1 compounds, the most reduced compound CH4 (−4) is the most stable, followed by CH3OH (−2). Note that the aldehyde CH2O (0) and the acid HCOOH (+2) are of similar relative free energy. Carbon monoxide is the only compound higher in energy than the reference state. As size increases (towards the right in Figure 6), we see the same trends: alkanes, as the most reduced compounds, are the most stable; and relative free energies are less negative as oxidation state increases.
For C 2 compounds, ethanol is more stable than ethene, ethanal is more stable than ethylene glycol, ACE is more stable than GA, and glycolic acid is more stable than glyoxal. The same trends hold for larger (C 3 or C 4 ) compounds of the same oxidation state: an OH group is more stable than having a C=C double bond; COOH is more stable than separate C=O and OH groups; C=O is more stable than a diol; having acid and alcohol groups is more stable than two C=O groups (recall the Cannizzaro products in Section 3.3). Additionally, the ester methylformate is less stable than its acid isomer ACE, and formic anhydride is less stable than its isomer GLX.
Looking at the most stable compounds across oxidation states for the C 2 compounds, the series is ethane (−8), ethanol (−6), ethanal (−4), ACE (0), glycolic acid (+2), GLX (+4), oxalic acid (+6). Note that ACE is slightly more stable than ethanal, in the same way that HCO 2 H is slightly more stable than CH 2 O in the C 1 series. For the C 3 and C 4 compounds, simple ketones are more stable than their corresponding aldehydes and acids; although the acid is still more stable than the aldehyde (PRP is slightly more stable than propanal and butyric acid is slightly more stable than butanal). This similarity in energy between the acid and aldehyde may be an important feature in proto-metabolism by allowing essentially energy-neutral and thus thermodynamically reversible redox transformations, possibly aided by primitive redox catalysts. We previously alluded to this feature: acids, aldehydes and ketones potentially form an equilibrating pool of molecules that may exert some control and regulation [36,37]. To this, we might add esters and anhydrides that might allow for the sequestering of acid metabolites.
In a column of molecules sharing the same oxidation state, carboxylic acids (if they can be formed) are always the most thermodynamically stable. In the C 3 set for example, we have PRP (−2), lactic acid (0), MSA (+2), MLN (+4), and tartronic acid (+6). What are the most stable metabolites in the 3HP/4HB and the DC/4HB cycles? They are all acids and are among the most thermodynamically favored compounds in their respective columns. Once nature evolved specific enzymatic catalysts to reduce kinetic barriers, it seems to have maintained the primary use of thermodynamically stable compounds, the same ones you might expect to persist in a proto or pre-metabolic mixture.
An interesting metabolite not formally listed in the four core metabolic cycles is lactic acid, marginally more stable than its isomer 3HP (by 0.3 kcal/mol). We have speculated on its potential role in alternative (yet closely related) cycles in Section 3.4.4 but we do not yet know evolutionarily how it came to play its present role in extant metabolism. We should not discount it as an important player in a proto-metabolic system. Lactic acid's close cousins, 3HP and ACR, are part of the 3HP/4HB cycle.
In an organic compound with only a single functional group, a ketone is noticeably more stable than its aldehyde isomer. However, the free energy difference is smaller in more oxidized molecules and there are potential reversals: MSA is marginally more stable than PYR. MSA also has a stable enol thanks to its carbonyl group in the β-position with respect to the acid group. That is also why AcACE (in the 4HB route) is the most stable zero oxidation state C 4 compound. Hence, if we imagined the simplest cycle utilizing the most stable acid molecules and CH 2 O as "food" to avoid oxidation state changes, the C 2 , C 3 , C 4 candidates would be ACE, lactic acid (or 3HP), AcACE, respectively.
Could this be why ACE is the C2 species recycled in extant core metabolisms? It has the same zero oxidation state as GA (in the formose reaction), but is significantly more stable. Glycolic acid and GLX are favorably reduced to ACE thermodynamically, and ACE is also more stable than its reduced counterpart ethanal. Perhaps ACE combines the twin features of stability and simplicity. Furthermore, having an anionic form (in contrast to ethanal) might provide an anchor to a proto-metabolic mineral surface as suggested by Wächtershäuser in his origin-of-life proposals [43], potentially presaging the ubiquitous role of phosphates (perhaps later) in the evolution of metabolism. AcACE also plays a key role as the splitting C4 species. Interestingly, our calculations find that acetic anhydride is only marginally less stable than AcACE, and might play some role as a "pool" species if dehydrating conditions turn out to be important.
However, getting to AcACE is potentially tricky. We have proposed bypassing 4HB, CRT, and 3HB via a thermodynamically favorable aldehyde-to-ketone isomerization from SSA to avoid the late endergonic step. But how do we get to SSA? At present, the most likely source is SUC, the only other molecule besides ACE to be present in all four core metabolisms shown in Figure 1. If SUC is being reduced to SSA, though, what stops reduction from proceeding all the way to 4HB? It is unclear if there is a prebiotically plausible redox agent that will prevent reduction of the acid all the way to the alcohol, which may be why extant metabolism evolved this longer route. Additionally, how do we get to SUC? As previously discussed, routes involving MAL avoid the problem of converting a branched C4 into an unbranched species, but there are uphill steps to get to MAL. As alternatives, we speculated on routes involving lactic acid or methylglyoxal that get to MAL with minimal endergonicity. Still, just because a reaction is thermodynamically favorable does not mean it is feasible in a prebiotic milieu.
We have discussed in Section 3.3 why an overall C1 + C1 → C2 reaction embedded in a cycle is a more feasible way of incorporating food molecules into a growing metabolic system than direct simple addition. With ACE as the C2 species, the net cycle is −45.03 kcal/mol. While two of its reduced cousins, ethanol and ethane, provide a more exergonic net reaction, both are less reactive and less likely to involve the rich carbonyl chemistry we have discussed. Ethanal could be a viable competitor (net cycle of −41.62 kcal/mol) but, lacking an anionic counterpart, it may benefit less from the anchoring advantages of selective surface chemistry. As to ACE's oxidative cousins, glycolic acid might work (net cycle of −18.08 kcal/mol), although it favorably converts to ACE under reducing conditions (the carboxylate could be "protected" by surface attachment), and we have already discussed the challenge and potential role of GLX. These considerations may cement the key role played by ACE.
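To make the bookkeeping behind such net reactions concrete, the short sketch below balances the formation of a neutral CHO compound from CO2 and H2, with water as the only by-product. This choice of food, reductant, and by-product is our assumption for illustration (it follows the framing of CO2 fixation with H2 as reductant), and the function name co2_h2_stoichiometry is hypothetical. It only counts atoms; it does not reproduce any of the free energies quoted above.

```python
def co2_h2_stoichiometry(n_c, n_h, n_o):
    """Balance  n_c CO2 + n_h2 H2 -> C(n_c)H(n_h)O(n_o) + n_h2o H2O.

    Assumes a neutral compound containing only C, H and O, built from
    CO2 and H2 with water as the sole by-product (illustrative assumption).
    Returns (n_h2, n_h2o). A negative n_h2o would mean that water is
    consumed rather than released.
    """
    # Oxygen balance: 2*n_c = n_o + n_h2o
    n_h2o = 2 * n_c - n_o
    # Hydrogen balance: 2*n_h2 = n_h + 2*n_h2o
    twice_n_h2 = n_h + 2 * n_h2o
    if twice_n_h2 % 2 != 0:
        raise ValueError("odd hydrogen count: not a closed-shell CHO compound")
    return twice_n_h2 // 2, n_h2o


if __name__ == "__main__":
    # (C, H, O) atom counts for the C2 candidates discussed in the text
    candidates = {
        "acetic acid (ACE)": (2, 4, 2),
        "ethanal": (2, 4, 1),
        "glycolic acid": (2, 4, 3),
    }
    for name, atoms in candidates.items():
        n_h2, n_h2o = co2_h2_stoichiometry(*atoms)
        print(f"{name}: {atoms[0]} CO2 + {n_h2} H2 -> product + {n_h2o} H2O")
```

Under these assumptions, each step down in oxidation state by two units costs one additional H2, which is one way to see why the more reduced C2 products correspond to more exergonic net reactions in a reducing environment.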
Extant metabolites participating in core cycles occupy a particular range of oxidation states depending on the size range of compounds involved. In the C1 to C4 group, these are mostly in the −2 to +4 oxidation states with OXA (+6) being just outside that group. The C5 and C6 compounds are mostly in the +2 to +6 range, with transient OXS at +8. Why these ranges? We can only speculate for now that there is a balance between adding oxidizing food equivalents (CO2 at +4) and reducing equivalents (H2) to form stable compounds in some optimal "Goldilocks" zone. Over-reduction to rock-bottom (overly stable) alkanes may limit chemistry. However, some reduction is needed to drive a thermodynamically favorable cycle.
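The overall carbon oxidation states quoted in this section can be checked directly from a molecular formula. The sketch below assumes a neutral compound containing only C, H, and O, with H counted as +1 and O as −2 (so peroxides are excluded); the overall carbon oxidation state is then 2 × (number of O) − (number of H). The function name and the simple formula parser are ours, for illustration only.

```python
import re


def carbon_oxidation_state(formula):
    """Overall (summed) carbon oxidation state of a neutral CHO compound.

    Assumes H is +1 and O is -2, so the carbons must sum to 2*n_O - n_H.
    `formula` is a plain molecular formula such as "C3H6O3".
    """
    counts = {"C": 0, "H": 0, "O": 0}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element not in counts:
            raise ValueError(f"only C, H and O are supported, got {element!r}")
        counts[element] += int(number) if number else 1
    return 2 * counts["O"] - counts["H"]


if __name__ == "__main__":
    examples = {
        "propionic acid (PRP)": "C3H6O2",
        "lactic acid": "C3H6O3",
        "malonic acid (MLN)": "C3H4O4",
        "tartronic acid": "C3H4O5",
        "carbon dioxide": "CO2",
        "oxaloacetic acid (OXA)": "C4H4O5",
        "oxalosuccinic acid (OXS)": "C6H6O7",
    }
    for name, formula in examples.items():
        print(f"{name}: {carbon_oxidation_state(formula):+d}")
```

Because the value is a sum over all carbons, larger molecules can reach higher totals (such as +8 for OXS) without any single carbon being more oxidized than the carbon in CO2.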
Most of this section has focused on discussing variable oxidation states, but we would be remiss if we did not briefly discuss variable protonation states. Our calculated free energies use only neutral species (because of the poorer anion results). This necessarily means we lump together multiple species into a single entity even if it exists in more than one protonation state at significant concentration depending on the pH. Until we devise a protocol that provides better results compared to experiment for anionic species, these are an undifferentiated part of the aforementioned "pool" of equilibrating species.

Conclusions
We have explored the thermodynamics of the uncatalyzed reductive TCA cycle and its chemical cousins by calculating the relative free energies of potential CHO metabolites in aqueous solution under standard conditions, in the hope that it may shed light on how proto-metabolic systems are constructed and sustained. By focusing on simple cycles that incorporate C1 + C1 → C2 as the net reaction for building biomass, we considered why acetate is uniquely positioned as the key recycled compound, and tracked the energy changes of chemical transformations with an eye on oxidation state changes.
Could other food molecules change the free energy map? Yes. We would expect systematic shifts in our calculated thermodynamics. Would introducing co-factors (such as thioesters) change the energy profiles? Definitely. Such coupling reactions may provide ways around otherwise unfavorable endergonic reactions. Could other reference molecules be used, say if we employed a different redox couple? Certainly. Considering the extreme case of strongly oxidizing conditions with available O2 and no H2, our map in Figure 6 would flip. CH4 would be the least thermodynamically stable and CO2 would be rock-bottom on the scale. Would varying concentrations and dynamic flows of material and energy modify the situation? Most certainly. These questions and many more could be asked, but their answers are beyond the scope of the present work. Our group is actively exploring a number of these interesting rabbit-holes, several of which have been hinted at in this article, that build on the present baseline results.
This baseline provides a preliminary step to exploring the effect of temperature, pH, and different effective concentrations (activities) of the various solutes including the reference species. One could recalculate the relative free energy changes using ∆G = ∆G° + RT ln Q. Our baseline numbers provide the ∆G°. Solute activities (not equal to unity) and the effect of pH (via a [H+] term) in the reaction quotient Q would then result in different ∆G values. These only address equilibrium thermodynamics in a perturbative way. Temperature, pH, and solute activities are also expected to impact kinetics. Our group is working to calculate baseline activation barriers for reaction types in these cycles, although this requires the lengthy process of optimizing transition states.
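As a minimal sketch of how such a recalculation might look, the Python snippet below evaluates the relation between ∆G, the standard-state value, and RT ln Q in kcal/mol. The function names, the example activities, and the standard-state value of −10.0 kcal/mol are hypothetical placeholders chosen for illustration; they are not results from this work.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def reaction_quotient(products, reactants):
    """Reaction quotient Q from (activity, stoichiometric coefficient) pairs.

    Q is the product of a**nu over the products divided by the same
    product over the reactants.
    """
    numerator = math.prod(a ** nu for a, nu in products)
    denominator = math.prod(a ** nu for a, nu in reactants)
    return numerator / denominator


def delta_g(delta_g0, q, temperature=298.15):
    """Return delta_g0 + R*T*ln(Q), with energies in kcal/mol and T in K."""
    return delta_g0 + R_KCAL * temperature * math.log(q)


if __name__ == "__main__":
    # Hypothetical reaction 2 A -> B with a placeholder standard free energy
    dg0 = -10.0  # kcal/mol, illustrative only
    q = reaction_quotient(products=[(1e-3, 1)], reactants=[(1e-2, 2)])
    print(f"Q = {q:.3g}, dG = {delta_g(dg0, q):.2f} kcal/mol")
```

At unit activities Q = 1 and the correction vanishes, so the baseline value is recovered. A pH dependence would enter in the same way, by including the activity of H+ with its stoichiometric coefficient among the products or reactants.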
Once baseline kinetics is established and we have made better estimates of the rate constants, we plan to incorporate both thermodynamic and kinetic parameters into a dynamic network model that tracks the fluxes of chemical species as they cycle through their myriad reactions. Our hope is that this will provide a preliminary model (with its attendant limitations) that represents the establishment of proto-metabolic cycles under non-equilibrium conditions, and we hope to provide a richer story in the future.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/life11101025/s1, Table S1: Comparison of ∆G (kcal/mol) for uncatalyzed oxidative TCA cycle reactions using M06-2X-D3 functional and the cc-pVDZ basis set; Table S2: Comparison of ∆G (kcal/mol) for uncatalyzed oxidative TCA cycle reactions using anions; Table S3: Comparison of ∆G (kcal/mol) for reductive 3HP/4HB Cycle; Table S4: Comparison of ∆G (kcal/mol) for additional reactions in DC/4HB cycle and 3HP bicycle; Table S5.

Acknowledgments:
This research was supported by the University of San Diego. Shared computing facilities were provided by the saber2 and saber3 high-performance computing clusters at the University of San Diego.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Of the compounds in our full dataset of 223 compounds (see Table S5), 86 were found in the eQuilibrator database. Overall, from Figure A1, our quantum-calculated Gr values closely match the experimentally-derived data. There are two outliers: ethylene oxide, a three-membered ring (likely due to a different determination of ring strain); and tartronic acid, for no reason we can surmise, although we think the eQuilibrator value is wildly incorrect. Our quantum values for similar di-acids such as malic and tartaric acid match up well.

Figure A1. Comparison of quantum and available eQuilibrator Gr values (kcal/mol).

While a full list of all Gr values, compound names, abbreviations, molecular formulae, and overall carbon oxidation states is found in Table S5, the truncated list in Table A1 below provides an easy reference to the compounds explicitly mentioned in the Results section. If "(hyd)" is included in the name, it means the hydrate is the most stable form. If "(enol)" is included in the name, the enol is the most stable form.

Table A1. Gr values in kcal/mol for compounds in the Results section.
Column headings: Oxid. State; eQuilibrator Gr; Quantum Gr.