1. Introduction
In extant biochemistry, autocatalytic cycles are a key feature of metabolism [1]. However, outside of living systems, there are few instances of such reaction networks that utilize simple substances not artificially designed. One exception, of interest to the origins-of-life research community, is the formose reaction, whereby the C1 “food” molecule (CH2O) is converted to (CH2O)n sugars of increasing size and diversity. In the 1970s, the formose reaction was extensively investigated as a way to boost carbohydrate and food production [2], but it ultimately proved unfeasible. The core sugar-forming reaction mechanism utilizing aldol and retro-aldol reactions is now well known [3,4,5]. However, because aldehydes are present, the alkaline conditions used in the formose reaction lead to a complex mixture due to Cannizzaro disproportionation reactions [6], and thus the product mixture includes a plethora of acids and polyols.
There has been a recent resurgence in the investigation of the glorious mess that is the formose reaction. In a series of systematic studies, the Huck group examined how the observed product distribution was affected by changing the environmental variables [7,8,9]. Paschek and co-workers examined how catalysts, plausibly present in carbon-containing meteorites, influenced sugar synthesis [10], while Vinogradoff et al. used olivine silicate catalysts [11]. Haas et al. investigated the effects of mechanochemistry [12]. Omran, who highlighted the messiness of the formose reaction [6], more recently looked at the self-construction of chemical gardens under conditions resembling hydrothermal vents [13]. Large-scale computational methods have also been applied to the formose reaction to test various chemical network models [14,15,16].
Our focus is on the smallest autocatalytic core of the formose reaction, as shown in Figure 1. The C1 “food” species is CH2O, and in the absence of any other compounds in the autocatalytic cycle, the reaction has a slow induction phase. This is because the direct dimerization of CH2O into the C2 linchpin molecule glycolaldehyde has a high activation barrier due to the absence of an umpolung species to create new C–C bonds. However, once a small amount of the linchpin is present, the reaction accelerates rapidly. The direct dimerization can now be bypassed because the aldol addition reactions, C2 + C1 → C3 and C3 + C1 → C4, proceed with much lower barriers. Autocatalysis is triggered when a retro-aldol reaction regenerates more of the linchpin species (C4 → C2 + C2), which leads to increasing rates of CH2O consumption. While lower concentrations of larger sugars (C5 to C8) may be observed relatively early in the reaction, most of these are produced in later stages after CH2O has been consumed. Note that the Cannizzaro disproportionation reaction is always present, and not just for the C1 molecules (as shown in Figure 1); any sugar molecule in the cycle can disproportionate.
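The kinetic behavior described above (a slow induction phase, then rapid acceleration once the C2 linchpin is present) can be illustrated with a toy mass-action model. This is only a sketch: the rate constants are arbitrary, hypothetical values chosen to make the induction phase visible, not values from this work, and side reactions such as the Cannizzaro disproportionation are ignored.

```python
# Toy mass-action model of the minimal formose core (illustrative only).
# All rate constants are hypothetical and in arbitrary units.

def simulate(c2_seed, k_dimer=1e-6, k_add=1.0, k_retro=1.0,
             c1_initial=1.0, dt=0.01, t_end=200.0):
    """Forward-Euler integration; returns the time at which half of C1 is consumed."""
    c1, c2, c3, c4 = c1_initial, c2_seed, 0.0, 0.0
    t = 0.0
    while t < t_end:
        r_dimer = k_dimer * c1 * c1   # C1 + C1 -> C2 (slow direct dimerization)
        r_23 = k_add * c2 * c1        # C2 + C1 -> C3 (aldol addition)
        r_34 = k_add * c3 * c1        # C3 + C1 -> C4 (aldol addition)
        r_retro = k_retro * c4        # C4 -> 2 C2 (retro-aldol)
        c1 += dt * (-2 * r_dimer - r_23 - r_34)
        c2 += dt * (r_dimer - r_23 + 2 * r_retro)
        c3 += dt * (r_23 - r_34)
        c4 += dt * (r_34 - r_retro)
        t += dt
        if c1 <= 0.5 * c1_initial:
            return t
    return None  # half-consumption not reached within t_end

t_unseeded = simulate(c2_seed=0.0)    # C2 must first form by direct dimerization
t_seeded = simulate(c2_seed=1e-3)     # a trace of the C2 linchpin is present at t = 0
print(f"half-consumption time without seed: {t_unseeded}")
print(f"half-consumption time with C2 seed: {t_seeded}")
```

With these placeholder parameters, seeding a trace of C2 shortens the induction phase; the absolute times carry no physical meaning.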
In our early work [17] on formaldehyde oligomerization, we found that polyol formation may compete with the aldol addition, but although C–O bond formation is kinetically favored over C–C bond formation, the former is thermodynamically reversible and hydrolyzes in aqueous solution. In contrast, the aldol addition of CH2O with C–C bond formation is thermodynamically quite favorable and has significantly higher barriers for the reverse reaction. Also, isomerization reactions (e.g., glyceraldehyde to dihydroxyacetone) or ring closures (e.g., erythrose or threose for the C4 species) may lead to an equilibrating pool of off-cycle species that reduce the reactivity of the in-cycle and more reactive aldehyde compounds.
Our group is interested in proto-metabolism. After completing the initial (albeit small) thermodynamic maps of CHO and CHOS compounds [18,19], we wondered if the presence of sulfur could play a role in taming the complex product distribution of the formose reaction. Sulfur has a long, storied history in prebiotic chemistry. While the autocatalytic metabolic core [20] in present life (exemplified by the tricarboxylic acid cycle and its analogs) mainly consists of CHO molecules, the importance of coenzyme A may be a vestige of sulfur’s broader involvement in proto-metabolism as proposed by De Duve [21]. The Sutherland group’s prebiotic map proposes a “cyanosulfidic” world [22], and sulfur’s significance at the origin of life has been highlighted in a recent review [23]. Sulfur was also prominently featured in Wächtershäuser’s pyrite world [24], which helped usher proto-metabolism to the forefront of recent origin-of-life research.
As previously reported [19], we identified mercaptoaldehyde as the C2 linchpin species analogous to glycolaldehyde, and showed that its formation from glycolaldehyde and H2S was thermodynamically favorable and kinetically feasible. We outlined the potential favorable thermodynamics of a subset of the C3 and C4 sulfur analogs, but we did not examine detailed pathways (the purpose of that paper was to look at broad thermodynamic trends). We speculated that depending on where the thiol groups were positioned, they may provide some degree of selectivity, but we only provided broad trends: thiol groups on terminal carbons were thermodynamically favored, and sulfur in the aldose rings could shift the equilibrium from exclusively favoring ketoses over aldoses. We did not look at sulfur’s influence on the retroaldol reaction, nor calculate most of the reaction barriers that would influence the kinetics of the aldol additions, isomerizations, or Cannizzaro reactions.
The present work takes a detailed look at the core autocatalytic cycle and examines how sulfur analogs influence the thermodynamics and kinetics of the many interconnected reaction steps. Thus, we provide an analogous (albeit messier) map to our recent study of the smallest core [25]. While H2S was mentioned in the previous work, it was only in the context of sequestering formaldehyde or as an external catalyst rather than being incorporated as thiols throughout the cycle.
This article is organized as follows: After describing our computational protocol and its limitations, the combined results and discussion address (1) how the presence of H2S impacts the C1 food and C2 linchpin compounds; (2) how the competing Cannizzaro reactions drive the reduction of food species to methanol; (3) the impact of thiols on the two aldol reactions that add CH2O to form C3 and C4 species, respectively; (4) how thiols may shift the favorability of the C4 → C2 + C2 retroaldol; and (5) the potential of C5- and C6-forming reactions and their limitations.
2. Computational Methods
We use the same computational protocol as in our recent work on the thermodynamic map of small CHOS molecules and the exploration of the core formose cycle [19,25] so we can make direct comparisons and extend our free energy map. Here, we provide a brief description of that protocol for convenience. Much of the text in this section is reproduced from those two articles (published in this journal) [19,25] since we think the description is both clear and succinct. Essentially, we calculate the free energies using quantum chemical methods, and our protocol showed good agreement with the available experimental results for CHO systems [17,18,26,27] (there are no experimental data for sulfur-containing compounds involved in the formose reaction).
The computational details are as follows: The geometry of each molecule is optimized and its electronic energy is calculated using the B3LYP [28,29,30,31] flavor of density functional theory with the 6-311G** basis set. To maximize the probability of finding the global minimum, multiple conformers are generated using molecular mechanics (MMFFs force field [32]). The optimized structures are embedded in a Poisson–Boltzmann continuum to calculate the aqueous solvation contribution to the free energy. While this does not provide a specific concentration, it assumes a dilute solution such that the electrostatic field generated by a neighboring solute molecule is effectively screened by the water solvent. One can consider all solutes to have the same relative concentrations in our calculations. In a handful of cases, when the solvation calculation gave a seemingly spurious free energy, we made empirical corrections as explained in the Supplementary Materials.
Zero-point energy corrections are included, and we apply the standard temperature-dependent enthalpy correction term (for 298.15 K) from statistical mechanics by assuming translational and rotational corrections are a constant times kT, and that low frequency vibrational modes generally cancel out when calculating enthalpy differences. However, entropic corrections in aqueous solution are problematic [33,34,35]. Changes in free energy terms for translation and rotation are poorly defined in solution due to restricted complex motion, particularly as the size of the molecule increases (thus, increasing its conformational entropy). Free energy corrections come from two different sources: thermal corrections and implicit solvent. Neither of these parameters is easily separable, nor do they constitute all the required parts of the free energy. We follow the approach of Deubel and Lau [36], assigning the solvation entropy of each species as half its gas-phase entropy (calculated using standard statistical mechanics approximations similar to the enthalpy calculations described above), based on proposals by Wertz [37] and Abraham [38] that upon dissolving in water, molecules lose a constant fraction (~0.5) of their entropy.
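As a rough illustration of how the terms described above might be combined, the sketch below assembles an aqueous free energy from its components. The function name, argument names, and the numbers in the usage example are hypothetical placeholders; only the factor of 0.5 applied to the gas-phase entropy and the temperature of 298.15 K come from the protocol described in the text, and the simple additive form is an assumption.

```python
# Minimal sketch: assemble an aqueous-phase free energy (kcal/mol)
# from separately computed contributions. The additive form is an assumption.

T = 298.15                 # temperature in K, as used in the text
ENTROPY_FRACTION = 0.5     # solution entropy taken as half the gas-phase entropy

def aqueous_free_energy(e_elec, zpe, h_thermal, g_solv, s_gas):
    """All energies in kcal/mol; s_gas in kcal/(mol K)."""
    enthalpy = e_elec + zpe + h_thermal
    entropy_term = T * ENTROPY_FRACTION * s_gas
    return enthalpy + g_solv - entropy_term

# Hypothetical placeholder numbers, not values from this work
g = aqueous_free_energy(e_elec=-100.0, zpe=20.0, h_thermal=2.0,
                        g_solv=-5.0, s_gas=0.060)
print(round(g, 3))
```

Only differences between such values for reactants, products, and transition states enter the reported energetics, so any constant offset in the individual terms cancels.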
To estimate activation energies, transition states were optimized by including several explicit water and/or catalytic molecules to aid in transferring H moieties. All calculated transition states have one significant negative eigenvalue corresponding to the reaction coordinate (eigenvector) involving bond breaking/forming. Several conformers built by hand are tested in each case and we only report the lowest calculated barriers.
In comparing the equilibrium concentrations in a self-oligomerizing solution of 1 M glycolaldehyde at 298 K, our protocol fared very well compared to subsequent NMR measurements [27]. Our relative Gibbs free energies in aqueous solution are typically within 0.5 kcal/mol of experiment. That being said, our protocol shows systematic errors of 2–3 kcal/mol when calculating barriers involving carbonyl chemistry when compared to experimental results. Going to a higher level of theory does not reduce this error [39], nor does using anionic species [18]. There are also specific computational problems with including cations in our protocol, as discussed in previous work [25]. Quantum chemistry is about error cancelation, and our protocol (with its foibles, including the simplistic entropy correction) has worked well even with this systematic error for activation barriers. Thus, we do well on thermodynamics, while we have a larger error bar for kinetics, though it is still reasonable for the carbonyl chemistry and aldol reactions in this work.
3. Results and Discussion
To connect this work to our previously published CHOS thermodynamic map [19], we will use the same set of reference compounds: CO2, H2, H2O, and H2S will each be assigned a relative free energy, Grel, of 0.0 kcal/mol. The Grel of all other species can be determined by calculating the change in free energy, ΔG, for forming the species, analogous to a free energy of formation. For example, the formation of the linchpin species mercaptoaldehyde (C2H4OS) can be written as follows:
2 CO2 + 4 H2 + H2S → C2H4OS + 3 H2O
Since ΔG of this reaction is −6.2 kcal/mol, we assign Grel(C2H4OS) = −6.2 kcal. For the rest of this paper, we will use the unit kcal as shorthand to signify kcal/mol.
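Because every species is referenced to CO2, H2, H2O, and H2S, the formation reaction of any CHOS compound follows from element balance alone. The sketch below (the function name is illustrative, not from the original work) solves that balance for a compound with given C, H, O, and S counts, assuming CO2, H2, and H2S appear as reactants and H2O as the co-product.

```python
def formation_stoichiometry(c, h, o, s):
    """Coefficients for: a CO2 + b H2 + e H2S -> C(c)H(h)O(o)S(s) + d H2O."""
    a = c                      # carbon balance
    e = s                      # sulfur balance
    d = 2 * a - o              # oxygen balance: 2a = o + d
    b2 = h + 2 * d - 2 * e     # hydrogen balance: 2b + 2e = h + 2d
    if b2 % 2 != 0:
        raise ValueError("odd hydrogen count: no integer H2 coefficient")
    return {"CO2": a, "H2": b2 // 2, "H2S": e, "H2O": d}

# Mercaptoaldehyde, C2H4OS
print(formation_stoichiometry(2, 4, 1, 1))
# Formaldehyde, CH2O
print(formation_stoichiometry(1, 2, 1, 0))
```

A negative coefficient would simply mean that the corresponding reference compound belongs on the other side of the equation.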
A consistent set of reference compounds allows us to globally compare energies. In our figures, Grel values are found next to each compound and in square brackets next to an arrow for transition states. Since some reactions may have more than one non-reference compound, we will also use ΔG to designate the change in free energy when focusing on a particular reaction, where ΔG = Grel(products) − Grel(reactants). Similarly, when we refer to the barrier of a specific reaction, we will designate this ΔG‡, which compares Grel of the transition state to that of either the reactants or products depending on whether the forward or reverse reaction is being discussed. Throughout this section, we will regularly compare our ΔG and ΔG‡ values to their non-sulfur counterparts in our previous work [25].
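The bookkeeping defined above can be expressed in a few lines. In this sketch the dictionary keys and helper names are illustrative; the numerical values for CH2O, its hydrate, and the hydration transition state are the Grel values quoted in Section 3.1, and water, as a reference compound, is 0.0.

```python
# Relative free energies (kcal/mol) on the common reference scale
G_REL = {
    "H2O": 0.0,            # reference compound
    "CH2O": 7.9,
    "CH2(OH)2": 3.3,
    "TS_hydration": 21.1,
}

def total(species):
    """Sum of Grel over a list of species names."""
    return sum(G_REL[name] for name in species)

def delta_g(reactants, products):
    """Reaction free energy: Grel(products) - Grel(reactants)."""
    return total(products) - total(reactants)

def barrier(start, ts):
    """Activation free energy measured from the chosen starting side."""
    return G_REL[ts] - total(start)

dg = delta_g(["CH2O", "H2O"], ["CH2(OH)2"])
fwd = barrier(["CH2O", "H2O"], "TS_hydration")
rev = barrier(["CH2(OH)2"], "TS_hydration")
print(f"dG = {dg:+.1f}, forward barrier = {fwd:+.1f}, reverse barrier = {rev:+.1f}")
```

Because all species share one reference scale, the same dictionary can serve every reaction in the map; only the lists of reactants and products change.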
3.1. Formaldehyde: The “Food” Species
As shown in Figure 2, Grel of CH2O is +7.9 kcal. As shown in our previous work [25], it is thermodynamically favorable for CH2O to exist predominantly as its hydrate in aqueous solution. CH2(OH)2 has a Grel of +3.3 kcal; therefore, ΔG for hydration is −4.6 kcal. The transition state has a Grel of +21.1 kcal, so the barrier ΔG‡ is +13.2 kcal. Both our ΔG and ΔG‡ values are in good agreement with experimental values [40].
If H2S is present, it can potentially compete with water and add to the carbonyl. Addition of H2S is marginally less exergonic (ΔG = −3.9 kcal) with a marginally lower barrier (ΔG‡ = +12.1 kcal). Thus, depending on the concentration of H2S we expect to see both addition products in solution in equilibrium with CH2O. The dehydration of CH2(OH)(SH) to form the thione CH2S is significantly uphill (ΔG = +17.2 kcal); any CH2S formed would easily rehydrate. However, under dehydrating conditions, or if CH2S is ever found in higher than transient concentrations, its direct reaction with CH2O to form mercaptoaldehyde is highly exergonic, although the barrier is high; Grel of the transition state is +56.1 kcal and this reaction is kinetically rather unfavorable.
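To make the competition between the two addition products more concrete, the reaction free energies can be converted to equilibrium constants through K = exp(−ΔG/RT). The sketch below uses the two ΔG values quoted above; it treats them as standard-state values at 298.15 K and ignores activity effects, which is a simplifying assumption.

```python
import math

R = 1.987204e-3   # gas constant in kcal/(mol K)
T = 298.15        # K

def equilibrium_constant(delta_g):
    """K = exp(-dG / RT), with dG in kcal/mol."""
    return math.exp(-delta_g / (R * T))

k_water = equilibrium_constant(-4.6)   # CH2O + H2O -> CH2(OH)2
k_h2s = equilibrium_constant(-3.9)     # CH2O + H2S -> CH2(OH)(SH)

# The ratio of the two constants depends only on the 0.7 kcal/mol difference
ratio = k_water / k_h2s
print(f"K(hydration) / K(H2S addition) = {ratio:.1f}")
```

The actual ratio of the two adducts in solution also depends on the relative concentrations of H2O and H2S, which this sketch does not model.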
CH2O and its hydrate can undergo a Cannizzaro disproportionation reaction to form HCOOH and CH3OH. This reaction is thermodynamically favorable (ΔG = −19.6 kcal) but has a modest barrier (ΔG‡ = +25.9 kcal). Thermodynamic favorability is essentially driven by the reduction of CH2O to CH3OH with a Grel difference of −11.2 − (+7.9) = −19.1 kcal, while the oxidation of methanediol to HCOOH has a tiny Grel difference. With H2S, we expect some concentration of CH2(OH)(SH) to be present; thus, the Cannizzaro reaction can lead to a thioacid or a thione-acid as shown in Figure 3. Both reactions are exergonic (although significantly less so) and the barriers are ~5 kcal/mol higher. This is consistent with our thermodynamic map on CHOS compounds where both thioacids and thioneacids were significantly less stable than their corresponding carboxylic acids [19].
The sulfur analogs for addition to the carbonyl or for a Cannizzaro reaction have similar transition states to their non-sulfur analogs, the hydration reaction or the carboxylic-acid-forming Cannizzaro reaction. The optimum transition state for an addition reaction has two catalytic water molecules (an 8-center transition state) while the lowest barrier Cannizzaro has zero catalytic waters (a 6-center transition state) as shown in Figure 4.
3.2. Mercaptoaldehyde: The C2 Linchpin Species
Glycolaldehyde is the linchpin C2 species in the formose reaction; only a small amount is needed to kick-start the autocatalytic cycle (the presence of any member of the cycle will also suffice). For the non-sulfur analog, we have examined its role using the same protocol in our previous work [25]. In the presence of H2S, glycolaldehyde can be converted into its sulfur analog, mercaptoaldehyde. The energetics of the reaction pathway were shown in our previous work [19] and are repeated in Figure 5. The reaction is overall exergonic by 5.7 kcal. The first two steps, addition of H2S followed by dehydration, are slightly endergonic but the barriers are low. The subsequent two steps, conversion of thione to enol to aldehyde, are both exergonic and also have low barriers. Thus, this reaction is both thermodynamically and kinetically feasible.
As shown in Figure 6, hydration of mercaptoaldehyde is marginally endergonic by 0.4 kcal. In a dilute aqueous solution, where water is in large excess, the equilibrium will nevertheless shift towards the hydrated species. In the presence of food species, the hydrate can undergo a cross-Cannizzaro reaction with CH2O to form mercaptoacetic acid. ΔG of this reaction is −19.9 kcal (similar to the C1 Cannizzaro), while the barrier (ΔG‡ = +28.2 kcal) is higher by ~2 kcal. Mercaptoaldehyde can undergo a self-Cannizzaro reaction or a cross-Cannizzaro with glycolaldehyde, but both of these have higher barriers.
Addition of H2S to mercaptoaldehyde is uphill (ΔG = +3.5 kcal), and the subsequent cross-Cannizzaro reaction with CH2O to form 2-mercaptothioacetic acid is not as exergonic (ΔG = −12.8 kcal) and has a higher barrier (ΔG‡ = +30.0 kcal); C2 species containing two sulfur atoms are minor at best (or more likely not found) in the complex mixture.
The way forward into the autocatalytic cycle is the aldol addition of CH2O to mercaptoaldehyde via its enol. The C2 enolization is 7.6 kcal uphill and the barrier is +21.5 kcal (in Figure 5 on the left, starting from mercaptoaldehyde, this is the reverse step). This is similar thermodynamically to the enolization of glycolaldehyde (ΔG = +7.6 kcal, ΔG‡ = +24.3 kcal), but kinetically the barrier for mercaptoaldehyde enolization is lower by ~3 kcal. Hence, the presence of sulfur analogs may accelerate entry into the autocatalytic cycle.
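For a sense of scale, transition state theory relates a barrier difference to a rate ratio when the prefactors are assumed equal. The sketch below applies the Eyring equation to the two enolization barriers quoted above, assuming a transmission coefficient of 1 and treating each step as a simple first-order process. Given the 2–3 kcal/mol systematic error noted in the Computational Methods, the result should be read as an order-of-magnitude estimate only.

```python
import math

R = 1.987204e-3        # gas constant, kcal/(mol K)
KB = 1.380649e-23      # Boltzmann constant, J/K
H = 6.62607015e-34     # Planck constant, J s
T = 298.15             # K

def eyring_rate(barrier):
    """First-order rate constant (1/s) from the Eyring equation; barrier in kcal/mol."""
    return (KB * T / H) * math.exp(-barrier / (R * T))

k_mercapto = eyring_rate(21.5)   # mercaptoaldehyde enolization
k_glycol = eyring_rate(24.3)     # glycolaldehyde enolization
rate_ratio = k_mercapto / k_glycol
print(f"rate ratio (sulfur / non-sulfur) = {rate_ratio:.0f}")
```

The prefactor cancels in the ratio, so only the 2.8 kcal/mol barrier difference matters; this corresponds to roughly two orders of magnitude at room temperature.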
3.3. Sulfur Analogs of the C3 Species: Formation and Interconversion
Before launching into the details, Figure 7 shows our big-picture map of the many reactions that can take place involving C1 to C4 species that could be involved directly or indirectly in the core autocatalytic cycle. The top row shows the relevant C1 and C2 species. The second row and the top half of the leftmost column are the C3 species. The rest of Figure 7 contains the C4 species with retro-aldol products shown in blue boxes. All numerical values (in kcal) are Grel of the species (if next to a structure) or a transition state (if next to an arrow and in square brackets). The nomenclature of each compound is based on how many carbon atoms it has, its main functional group, the location of the sulfur, and in some enols, the location of the double bond. For example, the C2 species aldehyde, enol, and thione are named 2a, 2e, and 2t, respectively. Black arrows refer to enolization reactions. Aldol additions are shown with red arrows. The aldol addition of CH2O to 2e has two possible products, the thione with sulfur on the first carbon (3t1) and the aldehyde with sulfur on the second carbon (3a2). Further nomenclature will be discussed as we cover the relevant compounds and reactions.
The two possible products for this first C2 + C1 → C3 aldol addition are 2-thioglyceraldehyde (3a2) and the thione analog of glyceraldehyde (3t1). Stereochemically, these are the analogs of D-glyceraldehyde, in keeping with our previous work [25]. Forming the aldehyde is thermodynamically very favorable (ΔG = −5.9 − (1.4 + 7.9) = −15.2 kcal from the enol, or ΔG = −5.9 − (−6.2 + 7.9) = −7.6 kcal from the aldehyde). This is 2.4 kcal less exergonic than its non-sulfur counterpart, the addition of CH2O to glycolaldehyde. However, the barrier for the sulfur analog (ΔG‡ = +16.5 kcal from the enol) is ~3 kcal lower than its non-sulfur counterpart. Thus, not only is forming the C2 enol enhanced kinetically by the presence of the thiol group, but the subsequent aldol addition is also enhanced kinetically. Note that we use the enol rather than an enolate in aldol reactions because our calculations with neutral molecules gave far better results than using anions (see Computational Methods), similar to our previous calculations on the formose reaction [25].
Not surprisingly, the thione product (3t1) is less favored thermodynamically, but the barrier to form the thione (ΔG‡ = +16.8 kcal from the enol) is essentially similar to forming the aldehyde. Thus, we expect both C3 products to be formed in this system. Interestingly, the transition states have very different distances for the forming C–C bond as shown in Figure 8. In both cases the H transfer is essentially completed before the C–C bond is formed; however, the formation of 3a2 has a shorter distance of 2.07 Å in the transition state, while the less concerted 3t1 has a forming C…C distance of 2.66 Å. We tried several transition state conformations; the structures shown in Figure 8 are the ones with the lowest barriers. Note that the Grel values for these transition states at +24.3 and +24.7 kcal are some of the most positive, and therefore, in the overall map, this C2 + C1 → C3 aldol addition may represent the rate-determining step globally.
Interconversion of the C3 species to their isomeric counterparts is possible via enolizations. 3a2 can enolize into 3e2 (ΔG = +6.3 kcal, ΔG‡ = +25.2 kcal) but is much less likely to form the thione 3t2 (as indicated by the dashed arrow) which is less favored both kinetically and thermodynamically. The enol is more likely to revert back to the aldehyde 3a2. On the other hand, 3t1 favorably enolizes to form 3e1-1 (C3 enol with thiol on the first carbon, and the double bond at the first carbon). The reaction is 5.6 kcal downhill and the barrier is low (ΔG‡ = +14.8 kcal) due to the instability of the thione which can be considered a higher-energy or “activated” species. The enol favorably converts to the ketone 3k, the thermodynamic sink of the C3 species. While the ketone could enolize at the other end to form 3e1-2 and subsequently 3a3 (3-mercaptoglyceraldehyde), this is overall less favorable. Thus, 3a3 may be a minor species in equilibrium with 3k.
While we calculated both the cis and trans enols and their corresponding transition states, we found that in the vast majority of cases, the cis enol was favored both kinetically and thermodynamically; hence, we show only the cis isomers in Figure 7 with their corresponding Grel values. The free energy differences comparing cis and trans structures can be found in Supplementary Materials. In Figure 9, we show an example of a cis transition state (interconverting 3k and 3e1-1) and a trans transition state (interconverting 3t1 and 3e1-1). The C…H distance in both transition states is similar (1.58 and 1.61 Å). Most of the O…H distances are in the expected range, except the transition state on the left has one that is noticeably longer (1.60 Å) and one noticeably shorter (0.99 Å), and this is likely due to the longer S…H distance of 2.19 Å.
Globally in our map, the Grel values for the enolization transition states range from +16.7 to +21.9 kcal. Thus, if the C1 + C2 → C3 barrier can be traversed under some experimental conditions, we expect these enolization reactions to also be kinetically accessible. If the C3 enols are formed transiently, and the food species CH2O is plentiful, C1 + C3 → C4 aldol addition will proceed. Globally, these aldol addition transition states have Grel values ranging from +20.4 to +25.0 kcal, which are slightly higher than for the enolizations. It is more kinetically favorable for a C3 enol to convert back to a ketone or aldehyde, but the aldol addition is more thermodynamically favorable, as discussed in the next section.
3.4. Sulfur Analogs of the C4 Species: Formation and Interconversion
Each of the three C3 enols can potentially undergo the aldol addition with CH2O to form a C4 compound. On the right hand side of Figure 7, 3e2 can either form the branched aldehyde 4ba2 or the thione 4t2. While the formation of 4ba2 is exergonic, it is a “dead end” where the formose reaction is concerned, and its only route back into the cycle is the retroaldol back to C3 and C1. Forming 4t2 is only slightly exergonic from the enol (ΔG = +2.5 − (0.4 + 7.9) = −5.8 kcal) and barely endergonic from the aldehyde 3a2 (ΔG = +0.5 kcal). Not surprisingly, the non-sulfur counterpart forming the ketone is significantly more exergonic. The 3e2 + CH2O → 4t2 addition has a relatively low barrier (ΔG‡ = +22.4 − (0.4 + 7.9) = +14.1 kcal from the enol, or +20.4 kcal from the aldehyde 3a2). We were unsuccessful in isolating the transition state to form 4ba2, and our optimizations went to the transition state for the formation of 4t2. In the non-sulfur counterpart, the barrier to the ketone is 4 kcal lower than to the branched aldehyde. The presence of the sulfur loosens the transition state and is likely why we were unable to isolate the transition state to 4ba2. Regardless, we do not expect 4ba2 to play an important role in this system.
On the left hand side, CH2O addition to the enol 3e1-1 leads to the branched aldehyde 4ba3 or the ketone 3-thioerythrulose (4k3). Forming the ketone is both thermodynamically and kinetically more favorable (ΔG = −16.7 kcal, ΔG‡ = +16.6 kcal) from the enol. Forming the branched aldehyde is ~2 kcal less exergonic and the barrier is ~2 kcal higher. However, globally, aldol addition is kinetically less favored than enolization when comparing the transition state Grel values as discussed earlier. Similarly to what we see for 3e1-1, the enol 3e1-2 can add CH2O to form the same branched aldehyde 4ba3 or 1-thioerythrulose (4k1). Once again, forming the ketone is both thermodynamically and kinetically more favored (ΔG = −17.2 kcal, ΔG‡ = +16.0 kcal) from the enol. Should the branched aldehyde 4ba3 form, it can undergo a retroaldol, eliminating CH2O to form either enol, although the path to 3e1-2 is kinetically slightly favored over 3e1-1.
Similarly to what we found for the C3 species, the ketones are the thermodynamic sink for the C4 compounds. 4k1 is unlikely to isomerize into the much less stable thione 4t1, and it most likely equilibrates with 4k4 in solution (with a computationally insignificant Grel difference of 0.1 kcal). 4k4 can isomerize into the aldehyde 4a4 (via a terminal enol) and a small amount of the aldehyde likely exists at equilibrium. Similarly, the ketone 4k3 can isomerize into the aldehyde 4a3 (center of Figure 7) as a minor species at equilibrium. On the right hand side of Figure 7, the less stable thione 4t2 has two pathways forward. It could isomerize to the ketone 4k3 or to the aldehyde 4a2. Both pathways are rather exergonic, with the ketone being thermodynamically favored over the aldehyde. However, the path to the aldehyde (via enol 4e2) is favored kinetically.
While we expect the open chain aldehydes 4a3 and 4a4 to be minor species in solution equilibrating with their respective ketones 4k3 and 4k4, once 4a2 is formed it is unlikely to reverse to the much less stable thione 4t2. This pathway will be of particular interest in light of the C4 → 2 C2 retro-aldol that facilitates autocatalysis, to be discussed in the next section. Also, any of the open chain aldehydes can be partially sequestered by ring-closing reactions to form the furanoses (4r2, 4r3, 4r4). The furanoses are all slightly favored thermodynamically over their open chain aldehyde counterparts and the barriers to ring closure (and ring opening) are low (~12–16 kcal) and similar to aldehyde hydration barriers [17]. We expect these furanoses to be part of an equilibrating mixture. Since the formation of the aldehyde-thione 4t1 is less likely, we do not expect to see much of its ring-closed counterpart 4r1 either.
All values and structures shown in Figure 7 are sulfur analogs for D-erythrose and D-erythrulose. While we also calculated the Grel values for D-threose, the overall story does not change and the differences in energies only show minor variances of ~0–2 kcal/mol. Thus, we do not include the threose/erythrose differences in the main body of this article to keep the discussion tractable. The relevant data for D-threose sulfur analogs can be found in Supplementary Materials.
3.5. Tetrose Aldol and Retro-Aldol Reactions
In the non-sulfur formose reaction, the (D-erythrose) C4 → 2 C2 retro-aldol reaction is endergonic by 2.2 kcal. Hence, autocatalysis likely does not kick in until there is a sufficient concentration buildup of C4 versus C2. The barrier is high at 31.5 kcal. Can the presence of sulfur change this situation?
Of the four retroaldol reactions in Figure 7 (blue arrows and boxes), only one is significantly exergonic: 4t1 → 2a + GA (glycolaldehyde). The split initially produces GA and the enol of mercaptoaldehyde, which easily tautomerizes into 2a. ΔG = −10.0 kcal for the overall reaction and the barrier is low (ΔG‡ = +15.0 kcal) because the reactant is an activated species. However, it is very unlikely that high-energy 4t1 is formed in the first place, and we do not expect this pathway to be practically realized. (For all our C4 → 2 C2 retro-aldol reactions, the barriers for erythrose were consistently lower than for threose by 2–3 kcal.)
We expect some (albeit small) amount of 4a4 to be present in the mixture since it can be formed from the ketone thermodynamic sink 4k4. The reaction 4a4 → 2a + GA is endergonic by 1.6 kcal/mol, marginally less unfavorable than its non-sulfur counterpart, and the barrier is also marginally lower (ΔG‡ = +29.9 kcal). Perhaps sulfur can accelerate this autocatalytic step ever so slightly. Unfortunately, the same cannot be said for 4a3 (formed in equilibrium with 4k3) because the retroaldol split leads to the thione 2t, and thus, the reaction is significantly endergonic (ΔG = +11.5 kcal). We were unable to cleanly locate the transition state because it forms a four-membered heterocycle intermediate that looks like the cycloaddition product of 2t and the enol of glycolaldehyde. More details are shown in Supplementary Materials, but this pathway is unlikely to occur.
The most interesting retroaldol reaction is 4a2 → 2a + GA. From our calculations, the reaction is marginally exergonic (ΔG = −0.2 kcal), although this is within the computational error so we consider it thermoneutral. The barrier is still relatively high (ΔG‡ = +28.7 kcal) although lower than the 4a3 retroaldol, and it is ~3 kcal lower than its non-sulfur counterpart. Thus, the situation is more promising in the presence of sulfur when the linchpin mercaptoaldehyde is present. Autocatalysis could begin earlier because the C4 → 2 C2 reaction is no longer endergonic, and the kinetics are slightly more favorable. The transition state for this retroaldol reaction is shown in Figure 10. It has a longer C…C distance of 2.64 Å and the H has not quite transferred to the carbonyl oxygen with an O…H distance of 1.40 Å (other distances are as expected).
Globally, the Grel value for this retroaldol transition state of +22.2 kcal is on par with those of the C2 + C1 → C3 and C3 + C1 → C4 aldol addition reactions, which range from +20.4 to +25.0 kcal. We therefore expect that the retroaldol could potentially compete kinetically with further aldol additions such as C4 + C1 → C5. If the C4 enol is 4e2, addition of CH2O will lead to the thermodynamically less stable thione-pentose as shown in the first row of Figure 11. Thus, we expect the C4 → 2 C2 retro-aldol from 4a2 to be favored over further aldol addition of CH2O. This is a unique situation, because it is not true thermodynamically for other C4 + C1 → C5 additions where formation of the linear 3-ketopentoses (via 4e1, 4e1-1 or 4e3-1) is significantly exergonic as shown in Figure 11. The 3-ketopentoses, however, are dead ends in the formose cycle; they are thermodynamic sinks that do not undergo retroaldol C5 → C3 + C2 reactions, and their formation removes C4 species from the pool. Such reactions are parasites of the autocatalytic cycle. And as we have seen in Figure 7, since aldol additions to branched products are much less favorable, enolization of the 3-ketopentoses followed by addition of CH2O in a C5 + C1 → C6 reaction is much less likely.
3.6. When C1 Food Is Depleted
The scenarios discussed in the previous sections assume that the C1 food species is abundant. But what happens when it begins to deplete? Depending on the relative concentration of the various C2, C3, and C4 species, the following reactions could begin to be important: C2 + C2 → C4 aldoses (the opposite of the retroaldol), C2 + C3 → C5 aldoses or ketoses (depending on whether the C3 or C2 enolizes), C2 + C4 → C6 aldoses or ketoses (depending on whether the C4 or C2 enolizes), and C3 + C3 → C6 ketoses. This also opens up the possibility of incorporating more than one thiol group into a C4, C5 or C6 species.
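The combinations listed above are simply the ways of pairing C2 to C4 species, without C1, to reach products of up to six carbons. A short enumeration (illustrative only; it counts carbon skeleton sizes and says nothing about energetics or about which partner enolizes) reproduces the list:

```python
from itertools import combinations_with_replacement

sizes = [2, 3, 4]          # species remaining once C1 is depleted
max_carbons = 6            # largest product considered in the text

# Unordered pairs of reactant sizes whose product stays within the limit
pairs = [
    (a, b, a + b)
    for a, b in combinations_with_replacement(sizes, 2)
    if a + b <= max_carbons
]
for a, b, n in pairs:
    print(f"C{a} + C{b} -> C{n}")
```

Raising `max_carbons` would add the C3 + C4 and C4 + C4 pairings, which lie outside the scope of the discussion here.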
Of the C2 + C2 → C4 reactions, the most favorable reaction between glycolaldehyde and mercaptoaldehyde is to form the 4-thioaldose 4a4 (see Figure 7). The reaction is overall exergonic (ΔG = −1.6 kcal), but the barrier is higher (ΔG‡ = +28.3 kcal overall, or +20.7 kcal from the enol) than for aldol additions involving the C1 food species, which have barriers 2–5 kcal lower. If two mercaptoaldehyde molecules dimerize, they form the 2,4-dithioaldoses. For this reaction, ΔG = +0.6 kcal overall to form 2,4-dithioerythrose (the threose is 0.1 kcal/mol less stable) and the overall barrier is 27.6 kcal/mol. Neither of these sulfur analogs is as thermodynamically favorable as the dimerization of glycolaldehyde (ΔG = −2.2 kcal) to form erythrose.
For the C2 + C3 → C5 reactions, we see a similar story. The non-sulfur analog reactions are thermodynamically more favorable in the forward direction (and their products are therefore less likely to undergo the corresponding retroaldol). In Figure 12, the addition of glycolaldehyde (via its enol) to glyceraldehyde to form open-chain ribose has ΔG = −1.9 kcal, and the product can favorably undergo ring closure (the pyranose is more stable than the furanose; values shown are for the β anomers). The addition of glycolaldehyde to dihydroxyacetone (via its enol) to form ribulose has similar thermodynamics, with ΔG = −2.1 kcal (xylulose differs from ribulose by less than 0.1 kcal in free energy).
For the sulfur analogs, the addition of 3a2 and glycolaldehyde to form the open-chain 4-thioribose has ΔG = +0.1 kcal. Ring closure to both the pyranose and furanose is favorable; having sulfur in the ring stabilizes the furanose in this case. Starting from 3a3 and forming 5-thioribose has ΔG = −1.6 kcal because thiols on the terminal carbon are favored. Ring closure is favorable and the pyranose is more stable than the furanose (see our previous work [19] for a more detailed discussion on the position of thiol groups in open-chain aldoses and rings). Based on our discussion of Figure 7, we expect 3a3 to be less accessible because 3k is the thermodynamic sink in that pathway. Thus, the more relevant sulfur analog is 3a2, whose addition is certainly less favorable than that of its non-sulfur counterpart. If the C2 species is mercaptoaldehyde, the results are similar but slightly less favorable.
For sulfur analogs forming C5 ketoses, 3k can form two distinct enols, and thus adding glycolaldehyde leads to 3-thioribulose (ΔG = +0.7 kcal) or 1-thioribulose (ΔG = 0.0 kcal). Both of these reactions are thermodynamically less favorable than the non-sulfur counterpart. (The corresponding thio-xyluloses are within 0.4 kcal of the thio-ribuloses. Using mercaptoaldehyde as the C2 species gives similar but slightly less favorable thermodynamics, as shown in Supplementary Materials.) Another possibility is to add glycolaldehyde to the enol of 3a2, but the ketose product is a thione and the reaction is significantly endergonic.
For C2 + C4 → C6, as shown in Figure 13, we see the same trend. The aldol addition of erythrose and glycolaldehyde to form glucose is exergonic (ΔG = −1.6 kcal), while its sulfur analog, starting from 4a2 (the most promising of the C4 aldehydes) and glycolaldehyde, is endergonic (ΔG = +1.1 kcal). If mercaptoaldehyde is used as the C2 species to form 2,4-dithioglucose, the reaction is more endergonic. Similar results are obtained for the C3 + C3 → C6 reactions forming fructose (see Supplementary Materials).
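The free-energy differences quoted in this section are small, so it can help to translate them into equilibrium constants. The sketch below does this with K = exp(−ΔG/RT) for a few of the values given above. It is an illustration only: the temperature of 298.15 K is an assumption, the numerical value of K for an addition reaction depends on the standard state used in the calculations (not specified here), and the function name `equilibrium_constant` and the dictionary labels are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)
T_ASSUMED = 298.15    # assumed temperature in K (not stated in the text)


def equilibrium_constant(delta_g, temperature=T_ASSUMED):
    """Return K = exp(-dG / RT) for a reaction free energy in kcal/mol."""
    return math.exp(-delta_g / (R_KCAL * temperature))


# Reaction free energies (kcal/mol) as quoted in this section.
reactions = {
    "glycolaldehyde + glycolaldehyde -> erythrose": -2.2,
    "glycolaldehyde + mercaptoaldehyde -> 4a4": -1.6,
    "2 mercaptoaldehyde -> 2,4-dithioerythrose": +0.6,
    "erythrose + glycolaldehyde -> glucose": -1.6,
    "4a2 + glycolaldehyde -> sulfur analog of glucose": +1.1,
}

for name, delta_g in reactions.items():
    k_eq = equilibrium_constant(delta_g)
    print(f"{name:50s} dG = {delta_g:+.1f} kcal/mol   K = {k_eq:.2f}")
```

As a rule of thumb, a change of about 1.4 kcal/mol in ΔG shifts K by one order of magnitude near room temperature, so the gap of roughly 2.7 kcal between the glucose-forming reaction and its sulfur analog corresponds to about two orders of magnitude in K under these assumptions.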
While C1 food is plentiful, the most favorable cross-Cannizzaro reaction, both thermodynamically and kinetically, involves the reduction of CH2O to CH3OH, as discussed in the sections describing C1 and C2 reactions. When the food runs out, larger aldehydes could undergo Cannizzaro reactions which parasitize the autocatalytic cycle. A preliminary analysis of our calculations suggests that C3 and C4 species show similar energetics to the C2 species shown in Figure 6. Hence, while Cannizzaro reactions are thermodynamically favorable, they have larger barriers and are kinetically less favorable than the aldol addition reactions.
4. Conclusions
If H2S is incorporated as a thiol group in the formose reaction, its most salient contribution is utilizing mercaptoaldehyde as the C2 linchpin species. Both its enolization and the entry into the cycle via the first aldol addition (C2 + C1 → C3) are kinetically more favorable in the sulfur analog, with barriers lowered by ~3 kcal. While there is no kinetic selectivity in forming the C3 species, there is significant thermodynamic selectivity for the aldehyde 3a2 over 3t1. This could shunt the cycle through the reactions on the right-hand side of Figure 7. While the initial C3 + C1 → C4 product is 4t2, it more favorably enolizes to 4e2 (over 4e3-2), thus favoring the formation of 4a2, the only C4 aldehyde that has a thermodynamically favorable C4 → 2 C2 retroaldol reaction (the barrier is also ~3 kcal lower than that of the non-sulfur analog). Thus, the presence of sulfur could accelerate the core autocatalytic cycle of the formose reaction compared to its non-sulfur analog, and this pathway is the most significant positive result of our work.
However, the messiness does not go away. A wide diversity of C3 and C4 compounds is accessible, as shown in Figure 7. Having the thiol group in different positions provides selectivity for some species over others, but also adds more compounds to the mix. Sulfur analogs do not slow down the competing Cannizzaro reaction, since its most likely channel is the reduction of CH2O to methanol, which depletes the C1 food. As CH2O depletes, sulfur analogs undergo further aldol additions to form C4, C5, and C6 species less favorably than their non-sulfur counterparts, although as concentrations of C2 and C3 build up, the equilibrium will shift towards larger species. It is unclear if slowing down the formation of larger species is favorable for kick-starting a proto-metabolism.
A question we asked but did not sufficiently answer in our previous work [19] was whether thiol groups could provide additional selectivity, especially if more than one thiol was present, and whether thiol groups could have served as precursor tags to phosphates in extant sugar metabolism. Having collected more data in this study, our answer at present is no. Forming dithiolated sugars is unfavorable, and even for monothiolated sugars, the thermodynamic favorability of the ketoses causes these thermodynamic sinks to retard formose autocatalytic pathways. We are now considering whether bisulfite analogs, instead of thiols, could lead to more pronounced selectivity.
Not included in this work is the intramolecular disproportionation of a thiolated sugar to form thioacids, or more promisingly, the addition of an organothiol to an aldehyde which disproportionates into a thioester. We are currently pursuing this possibility and expect to continue the story of the role of sulfur analogs in potential proto-metabolic autocatalytic cycles in a future publication.