This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

The formation of a self-sustaining autocatalytic chemical network is a necessary but not sufficient condition for the origin of life. The question of whether such a network could form “by chance” within a sufficiently complex suite of molecules and reactions is one that we have investigated for a simple chemical reaction model based on polymer ligation and cleavage. In this paper, we extend this work in several further directions. In particular, we investigate in more detail the levels of catalysis required for a self-sustaining autocatalytic network to form. We study the size of chemical networks within which we might expect to find such an autocatalytic subset, and we extend the theoretical and computational analyses to models in which catalysis requires template matching.

In previous work we introduced and investigated a mathematical model of catalytic reaction systems and autocatalytic sets [

In this paper we take a closer and more detailed look at our model and its results. First, we introduce a small modification to our mathematical definition of autocatalytic sets and the corresponding algorithm for finding them in general catalytic reaction systems (Section 3). This modification makes both the definition and the algorithm slightly simpler, and includes some specific (although probably rare) cases of autocatalytic sets which were previously left out. However, we show (formally, and in simulations) that this modified algorithm does not invalidate any previous results or conclusions.

Second, we show that there is a discrepancy between the theoretical and simulation results (Section 4). Both show that a linear growth rate in the level of catalysis is sufficient for the emergence of autocatalytic sets, but the parameter values of the two linear relations differ. Here, we recalculate and compare the required levels of catalysis in more detail and under different scenarios.

Third, we show how our model and algorithm can be used to answer other interesting questions relating to the emergence of autocatalytic sets (Section 5). In particular: What is the minimum required size of the molecule set for autocatalytic sets to emerge given a fixed (known) probability of a molecule catalyzing an arbitrary reaction?

Fourth, we show how more chemical realism can be included in our model, for example by considering template-based catalysis (Section 6). Even though this makes the model harder to analyze, it still generates interesting and useful results.

The next section briefly reviews our previously introduced model and definitions. The four sections following it present the model modifications, extensions, and additional results mentioned above. The final section summarizes the main conclusions and discusses future directions. Mathematical proofs are provided in an appendix.

Our study fits within a large and growing body of work that aims to formally model how self-sustaining biochemical systems necessary for life might have emerged. This is an area that has been investigated from many angles over the last three decades. Some approaches that are similar in scope but different in their specific details from the one we study here include models based on Petri-nets [

Autocatalytic sets may have played an important role in the origin of life [

Note (as already stated earlier [

In [

In the original (mathematical) definition of RAF sets [

Similarly, the “F” part of the definition is stated in terms of the _{ℛ}_{ℛ}

Now imagine a situation where all reactions in _{ℛ}

To remedy this, and make sure these (probably rare but relevant) cases are also included, we propose a slight modification to our original definition of RAF sets as follows:

Given a catalytic reaction system

_{ℛ′}

_{ℛ′}

In other words, a set of reactions _{ℛ}_{ℛ}

This may seem like a minor point, but it could be an important one. Consider, for example, the (reverse) citric acid cycle, which has been argued to have (possibly) been a major step in the origin of life by synthesizing the basic building blocks of organic molecules [

Lemma 3.1 has several other desirable properties. First, it ensures that the recent result that Rosen’s (M,R) systems can be viewed and studied within the RAF framework [

Of course there are many more properties relevant to the origin of self-sustaining biochemistry beyond such ad-hoc conditions to exclude trivialities—for example, the dynamics of the reactions (the quantity of reagents and products (stoichiometry) along with thermodynamic considerations) and the effect of inhibition and degrading side reactions [

An additional benefit of our modified RAF definition is that it significantly simplifies the corresponding algorithm for finding RAF sets in general catalytic reaction systems, and its correctness proof. The original algorithm is based on repeatedly (and alternately) applying two reduction steps (starting from the full reaction set) [

1. Remove all reactions that do not conform to the RA requirement;

2. Remove all reactions that do not conform to the F requirement.

1. Start with the complete set of reactions

2. Compute the closure of the food set cl_{ℛ}

3. For each reaction _{ℛ}

4. Repeat steps 2 and 3 until no more reactions can be removed.

The resulting (reduced) reaction set ^{2} log |
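The reduction steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the simulations: the `Reaction` type, the function names `closure` and `max_raf`, and the toy reaction system are hypothetical. The removal rule (drop a reaction when some reactant, or every catalyst, lies outside the current closure of the food set) is assumed from one reading of the modified RAF definition, and the closure is computed here without regard to catalysis.

```python
from typing import FrozenSet, Iterable, NamedTuple, Set


class Reaction(NamedTuple):
    """Hypothetical container: each field is a frozenset of molecule names."""
    reactants: FrozenSet[str]
    products: FrozenSet[str]
    catalysts: FrozenSet[str]


def closure(food: Iterable[str], reactions: Iterable[Reaction]) -> Set[str]:
    """Molecules reachable from the food set using only the given reactions.

    Assumption: catalysis is ignored when computing the closure.
    """
    reactions = list(reactions)
    available = set(food)
    changed = True
    while changed:
        changed = False
        for r in reactions:
            if r.reactants <= available and not r.products <= available:
                available |= r.products
                changed = True
    return available


def max_raf(food: Iterable[str], reactions: Iterable[Reaction]) -> Set[Reaction]:
    """Repeat the closure and removal steps until nothing more is removed."""
    food = set(food)
    current = set(reactions)              # step 1: start from all reactions
    while True:
        cl = closure(food, current)       # step 2: closure of the food set
        keep = {                          # step 3: assumed removal rule
            r for r in current
            if r.reactants <= cl and r.catalysts & cl
        }
        if keep == current:               # step 4: stop when stable
            return current
        current = keep


# Toy system (hypothetical): r3 needs "e", which can never be produced.
r1 = Reaction(frozenset("ab"), frozenset("c"), frozenset("d"))
r2 = Reaction(frozenset("cb"), frozenset("d"), frozenset("a"))
r3 = Reaction(frozenset("ae"), frozenset("f"), frozenset("c"))
result = max_raf({"a", "b"}, [r1, r2, r3])
print(len(result))
```

An empty result means the reaction system contains no RAF set under these assumptions. Note that `r2` is kept even though its only catalyst is a food molecule, which is the kind of case the modified definition is meant to include.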

In [

This discrepancy can partly be explained by the fact that the theoretical analysis in [

To answer this, we repeated the original simulations, using the RAF algorithm to find _{n}_{n}

So, in short, we compare the required levels of catalysis for RAF sets to occur with high probability for three cases:

Computational case for

Computational case for

Theoretical case for

The computational values were calculated over 100 to 1000 (depending on the value of

_{n}_{C}_{A}

In addition to the level of catalysis required for RAF sets to emerge, we can use the RAF algorithm to answer other interesting, and related, questions. For example, one could ask what the minimum required size of the molecule set is (or, in the binary polymer model, the minimum size

Suppose we fix the probability of catalysis at _{n}

_{n}_{13} = 0.982, and _{n}

Similarly, with _{n}_{16} = 0.939, and _{n}
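To make the sizes involved in this question concrete, the following sketch enumerates the molecule set and the ligation reactions of a binary polymer model for a given maximum length, and reports the expected number of reactions catalyzed per molecule at a fixed probability of catalysis. It rests on several assumptions that are not taken from the text: a ligation and its reverse cleavage are counted as one reaction, the function names are hypothetical, and the value of `p` in the usage line is purely illustrative.

```python
from itertools import product
from typing import List, Tuple


def binary_polymers(n: int) -> List[str]:
    """All binary strings of length 1..n (the molecule set)."""
    return [
        "".join(bits)
        for length in range(1, n + 1)
        for bits in product("01", repeat=length)
    ]


def ligation_reactions(n: int) -> List[Tuple[str, str, str]]:
    """All (left, right, product) triples with len(product) <= n.

    Assumption: each ligation/cleavage pair is counted once.
    """
    reactions = []
    for mol in binary_polymers(n):
        for cut in range(1, len(mol)):
            reactions.append((mol[:cut], mol[cut:], mol))
    return reactions


def expected_catalysed_per_molecule(n: int, p: float) -> float:
    """Expected number of reactions one molecule catalyzes when every
    (molecule, reaction) pair is catalytic independently with probability p."""
    return p * len(ligation_reactions(n))


for n in range(2, 11):
    print(n, len(binary_polymers(n)), len(ligation_reactions(n)),
          expected_catalysed_per_molecule(n, 1e-3))
```

Because the reaction set grows roughly like n times 2^n under these assumptions, a fixed probability of catalysis translates into a rapidly growing expected number of catalyzed reactions per molecule as n increases.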

One could argue that the binary polymer model used in our studies so far is too simplified to be biologically or chemically realistic. However, the model serves as a useful starting point with which precise mathematical statements can be formulated and proved, or at least tested computationally. Furthermore, our RAF definition and algorithm are independent of the particular model used, and can in principle also be applied to real catalytic reaction systems (for example metabolic networks, of which the already mentioned citric acid cycle is a core element). Equally importantly, it is not difficult to add more chemical realism to our mathematical models.

As one particular example, we have considered

In line with the original random CRS model, and some initial simulations with such template-based catalysis [

Note that this template matching requirement is almost the same as in the original simulations [

Theorem 4.1 (ii) of Mossel and Steel (2005) shows that for polymers of length up to _{e}

We now describe how this result modifies if catalysis is required to be template-based, as described above. Suppose that a polymer _{1} + _{2} that is complementary to the end-segment (of length _{1}) and the initial segment (of length _{2}) of the two molecules involved in the cleavage or ligation. Thus, for the above set-up we have: _{1} = _{2} = 2 and so _{2}) in [
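The template requirement described above can be illustrated with a short sketch. It assumes a binary alphabet in which 0 is complementary to 1, matching without strand reversal, and that a reactant shorter than its segment length cannot be templated. The function names `complement` and `can_catalyze` are hypothetical, and a match is treated only as a necessary condition for catalysis.

```python
def complement(segment: str) -> str:
    """Bitwise complement of a binary string (assumed pairing: 0 with 1)."""
    return segment.translate(str.maketrans("01", "10"))


def can_catalyze(catalyst: str, left: str, right: str,
                 s1: int = 2, s2: int = 2) -> bool:
    """True if `catalyst` contains a stretch of s1 + s2 sites complementary to
    the end-segment of `left` followed by the initial segment of `right`.

    `left` and `right` are the two molecules joined by a ligation (or the two
    fragments produced by the corresponding cleavage).
    """
    if len(left) < s1 or len(right) < s2:
        return False  # assumption: too-short reactants cannot be templated
    site = left[-s1:] + right[:s2]
    return complement(site) in catalyst


# Ligation 0011 + 0100: the reaction site is "11" + "01" = "1101",
# so a catalyst must contain the complementary template "0010".
print(can_catalyze("100101", "0011", "0100"))
print(can_catalyze("111111", "0011", "0100"))
```

With `s1 = s2 = 2` this corresponds to the 4-site template used in the set-up above. Other complementarity rules, such as antiparallel pairing, would only change the body of `complement`.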

The following result shows that RAFs will still arise with high probability under linear growth in the average number of reactions each molecule catalyzes, provided the constant involved is increased by a factor of ^{s}

_{s}(n) be the probability that there exists an all-molecule RAF under this template matching model. Suppose that each molecule catalyzes (on average) at least λ_{s}n reactions, where λ_{s}^{s}λ and λ >_{e}(κ). Then P_{s}(n) satisfies the same inequality as P (n), namely

As with the original (random) model (see Section 4), we expect there to be a difference between the theoretically predicted required level of catalysis and the level observed in simulations in the case of template-based catalysis. Unfortunately, we are computationally even more restricted in the template-based case than with the original model. The running time of our RAF algorithm is polynomial in the size of the reaction set |^{n}^{n}^{2}^{n}

_{n}

However, for longer and longer molecule types, this restriction becomes less of a problem, as a longer molecule has an increasingly higher probability of matching a given 4-site template somewhere along its length. Indeed, the required level of catalysis for the template-based case tapers off as
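The claim that longer molecules are more likely to match a given 4-site template can be checked by brute-force enumeration for small lengths. The sketch below counts the fraction of all binary strings of a given length that contain a fixed template somewhere along their length. The function name `match_fraction` and the default template are hypothetical choices, and the exact fractions depend on which template is used.

```python
from itertools import product


def match_fraction(length: int, template: str = "0101") -> float:
    """Fraction of binary strings of the given length containing `template`."""
    if length < len(template):
        return 0.0
    hits = sum(
        template in "".join(bits)
        for bits in product("01", repeat=length)
    )
    return hits / 2 ** length


for length in range(4, 13):
    print(length, round(match_fraction(length), 4))
```

The fraction can never decrease with length, since any string whose prefix contains the template still contains it after one more site is appended.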

Building on our previous work, we have investigated in more detail the levels of catalysis required for the emergence of autocatalytic sets in models of chemical reaction systems. First, we have shown that there is a discrepancy between the theoretically predicted levels and the computationally observed ones. Although both results yield a linear relation between the required level of catalysis and the size

Next, we looked at the minimum size of the molecule set (

Finally, we studied an extension of the original model that includes template-based catalysis. We established formally that in this case a linear growth rate in the level of catalysis also suffices for RAF sets to appear with arbitrarily high probability. However, the simulations show that for smaller values of

We intend to continue studying the emergence of autocatalytic sets in chemical reaction systems under various scenarios, models, and extensions. So far, however, we have mainly studied the (static) underlying graph structures of such systems. One particularly important issue we hope to address in the future is the actual molecular dynamics in a given (catalytic) reaction system. In addition to studying models of chemical reaction systems, we would also like to apply our RAF framework to real (bio)chemical systems such as metabolic networks, or the collection of all known (organic) substrates and reactions. It is our hope that this line of work will help provide more insight into the (possible) origin of life in general.

The computations were performed at the Vital-IT (

Let RA denote the original definition of “Reflexively Autocatalytic” from [_{1}] the new one as given here in Section 3. We need to show that [RA] + [F-gen] implies [RA_{1}], so suppose that _{ℛ′}_{1}] holds. In the second case, where _{ℛ′}_{ℛ′}_{ℛ′}_{1}] holds.

Let

To establish claim (i) we first justify a third claim (iii): If a reaction _{1}) that is contained within an arbitrary subset (say _{2}) of _{2} being taken as the “current set of reactions”. To see this, the fact that _{1} is an RAF implies that there exists a molecule _{ℛ1} (_{ℛ1} (_{1} ⊆ _{2} and so cl_{ℛ1} (_{ℛ2} (_{2}. This establishes claim (iii) which implies, by induction, the further claim (iv): If a reaction

To establish claim (ii) we first note that, since the algorithm has terminated at some non-empty set of reactions _{ℛ′}_{ℛ′}

We first recall two quantities (_{m}, r_{m}_{m}

For any polymer _{s}_{n}

Given a (forward) reaction _{r}_{1} + _{2} (which is also polymer of length _{n}_{r}

In the template polymerization model, _{r}_{n}_{s}_{x}

Thus, if we let _{x}_{+}(_{+}(_{r}_{s}_{+}(_{t}^{s}r_{n–s}

Now, Inequality (2) applied to _{x}_{s}n_{s}^{s}λ

Now, following [_{+}(

By Inequalities (4) and (8) and the inequality (1 – ^{y}_{n–s}

_{n}

A simple example of a _{1}, _{2}_{3}_{4}} (open nodes). The food set is _{1}_{2}} (shown with bold arrows) is RAF.

The linear relations for the three cases.

The probability _{n}

The level of catalysis

The empirical (cases

_{A} | 1.0970 + 0.0189
_{B} | –0.4736 + 0.7012
_{C} | 1.6339