All articles published by MDPI are made immediately available worldwide under an open access license. No special
permission is required to reuse all or part of the article published by MDPI, including figures and tables. For
articles published under an open access Creative Common CC BY license, any part of the article may be reused without
permission provided that the original article is clearly cited.
The idea that chemical evolution led to the origin of life is not new, but still leaves open the question of how exactly it could have led to a coherent and self-reproducing collective of molecules. One possible answer to this question was proposed in the form of the emergence of an autocatalytic set: a collection of molecules that mutually catalyze each other’s formation and that is self-sustaining given some basic “food” source. Building on previous work, here we investigate in more detail when and how autocatalytic sets can arise in a simple model of chemical evolution based on the idea of combinatorial innovation with random catalysis assignments. We derive theoretical results, and compare them with computer simulations. These results could suggest a possible step towards the (or an) origin of life.
The idea that chemical evolution led to the origin of life was proposed independently by Oparin [1] and Haldane [2]. It was later shown to be plausible through the experiments of Miller [3]. Recently, variants of these experiments were repeated and analyzed with state-of-the-art molecular analysis technology [4,5], revealing the presence of thousands of molecular species, including many classes of organic and catalytic ones. However, this still leaves open the question of how such spontaneous chemical evolution can give rise to a coherent, self-reproducing collective of molecules.
One possible answer to this question was proposed in the form of the emergence of autocatalytic sets [6,7]. Informally, an autocatalytic set is a chemical reaction network in which the molecules mutually catalyze each other’s formation, and which is self-sustaining given a basic food source. That such autocatalytic sets can indeed form spontaneously was already shown early on through computer simulations [8,9,10]. Later on, they were also successfully constructed with real molecules in the lab [11,12,13,14], and shown to exist in the metabolic networks of prokaryotes [15,16,17]. The notion of autocatalytic sets was formalized and studied extensively as reflexively autocatalytic and food-generated (RAF) sets (see, for example, [18,19] and references therein).
Recently, a simple model of combinatorial innovation, referred to as the theory of the adjacent possible (TAP), was studied formally. In general, TAP states that evolving systems create their own future possibilities in an ever-increasing “adjacent possible”. In particular, the number of “things” that can come into existence (or be created) next increases as a combinatorial function of what currently exists. As such, this model can also be interpreted in the context of chemical evolution, where the “things” are molecular species. The more molecular species currently exist, the more potential new species can come into existence next through (spontaneous) chemical reactions between arbitrary combinations of existing molecules. An initial investigation of the combination of the TAP and RAF models was presented in the context of technological evolution. Here, we investigate this model combination further, with both theoretical and computer simulation results, specifically in the context of chemical evolution and how it could lead to the formation of mutually catalytic, self-reproducing collectives of molecules as a possible step towards the (or an) origin of life.
The TAP model is based on the idea of combinatorial innovation. At its core is the following equation:

$$M_{t+1} = M_t + \sum_{i=1}^{M_t} \alpha_i \binom{M_t}{i}$$

where $M_t$ is the number of different “things” at time $t$, and $\alpha_i$ is a decreasing sequence of probabilities (i.e., real numbers between 0.0 and 1.0). Interpreted in the context of chemical evolution, new molecular species are produced through chemical reactions with arbitrary combinations of already existing species as reactants. More specifically, at each time step $t$, each possible combination of $i$ existing molecular species has a small probability $\alpha_i$ of chemically reacting, producing a new species. In other words, the probabilities $\alpha_i$ can be thought of as reaction propensities.
Note, though, that the above equation is a deterministic version that does not guarantee $M_t$ to be an integer value; it only serves to convey the general idea behind the model. Instead, we use a stochastic implementation that does guarantee integer values, as in earlier work. Although our mathematical results do not require it, for the simulations and the following algorithm we assume that $\alpha_i = \alpha^i$ (i.e., $\alpha$ to the power $i$).
In addition, for each newly created molecular species x and each of the existing molecule types y that were not present at time 0, x can catalyze the formation of y with a fixed probability p. Similarly, for each chemical reaction r that produces a new species, and each of the existing molecule types y that were not present at time 0, y can catalyze r, also with probability p. These random catalysis assignments are assumed to be independent across all pairs of molecules and reactions.
Our implementation of this model is described in Algorithm 1.
Several remarks should be made about this algorithmic description. First, an upper limit K on the possible number of reactants is set for numerical and computational reasons. It was shown earlier that this does not significantly affect the overall behavior of the TAP model, and it is a chemically plausible constraint as well.
Next, the algorithm stops as soon as a prespecified maximum number of molecular species has been produced (due to the if-statement at line 16). Previously, the algorithm was allowed to finish the time step in which this maximum is reached, in which case the final number of species is generally larger than the maximum. However, for a more accurate comparison with theoretical results, here we terminate the algorithm immediately after the last allowed molecular species has been produced, without finishing the rest of the time step in which that happens.
Finally, in the for-loop at line 12, a new molecular species x is considered twice as a catalyst of its own production (when $y = x$). The loop is stated this way to keep it concise, rather than adding an if-statement or using two separate for-loops. In an actual implementation this double consideration can easily be excluded, and as long as the number of molecular species is large enough, it is negligible in the theoretical derivations below.
Algorithm 1: TAP with catalysis.
Create initial molecular species labeled
Create a new species x labeled
Select i random reactants for the production of x from
With probability p assign x as catalyst to the reaction that produced y
With probability p assign y as catalyst to the reaction that produced x
Over time, the existing molecular species, and the particular chemical reactions that produced them, thus form a growing chemical reaction network, with a “food set” consisting of the initial species, as in the example in Figure 1. In addition, the molecules catalyze each other’s formation according to the catalysis probability p. Note that each reaction can have no, one, or multiple catalysts, depending on these random catalysis assignments. Similarly, each molecule type may catalyze no, one, or multiple reactions. Figure 1 shows a simple example of a chemical reaction network that resulted from the TAP model, with random catalysis assigned as indicated by the dashed arrows.
As before, the parameters $M_0$ (the number of initial molecular species), $\alpha$, and $K$ are kept fixed at the values used previously.
One could now ask what the probability is that, in such a growing chemical reaction network, at some point a subset of molecule types exists in which the molecules mutually catalyze each other’s production, and which is sustainable on the given food set. Such a subset is known as a reflexively autocatalytic and food-generated (RAF) set. More formally, an RAF set is a set $\mathcal{R}'$ of chemical reactions, together with the molecule types involved in them, such that:
Each reaction in $\mathcal{R}'$ is catalyzed by at least one of the molecule types involved in $\mathcal{R}'$.
Each molecule type involved in $\mathcal{R}'$ can be created from the food set through a sequence of reactions from $\mathcal{R}'$ itself.
An efficient computer algorithm exists to find such RAF sets in arbitrary chemical reaction networks, or determine that no such subset is present. This algorithm actually finds the (unique) maximal RAF (maxRAF), i.e., the union of all possible RAFs within a given network. Repeated application of the algorithm can then also identify smaller RAF subsets within the maxRAF, including minimal ones.
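The idea behind such an algorithm can be illustrated with the following minimal Python sketch of iterative reduction: repeatedly compute the closure of the food set under the remaining reactions, and discard every reaction that has a reactant outside this closure or no catalyst inside it, until nothing changes. What remains is the maxRAF (possibly empty). The data layout (reactions as a dict of `(reactants, products, catalysts)` frozensets) and the function names are our own assumptions, and no attempt is made to match the efficiency of published implementations.

```python
Reaction = tuple[frozenset[str], frozenset[str], frozenset[str]]  # (reactants, products, catalysts)


def closure(food: set[str], reactions: dict[str, Reaction]) -> set[str]:
    """All molecule types producible from the food set using `reactions`."""
    available = set(food)
    changed = True
    while changed:
        changed = False
        for reactants, products, _ in reactions.values():
            if reactants <= available and not products <= available:
                available |= products
                changed = True
    return available


def max_raf(food: set[str], reactions: dict[str, Reaction]) -> dict[str, Reaction]:
    """Return the maxRAF (possibly empty) by iterative reduction."""
    current = dict(reactions)
    while True:
        cl = closure(food, current)
        # Keep reactions whose reactants are all producible and that have
        # at least one producible (or food) catalyst.
        kept = {name: r for name, r in current.items() if r[0] <= cl and r[2] & cl}
        if len(kept) == len(current):
            return kept
        current = kept


if __name__ == "__main__":
    food = {"a", "b"}
    reactions = {
        "r1": (frozenset({"a", "b"}), frozenset({"c"}), frozenset({"d"})),
        "r2": (frozenset({"a", "c"}), frozenset({"d"}), frozenset({"c"})),
        "r3": (frozenset({"b", "d"}), frozenset({"e"}), frozenset({"f"})),
    }
    print(sorted(max_raf(food, reactions)))  # prints ['r1', 'r2']
```

In this toy network, `r3` is removed because its only catalyst `f` can never be produced, while `r1` and `r2` catalyze each other and form an RAF. Smaller RAFs inside the maxRAF can then be explored by removing a reaction and re-running the reduction.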
In fact, the entire reaction network in Figure 1 forms a maxRAF, but it contains several smaller RAF subsets; for example, a subset of two of its reactions and a subset of four of its reactions each form an RAF on their own.
2.3. TAP and RAF
An initial investigation into the existence of RAF sets in the TAP model with catalysis was presented recently, but in the context of technological evolution. Here, we provide a more detailed study, specifically in the context of chemical evolution. First, we derive theoretical expressions for the probability of RAFs existing in instances of the TAP model. We then compare these with results from computer simulations, using an implementation of the TAP model as presented in Algorithm 1, and applying the RAF algorithm to large sets of random instances of the TAP model.
Consider an instance of the TAP model, described by $\mathcal{N}_t = (X_t, \mathcal{R}_t)$ (for $t \geq 0$), where $X_t$ is the set of molecular species generated up to time $t$ and $\mathcal{R}_t$ is the set of all reactions involved in generating $X_t$ starting from $X_0$. We let $M_t$ and $R_t$ denote the number of molecular species in $X_t$ and reactions in $\mathcal{R}_t$, respectively. Note that these families of sets (and their sizes) are random variables for each $t$. We assume throughout this section that , and for at least one value of . We also explicitly assume in this section that for each pair $(x, r)$ where $x \in X_t \setminus X_0$ and $r \in \mathcal{R}_t$, $x$ catalyzes $r$ with probability $p$, and that these catalysis events are stochastically independent.
Throughout this section we do not specifically require that $\alpha_i = \alpha^i$, nor, unless otherwise stated, do we place any upper bound on $M_t$ (such as the maximum number of species at which the simulations are stopped) or any bound (such as K) on the number of reactants of a reaction.
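Since every reaction in the TAP model creates exactly one new product, the number $R_t$ of reactions equals the number $M_t$ of molecular species minus the number $M_0$ of initial (food) species:

$$R_t = M_t - M_0 \tag{2}$$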
In the TAP model, $M_t$ is nondecreasing, and $M_t \to \infty$ with probability 1 as $t$ grows.
$M_t$ is a Markovian random walk on the positive integers, with $M_{t+1} \geq M_t$ for all $t$, and the probability of the event $M_{t+1} > M_t$ is uniformly bounded away from 0 for all values of $t$. By a standard probability argument, for any positive integer $k$, the event that $M_t \leq k$ holds for all $t$ has probability 0. Thus, $\mathbb{P}(\sup_t M_t = \infty) = 1$, which, since $M_t$ is nondecreasing, ensures that $M_t \to \infty$ with probability 1. □
If an upper bound is imposed on $M_t$ (so that the process terminates when $M_t$ reaches this bound), then Lemma 1 implies that $M_t$ is certain to eventually hit this bound. Note also that the version of the TAP model studied theoretically in earlier work takes place in continuous (rather than discrete) time, in which case Lemma 1 has a sharper statement: provided that and for at least one other value of $i$, then with probability 1, $M_t$ tends to infinity in finite time. Here, we model a system in discrete time, so this “explosion in finite time” phenomenon does not arise. Nevertheless, in our simulations, where we explicitly stop the process once a maximum number of molecular species has been produced, a sudden and rapid increase still occurs.
Next, consider the probability that the entire collection of reactions involved in generating $X_t$ (i.e., $\mathcal{R}_t$) is an RAF. As the following lemma shows, this probability depends on $\mathcal{R}_t$ only through the size of this set (i.e., $m = R_t$), and so we denote it by $P_m$, where

$$P_m = \left(1 - (1-p)^m\right)^m \tag{3}$$
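$\mathcal{R}_t$ is F-generated (with food set $F = X_0$), and so it forms an RAF precisely if each reaction in $\mathcal{R}_t$ is catalyzed by at least one molecule type in $X_t \setminus X_0$. By the independence assumption concerning catalysis assignments, the probability that any given reaction is catalyzed by at least one molecular species in $X_t \setminus X_0$ is $1 - (1-p)^{M_t - M_0}$, and since there are $R_t$ reactions, the probability that all reactions in $\mathcal{R}_t$ are catalyzed is $\left(1 - (1-p)^{M_t - M_0}\right)^{R_t}$, which equals $\left(1 - (1-p)^{R_t}\right)^{R_t} = P_{R_t}$ by Equation (2). □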
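Lemma 2 is easy to check numerically. The following sketch (our own illustration; the function names are hypothetical) evaluates $P_m = (1-(1-p)^m)^m$ and compares it with a direct Monte Carlo simulation of the independent catalysis assignments, in which each of the $m$ non-food molecule types catalyzes each of the $m$ reactions with probability $p$.

```python
import random


def all_raf_probability(m: int, p: float) -> float:
    """P_m = (1 - (1 - p)^m)^m: probability that all m reactions are catalyzed."""
    return (1.0 - (1.0 - p) ** m) ** m


def all_raf_monte_carlo(m: int, p: float, trials: int, seed: int = 0) -> float:
    """Estimate P_m by sampling independent catalysis assignments."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # The whole network is an RAF iff every one of the m reactions has
        # at least one catalyst among the m non-food molecule types.
        if all(any(rng.random() < p for _ in range(m)) for _ in range(m)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for m, p in [(5, 0.4), (20, 0.15), (50, 0.08)]:
        print(m, p, round(all_raf_probability(m, p), 4),
              all_raf_monte_carlo(m, p, trials=5_000))
```

The Monte Carlo estimates should agree with the closed form up to sampling noise, since the simulation draws exactly the independent events assumed in the proof.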
We can now state our main theorem.
For the TAP + catalysis model, the following hold.
(a) For any value of $p > 0$, $\mathbb{P}(\mathcal{R}_t \text{ is an RAF}) = 1 - \delta_t$, where $\delta_t$ is a term that tends to zero as $t \to \infty$.
(b) Suppose that $R_t = m$ at some fixed time $t$. Then the following hold:
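(i) If $p = \frac{\ln(m) + c}{m}$ for $c \in \mathbb{R}$, then

$$P_m = \exp\left(-e^{-c}\right) + \epsilon_m,$$

where $\epsilon_m$ is a term that converges to 0 as $m \to \infty$.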
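(ii) The expected number of reactions that each molecule type (not in $X_0$) catalyzes (i.e., $f = pm$) in order for $P_m$ to equal $\theta \in (0,1)$ is given by:

$$f = \ln(m) - \ln\left(\ln(1/\theta)\right) + \epsilon'_m,$$

where $\epsilon'_m \to 0$ as $m \to \infty$. In particular, for each such value of $\theta$, $f$ grows logarithmically with $m$.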
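Part (a): Let $m = R_t$, and let $q = 1 - p$. By the inequality $(1-x)^m \geq 1 - mx$ for $x \in [0,1]$, we obtain $P_m \geq 1 - mq^m$, and thus, by Lemma 2 (taking the expectation over $R_t$), we obtain:

$$\mathbb{P}(\mathcal{R}_t \text{ is an RAF}) \geq 1 - \mathbb{E}\left[R_t\, q^{R_t}\right].$$

Now $kq^k$ is bounded for all $k \geq 0$ and tends to 0 as $k \to \infty$ (since $0 \leq q < 1$), while $R_t \to \infty$ with probability 1 by Lemma 1 and Equation (2). By bounded convergence, $\mathbb{E}[R_t q^{R_t}] \to 0$, and thus $\mathbb{P}(\mathcal{R}_t \text{ is an RAF}) \to 1$ as $t \to \infty$.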
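Part (b-i): Conditional on $R_t = m$, and setting $p = \frac{\ln(m) + c}{m}$, Lemma 2 gives:

$$\ln P_m = m \ln\left(1 - (1-p)^m\right) \sim -m(1-p)^m \sim -m\,e^{-pm} = -e^{-c},$$

where ∼ refers to asymptotic identity as m grows. Exponentiating gives $P_m = \exp(-e^{-c}) + \epsilon_m$, where $\epsilon_m \to 0$ as $m \to \infty$, as required.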
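Part (b-ii): Conditional on $R_t = m$, and setting $p = \frac{\ln(m) + c}{m}$ for a value $c$ (to be determined), gives:

$$f = pm = \ln(m) + c.$$

From Part (b-i), and again conditional on $R_t = m$, the equation $P_m = \theta$ gives $c = -\ln(\ln(1/\theta)) + \epsilon'_m$, where $\epsilon'_m$ tends to 0 as m grows. Substituting this value of $c$ into $f = pm = \ln(m) + c$ gives the result. □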
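The asymptotic prediction $f \approx \ln(m) - \ln(\ln(1/\theta))$ can be compared with the exact level of catalysis obtained by solving $(1-(1-p)^m)^m = \theta$ for $p$ and setting $f = pm$. The sketch below does this by bisection; it is our own illustration, and the function names are hypothetical.

```python
import math


def all_raf_probability(m: int, p: float) -> float:
    """P_m = (1 - (1 - p)^m)^m, the all-reactions RAF probability."""
    return (1.0 - (1.0 - p) ** m) ** m


def exact_f(m: int, theta: float, iters: int = 200) -> float:
    """Solve P_m = theta for p by bisection and return f = p * m."""
    lo, hi = 0.0, 1.0  # P_m increases in p, from 0 at p = 0 to 1 at p = 1
    for _ in range(iters):
        mid = (lo + hi) / 2
        if all_raf_probability(m, mid) < theta:
            lo = mid
        else:
            hi = mid
    return m * (lo + hi) / 2


def approx_f(m: int, theta: float) -> float:
    """Asymptotic prediction: f ~ ln(m) - ln(ln(1/theta))."""
    return math.log(m) - math.log(math.log(1.0 / theta))


if __name__ == "__main__":
    theta = 0.5
    for m in [10, 100, 1_000, 10_000]:
        print(m, round(exact_f(m, theta), 3), round(approx_f(m, theta), 3))
```

Running the script prints the bisection solution next to the asymptotic formula; the gap between the two shrinks as m grows, consistent with the vanishing error term in Theorem 1(b-ii), while both grow logarithmically with m.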
To see how fast $P_m$ converges to 1, Figure 2 shows the theoretical probability of Equation (3) for an all-$\mathcal{R}_t$ RAF (solid line) against the number of molecular species, for a fixed catalysis probability p. The open circles represent results from the TAP model simulations. The dashed line shows simulation results for any-sized RAFs (i.e., RAFs consisting of any number of reactions).
Given a fixed probability of catalysis p, once the total number of molecular species becomes large enough, there is a sharp transition from RAF sets not existing at all to them existing in almost every instance of the model. Of course, RAFs of any size (dotted line) already occur at smaller numbers of molecular species than all-$\mathcal{R}_t$ RAFs, but theoretically it is easier to deal with all-$\mathcal{R}_t$ RAFs, as they are always automatically food-generated. Since an all-$\mathcal{R}_t$ RAF is a special case of an RAF existing, the theoretical expression forms a lower bound on the actual probability that an RAF exists.
Another way to consider these probabilities is to fix the number of molecular species and then determine the level of catalysis required to obtain RAF sets with high probability. This level of catalysis, $f = pm$, is the average number of reactions catalyzed per molecule type. To see how it increases with the number of molecular species, Figure 3 shows the theoretical probability (solid lines) of an all-$\mathcal{R}_t$ RAF against $f$ for different numbers of molecular species. The dots are values obtained from computer simulations of the TAP model, again for comparison with the theoretical results. As expected, the curves move slowly to the right as the number of molecular species increases, but the distance between successive curves appears to decrease.
Taking the “transition point” to a high probability of RAFs to be where $P_m$ reaches a fixed value $\theta$, Theorem 1(b-ii) predicts that $f$ should be close to $\ln(m) - \ln(\ln(1/\theta))$. Figure 4 shows this function for a range of values of $m$, with the open circles representing results from computer simulations (interpolated from the simulation data shown in Figure 3). These simulation results closely fit the theoretically predicted logarithmic curve.
Finally, a comparison is made between the probabilities of any-sized RAFs previously obtained from simulations of the TAP model and earlier results from a related model known as the binary polymer model. In this related model, smaller polymers can ligate into larger and larger ones, and larger polymers can cleave into smaller and smaller ones. This model has been investigated extensively in the context of RAF sets in the past [18,19].
Figure 5 shows the probability P of an RAF against the level of catalysis f for various versions of the binary polymer model that use different ways of assigning catalysis. The red curves represent the standard (uniform) catalysis distribution, the blue curves a power-law catalysis distribution, and the green curves a sparse catalysis distribution; these results were obtained from computer simulations. The black curves are the theoretically calculated probabilities for an all-or-nothing catalysis distribution. Solid and dashed lines correspond to two different values of the maximum polymer length. The thick gray line shows simulation results for the TAP model, at a fixed average number of molecular species.
Clearly, the required level of catalysis to cause RAF sets to arise is higher in the TAP model (five to seven reactions catalyzed per molecule type, on average) than in the binary polymer model (one to two). However, this can be explained by the fact that in the TAP model there is always only one reaction that produces a given molecular species, whereas in the binary polymer model there are multiple reactions that can produce a given polymer. In other words, there is a large amount of redundancy in the reaction networks resulting from the binary polymer model, allowing for a lower level of catalysis (given that only one or two of the multiple reactions that produce a given polymer need to be catalyzed).
Note also that we restricted the chemical reactions in the TAP model to only generate one product. In general, reactions can produce more than one product, in which case there could be significantly more molecular species than reactions. It is therefore expected that in such a more general model version the required level of catalysis f will be lower than the five to seven suggested by Figure 5.
We reinterpreted a simple model of combinatorial innovation known as TAP (theory of the adjacent possible) in the context of chemical evolution and autocatalytic sets. We then derived theoretical expressions for the probabilities of such autocatalytic sets arising in instances of the TAP model. These theoretical predictions were verified with results from computer simulations.
These results show that autocatalytic sets do indeed have a high probability of arising in instances of the TAP model, given a large enough number of molecular species and/or level of catalysis. Of course, this is still a very general model, as it is assumed that any molecular species can chemically react with any other, and catalysis is assigned randomly. However, previous work on a related model, known as the binary polymer model, showed that more realistic assumptions can easily be incorporated in such a model and do not change the overall results very much, at least not qualitatively. Moreover, the quantitative changes can often be predicted from the more general basic model version.
Generally, the level of catalysis (i.e., the average number of reactions catalyzed per molecule type) needs to be somewhat higher in the TAP model than in the binary polymer model. However, this can be explained by the redundancy present in reaction networks resulting from the binary polymer model. One could easily imagine a version of the TAP model where certain molecular species can also be produced by multiple reactions.
In conclusion, the results presented here may suggest a possible step towards the (or an) origin of life, where self-sustaining and reproducing autocatalytic sets arise during a process of chemical evolution. In fact, Wollrab et al. [4] conclude from their “Miller-type” chemical evolution experiments that “organic catalysts that appear in the broth may well lead to the production of molecular species that would normally not be favored under the conditions in the reactor, further enhancing the molecular richness”. If even just some of those species happen to form a closed loop, mutually catalyzing each other, autocatalytic sets would indeed arise spontaneously.
Conceptualization, S.K. and W.H.; software, W.H.; formal analysis, M.S. and W.H.; investigation, W.H.; writing—original draft preparation, W.H.; writing—review and editing, M.S., W.H. and S.K. All authors have read and agreed to the published version of the manuscript.
M.S. thanks the Royal Society Te Apārangi (New Zealand) for funding under the Catalyst Leader programme (agreement no. ILF- UOC1901).
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
All data was generated with custom-made software. Independent verification of our results would be greatly appreciated.
We thank the (anonymous) reviewers for making helpful suggestions to improve this manuscript.
Conflicts of Interest
The authors declare no conflict of interest.
Oparin, A.I. The Origin of Life; Moscow Worker Publishing: Moscow, Russia, 1924. (In Russian)
Haldane, J.B.S. The origin of life. Ration. Annu. 1929, 148, 3–10.
Miller, S.L. Production of amino acids under possible primitive earth conditions. Science 1953, 117, 528–529.
Wollrab, E.; Scherer, S.; Aubriet, F.; Carré, V.; Carlomagno, T.; Codutti, L.; Ott, A. Chemical analysis of a “Miller-type” complex prebiotic broth Part I: Chemical diversity, oxygen and nitrogen based polymers. Orig. Life Evol. Biosph. 2015, 46, 149–169.
Scherer, S.; Wollrab, E.; Codutti, L.; Carlomagno, T.; Gomes da Costa, S.; Volkmer, A.; Bronja, A.; Schmitz, O.J.; Ott, A. Chemical analysis of a “Miller-type” complex prebiotic broth Part II: Gas, oil, water and the oil/water-interface. Orig. Life Evol. Biosph. 2017, 47, 381–403.
Kauffman, S.A. Cellular homeostasis, epigenesis and replication in randomly aggregated macromolecular systems. J. Cybern. 1971, 1, 71–96.
Bagley, R.J.; Farmer, J.D. Spontaneous emergence of a metabolism. In Artificial Life II; Langton, C.G., Taylor, C., Farmer, J.D., Rasmussen, S., Eds.; Addison-Wesley: Boston, MA, USA, 1991; pp. 93–140.
Bagley, R.J.; Farmer, J.D.; Fontana, W. Evolution of a metabolism. In Artificial Life II; Langton, C.G., Taylor, C., Farmer, J.D., Rasmussen, S., Eds.; Addison-Wesley: Boston, MA, USA, 1991; pp. 141–158.
Sievers, D.; von Kiedrowski, G. Self-replication of complementary nucleotide-based oligomers. Nature 1994, 369, 221–224.
Ashkenasy, G.; Jegasia, R.; Yadav, M.; Ghadiri, M.R. Design of a directed molecular network. Proc. Natl. Acad. Sci. USA 2004, 101, 10872–10877.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely
those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or
the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas,
methods, instructions or products referred to in the content.