1. Introduction
The idea that chemical evolution led to the origin of life was proposed independently by Oparin [1] and Haldane [2]. It was later shown to be plausible through the experiments of Miller [3]. Recently, different variants of these experiments were repeated and analyzed with state-of-the-art molecular analysis technology [4,5]. This revealed the presence of thousands of molecular species, including many classes of organic and catalytic ones. However, it still leaves open the question of how such spontaneous chemical evolution can give rise to a coherent, self-reproducing collective of molecules.
One possible answer to this question was proposed in the form of the emergence of autocatalytic sets [6,7]. Informally, an autocatalytic set is a chemical reaction network in which the molecules mutually catalyze each other’s formation, and which is self-sustaining given a basic food source. That such autocatalytic sets can indeed form spontaneously was already shown early on through computer simulations [8,9,10]. Later on, they were also successfully constructed with real molecules in the lab [11,12,13,14], and shown to exist in the metabolic networks of prokaryotes [15,16,17]. The notion of autocatalytic sets was formalized and studied extensively as reflexively autocatalytic and food-generated (RAF) sets (see, for example, [18,19] and references therein).
Recently, a simple model of combinatorial innovation, referred to as the theory of the adjacent possible (TAP), was studied formally [20]. In general, TAP states that evolving systems create their own future possibilities in an ever-increasing “adjacent possible”. In particular, the number of “things” that can come into existence (or be created) next increases as a combinatorial function of what is currently in existence. As such, this model can also be interpreted in the context of chemical evolution, where the “things” are molecular species. The more molecular species that are currently in existence, the more potential new species that can come into existence next through (spontaneous) chemical reactions between arbitrary combinations of currently existing molecules. An initial investigation of the combination of the TAP and RAF models was presented in the context of technological evolution [21]. Here, we present results of a further investigation into this model combination, with both theoretical and computer simulation results, and more specifically in the context of chemical evolution and how it could lead to the formation of mutually catalytic and self-reproducing collectives of molecules as a possible step towards the (or an) origin of life.
3. Results
Consider an instance of the TAP model, described by $({\mathcal{M}}_{t},{\mathcal{R}}_{t})$ (for $t=0,1,2,\dots $) where ${\mathcal{M}}_{t}$ is a set of molecular species generated up to time t and ${\mathcal{R}}_{t}$ is the set of all reactions involved in generating ${\mathcal{M}}_{t}$ starting from ${\mathcal{M}}_{0}$. We let ${M}_{t}=\left|{\mathcal{M}}_{t}\right|$ and ${R}_{t}=\left|{\mathcal{R}}_{t}\right|$ denote the number of molecular species in ${\mathcal{M}}_{t}$ and reactions in ${\mathcal{R}}_{t}$, respectively. Note that these families of sets (and their sizes) are random variables for each $t\ge 1$. We assume throughout this section that ${M}_{0}\ge 1$, and ${\alpha}_{i}\ne 0$ for at least one value of $i\le {M}_{0}$. We also explicitly assume in this section that for each pair $(x,r)$ where $x\in {\mathcal{M}}_{t}\setminus {\mathcal{M}}_{0}$ and $r\in {\mathcal{R}}_{t}$, x catalyzes r with probability p, and that these catalysis events are stochastically independent.
Throughout this section we do not specifically require that ${\alpha}_{i}={\alpha}^{i}$, nor do we place any upper bound (such as $\mathbf{M}$) on ${M}_{t}$ unless otherwise stated, or any bound (such as K) on the number of reactants of a reaction.
Since every reaction in the TAP model creates exactly one new product, we have:
$${R}_{t}={M}_{t}-{M}_{0}.\qquad (2)$$
Lemma 1. In the TAP model, ${M}_{t}$ is nondecreasing, and with probability 1, ${M}_{t}\to \infty $ as t grows.
Proof. ${M}_{t}$ is a Markovian random walk on the positive integers, with ${M}_{t+1}\ge {M}_{t}$ for all t, and the probability of the event ${M}_{t+1}-{M}_{t}\ge 1$ is uniformly bounded away from 0 for all values of t. By a standard probability argument, for any positive integer k, the event ${E}_{k}$ that ${M}_{t}\le k$ holds for all t has probability 0. Thus, $\mathbb{P}\left({\cup}_{k\ge 1}{E}_{k}\right)=0$ which ensures that ${M}_{t}\to \infty $ with probability 1. □
Remark 1. If an upper bound $\mathbf{M}$ is imposed on ${M}_{t}$ (so that the process terminates when ${M}_{t}\ge \mathbf{M}$) then Lemma 1 implies that ${M}_{t}$ is certain to eventually hit $\mathbf{M}$. Note also that the version of the TAP model studied theoretically in [20] takes place in continuous (rather than discrete) time, in which case Lemma 1 has a sharper statement: provided that ${\alpha}_{1}>0$ and ${\alpha}_{i}>0$ for at least one other value of i, then with probability 1, ${M}_{t}$ tends to infinity in finite time. Here, we are modeling a system in discrete time, and so this “explosion in finite time” phenomenon does not arise. Nevertheless, in our simulation results, where we explicitly stop the process when $\mathbf{M}$ molecular species have been produced, a sudden and rapid increase still occurs.
Next, consider the probability that the entire collection of reactions involved in generating ${\mathcal{M}}_{t}$ (i.e., ${\mathcal{R}}_{t}$) is an RAF. As the following lemma shows, this probability depends on ${\mathcal{M}}_{t}$ only through the size of this set (i.e., ${M}_{t}$), and so we denote the probability by ${P}_{\mathrm{all}}\left({M}_{t}\right)$.
Lemma 2.
$${P}_{\mathrm{all}}\left({M}_{t}\right)={\left(1-{(1-p)}^{{M}_{t}-{M}_{0}}\right)}^{{M}_{t}-{M}_{0}}.\qquad (3)$$
Proof. ${\mathcal{R}}_{t}$ is F-generated, and so it forms an RAF precisely if each reaction in ${\mathcal{R}}_{t}$ is catalyzed by at least one molecule type in ${\mathcal{M}}_{t}\setminus {\mathcal{M}}_{0}$. By the independence assumption concerning catalysis assignments, the probability that any given reaction $r\in {\mathcal{R}}_{t}$ is catalyzed by at least one molecular species in ${\mathcal{M}}_{t}\setminus {\mathcal{M}}_{0}$ is $1-{(1-p)}^{{M}_{t}-{M}_{0}}$, and since there are ${R}_{t}$ reactions, the probability that all reactions in ${\mathcal{R}}_{t}$ are catalyzed is ${\left(1-{(1-p)}^{{M}_{t}-{M}_{0}}\right)}^{{R}_{t}}$, which equals ${\left(1-{(1-p)}^{{M}_{t}-{M}_{0}}\right)}^{{M}_{t}-{M}_{0}}$ by Equation (2). □
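As a minimal numerical sketch of the probability derived in this proof, the function below evaluates $(1-(1-p)^{M_t-M_0})^{M_t-M_0}$ directly. The function name `p_all` and the food-set size `M_0 = 10` used in the loop are illustrative assumptions, not values taken from the text; only $p=0.005$ appears in the paper (for Figure 2).

```python
import math


def p_all(m_t: int, m_0: int, p: float) -> float:
    """Probability that every reaction in R_t has at least one catalyst.

    Assumes R_t = M_t - M_0 reactions and M_t - M_0 candidate catalysts,
    with each (species, reaction) pair catalyzed independently with
    probability p.
    """
    n = m_t - m_0
    if n <= 0:
        raise ValueError("need at least one non-food species (M_t > M_0)")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p == 0.0:
        return 0.0
    if p == 1.0:
        return 1.0
    # q = (1 - p)^n: chance that one given reaction has no catalyst at all.
    q = math.exp(n * math.log1p(-p))
    if q >= 1.0:  # p so small that q rounds to 1 in floating point
        return 0.0
    # (1 - q)^n: chance that all n reactions are catalyzed.
    return math.exp(n * math.log1p(-q))


if __name__ == "__main__":
    M_0 = 10  # hypothetical food-set size, chosen only for illustration
    for m_t in (500, 1000, 1500, 2000, 3000):
        print(m_t, round(p_all(m_t, M_0, 0.005), 4))
```

Using `log1p` keeps the computation stable when $p$ or $(1-p)^{M_t-M_0}$ is very small, which is the regime of interest for large $M_t$.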
We can now state our main theorem.
Theorem 1. For the TAP + catalysis model, the following hold.
- (a) For any value of $p>0$, ${P}_{\mathrm{all}}\left({M}_{t}\right)=1-o\left(1\right)$, where $o\left(1\right)$ is a term that tends to zero as ${M}_{t}\to \infty $.
- (b) Suppose that ${M}_{t}=m$ at some fixed time $t>0$. Then the following hold:
  - (i) If $p=\frac{\ln\left(xm\right)}{m}$ for $xm>1$, then ${P}_{\mathrm{all}}\left({M}_{t}\right)=\exp\left(-\frac{1}{x}\right)+o\left(1\right)$, where $o\left(1\right)$ is a term that converges to 0 as $m\to \infty $.
  - (ii) The expected number of reactions that each molecule type (not in ${\mathcal{M}}_{0}$) catalyzes (i.e., $f=p{R}_{t}=p({M}_{t}-{M}_{0})$) in order for ${P}_{\mathrm{all}}\left({M}_{t}\right)$ to equal $\theta$ (with $0<\theta<1$) is given by $f=\ln\left(m\right)+\ln\left(\frac{1}{\ln(1/\theta )}\right)+o\left(1\right)$, where $o\left(1\right)\to 0$ as $m\to \infty $. In particular, for each such value of $\theta$, f grows logarithmically with m.
Proof. Part (a): Let ${M}_{t}^{\prime}={M}_{t}-{M}_{0}$, and let $q={(1-p)}^{{M}_{t}^{\prime}}$. By the inequality $1-x\le {e}^{-x}$ for $x>0$, we obtain $q\le {e}^{-p{M}_{t}^{\prime}}$, and thus, by Lemma 2, we obtain:
$${P}_{\mathrm{all}}\left({M}_{t}\right)={(1-q)}^{{M}_{t}^{\prime}}\ge 1-{M}_{t}^{\prime}q$$
(since ${(1-x)}^{n}\ge 1-nx$ for $0<x<1$). Thus, ${P}_{\mathrm{all}}\left({M}_{t}\right)\ge 1-{M}_{t}^{\prime}{e}^{-p{M}_{t}^{\prime}}\to 1$ as ${M}_{t}\to \infty $.
Part (b-i): Conditional on ${M}_{t}=m$, and setting $p=\frac{\ln\left(xm\right)}{m}$, Lemma 2 gives:
$$\ln {P}_{\mathrm{all}}\left({M}_{t}\right)=(m-{M}_{0})\ln\left(1-{(1-p)}^{m-{M}_{0}}\right)\sim -m\cdot \frac{1}{xm}=-\frac{1}{x},$$
where ∼ refers to asymptotic identity as m grows. Exponentiating gives ${P}_{\mathrm{all}}\left({M}_{t}\right)=\exp\left(-\frac{1}{x}\right)+o\left(1\right)$, as required.
Part (b-ii): Conditional on ${M}_{t}=m$, and setting $p=\frac{\ln\left(xm\right)}{m}$ for a value $x>0$ (to be determined), gives:
$$pm=\ln\left(x\right)+\ln\left(m\right).\qquad (4)$$
From Part (b-i), and again conditional on ${M}_{t}=m$, the equation ${P}_{\mathrm{all}}\left({M}_{t}\right)=\theta $ gives $x=\frac{1}{\ln(1/\theta )}+o\left(1\right)$ where $o\left(1\right)$ tends to 0 as m grows. Noting also that $p{M}_{0}=o\left(1\right)$, the result now follows from Equation (4). □
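The limit in Part (b-i) can be checked numerically. The sketch below sets $p=\ln(xm)/m$, evaluates $(1-(1-p)^{m-M_0})^{m-M_0}$ exactly, and compares it with $\exp(-1/x)$ for growing $m$. The helper names and the choices $x=2$ and $M_0=10$ are assumptions made purely for illustration.

```python
import math


def p_all_exact(n: int, p: float) -> float:
    """(1 - (1 - p)^n)^n, evaluated in log space for stability."""
    q = math.exp(n * math.log1p(-p))   # (1 - p)^n
    return math.exp(n * math.log1p(-q))


def compare_with_limit(x: float, m: int, m_0: int = 10):
    """Return (exact probability, asymptotic value exp(-1/x)).

    Uses p = ln(x*m)/m as in Part (b-i); m_0 is a hypothetical
    food-set size and requires x*m > 1 and m > m_0.
    """
    p = math.log(x * m) / m
    exact = p_all_exact(m - m_0, p)
    return exact, math.exp(-1.0 / x)


if __name__ == "__main__":
    for m in (10**2, 10**3, 10**4, 10**5, 10**6):
        exact, limit = compare_with_limit(2.0, m)
        print(f"m={m:>8}  exact={exact:.5f}  exp(-1/x)={limit:.5f}")
```

Under these assumptions the gap between the two columns should shrink as $m$ grows, which is what the $o(1)$ term in the theorem expresses.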
To see how fast ${P}_{\mathrm{all}}\left({M}_{t}\right)$ converges to 1, Figure 2 shows the theoretical probability of Equation (3) for an all-${M}_{t}$ RAF (solid line) against ${M}_{t}$ for a catalysis probability $p=0.005$. The open circles represent results from the TAP model simulations. The dashed line shows simulation results for any-sized RAF (i.e., an RAF consisting of any number of reactions).
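A comparison between theory and simulation of this kind can be sketched as a small Monte Carlo experiment. The code below does not simulate the TAP dynamics themselves; it only samples the random catalysis assignment for a fixed number $n=M_t-M_0$ of non-food species and reactions, and compares the fraction of fully catalyzed instances with $(1-(1-p)^n)^n$. The function names, the seed, and the small parameter values are illustrative assumptions chosen to keep the run fast.

```python
import random


def all_catalyzed(n: int, p: float, rng: random.Random) -> bool:
    """Sample one catalysis assignment: n reactions, n candidate catalysts.

    Returns True if every reaction is catalyzed by at least one species.
    Stopping early is safe because all draws are independent.
    """
    return all(
        any(rng.random() < p for _ in range(n))  # some species catalyzes r
        for _ in range(n)                        # for every reaction r
    )


def estimate_p_all(n: int, p: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the all-reactions-catalyzed probability."""
    rng = random.Random(seed)
    hits = sum(all_catalyzed(n, p, rng) for _ in range(trials))
    return hits / trials


def p_all_formula(n: int, p: float) -> float:
    """Closed form (1 - (1 - p)^n)^n for comparison."""
    return (1.0 - (1.0 - p) ** n) ** n


if __name__ == "__main__":
    n, p = 30, 0.15  # hypothetical small instance
    print("simulated:", estimate_p_all(n, p, trials=2000, seed=1))
    print("formula:  ", round(p_all_formula(n, p), 4))
```

With a few thousand trials the estimate should agree with the closed form to within sampling error.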
Given a fixed probability of catalysis p, it is clear that once the total number of molecular species ${M}_{t}$ becomes large enough, there is a sharp transition from RAF sets not existing at all to them existing in almost every instance of the model. Of course, RAFs of any size (dashed line) already occur at smaller values of ${M}_{t}$ than all-${M}_{t}$ RAFs, but theoretically it is easier to deal with all-${M}_{t}$ RAFs, as they are always automatically food-generated. Since an all-${M}_{t}$ RAF is itself an RAF, the theoretical expression forms a lower bound on the actual probabilities $P\left({M}_{t}\right)$ of an RAF of any size.
Another way to consider these probabilities is to fix the number of molecular species ${M}_{t}$ and then see what the required level of catalysis $f=p{M}_{t}$ is to obtain RAF sets with high probability. This level of catalysis indicates the average number of reactions catalyzed per molecule type. To see how this increases with increasing ${M}_{t}$, Figure 3 shows the theoretical probability (solid lines) of an all-${M}_{t}$ RAF against $p{M}_{t}$ for different values of ${M}_{t}$. The dots are values obtained from computer simulations of the TAP model, to again compare with the theoretical results. As expected, the curves move slowly to the right for larger values of ${M}_{t}$, but the distance between adjacent curves seems to be decreasing.
Taking the “transition point” to a high probability of RAFs to be at ${P}_{\mathrm{all}}\left({M}_{t}\right)=0.5$, Theorem 1(b-ii) predicts that $f=p{M}_{t}$ should be close to $\ln\left({M}_{t}\right)+\ln(1/\ln\left(2\right))\approx \ln\left({M}_{t}\right)+0.367$. Figure 4 shows this function for a range of values of ${M}_{t}$, with the open circles representing results from computer simulations (interpolated from the simulation data shown in Figure 3). These simulation results closely fit with the theoretically predicted logarithmic curve.
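The logarithmic prediction can also be compared against the exact transition point of the closed-form probability. The sketch below inverts $(1-(1-p)^n)^n=\theta$ with $n=M_t-M_0$ to obtain the exact $p$, converts it to $f=p(M_t-M_0)$ as in Theorem 1(b-ii), and compares it with $\ln(M_t)+\ln(1/\ln(1/\theta))$. The function names and the food-set size `m_0 = 10` are assumptions for illustration only.

```python
import math


def exact_f(m: int, m_0: int, theta: float) -> float:
    """Level of catalysis f = p*(m - m_0) at which P_all equals theta.

    Solves (1 - (1 - p)^n)^n = theta for p, with n = m - m_0:
    (1 - p)^n = 1 - theta^(1/n), so p = 1 - (1 - theta^(1/n))^(1/n).
    """
    n = m - m_0
    q = -math.expm1(math.log(theta) / n)  # 1 - theta^(1/n)
    p = -math.expm1(math.log(q) / n)      # 1 - q^(1/n)
    return p * n


def predicted_f(m: int, theta: float) -> float:
    """Asymptotic prediction ln(m) + ln(1 / ln(1/theta))."""
    return math.log(m) + math.log(1.0 / math.log(1.0 / theta))


if __name__ == "__main__":
    for m in (100, 1000, 10_000, 100_000):
        print(m, round(exact_f(m, 10, 0.5), 4), round(predicted_f(m, 0.5), 4))
```

For $\theta=0.5$ the constant term is $\ln(1/\ln 2)$, the value quoted above as approximately 0.367, and the two columns should approach each other as $M_t$ increases.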
Finally, a comparison is made between the probabilities of any-sized RAFs previously obtained from simulations of the TAP model [21], and earlier results from a related model known as the binary polymer model [23]. In this related model, smaller polymers can ligate into larger and larger ones, and larger polymers can cleave into smaller and smaller ones. This model has been investigated extensively in the context of RAF sets in the past [18,19].
Figure 5 shows the probability P of an RAF against the level of catalysis $f=pR$ for various versions of the binary polymer model that use different ways of assigning catalysis. The red curves are the standard (uniform) catalysis distribution, the blue curves are a power law catalysis distribution, and the green lines are a sparse catalysis distribution [23]. These results were obtained from computer simulations. The black curves are the theoretically calculated probabilities for an all-or-nothing catalysis distribution. Solid lines are for maximum polymer length $n=10$, while dashed lines are for $n=16$. The thick gray line shows simulation results for the TAP model, with an average number of molecular species of ${M}_{t}=1250$.
Clearly, the required level of catalysis to cause RAF sets to arise is higher in the TAP model (five to seven reactions catalyzed per molecule type, on average) than in the binary polymer model (one to two). However, this can be explained by the fact that in the TAP model there is always only one reaction that produces a given molecular species, whereas in the binary polymer model there are multiple reactions that can produce a given polymer. In other words, there is a large amount of redundancy in the reaction networks resulting from the binary polymer model, allowing for a lower level of catalysis (given that only one or two of the multiple reactions that produce a given polymer need to be catalyzed).
Note also that we restricted the chemical reactions in the TAP model to only generate one product. In general, reactions can produce more than one product, in which case there could be significantly more molecular species than reactions. It is therefore expected that in such a more general model version the required level of catalysis f will be lower than the five to seven suggested by Figure 5.
4. Conclusions
We reinterpreted a simple model of combinatorial innovation known as TAP (theory of the adjacent possible) in the context of chemical evolution and autocatalytic sets. We then derived theoretical expressions for the probabilities of such autocatalytic sets arising in instances of the TAP model. These theoretical predictions were verified with results from computer simulations.
These results show that autocatalytic sets do indeed have a high probability of arising in instances of the TAP model, given a large enough number of molecular species and/or level of catalysis. Of course this is still a very general model, as it is assumed that any molecular species can chemically react with any other, and catalysis is assigned randomly. However, previous work on a related model, known as the binary polymer model, showed that more realistic assumptions can be easily incorporated in such a model, and do not change the overall results very much, at least not qualitatively. Moreover, the quantitative changes can often be predicted from the more general basic model version [18].
Generally, the level of catalysis (i.e., the average number of reactions catalyzed per molecule type) needs to be somewhat higher in the TAP model than in the binary polymer model. However, this can be explained by the redundancy present in reaction networks resulting from the binary polymer model. One could easily imagine a version of the TAP model where certain molecular species can also be produced by multiple reactions.
In conclusion, the results presented here may suggest a possible step towards the (or an) origin of life, where self-sustaining and reproducing autocatalytic sets arise during a process of chemical evolution. In fact, Wollrab et al. [4] conclude from their “Miller-type” chemical evolution experiments that “organic catalysts that appear in the broth may well lead to the production of molecular species that would normally not be favored under the conditions in the reactor, further enhancing the molecular richness”. If even just some of those species happen to form a closed loop, mutually catalyzing each other, autocatalytic sets would indeed arise spontaneously.