Multistable Protocells Can Aid the Evolution of Prebiotic Autocatalytic Sets

We present a simple mathematical model that captures the evolutionary capabilities of a prebiotic compartment or protocell. In the model, the protocell contains an autocatalytic set whose chemical dynamics is coupled to the growth–division dynamics of the compartment. Bistability in the dynamics of the autocatalytic set results in a protocell that can exist with two distinct growth rates. Stochasticity in chemical reactions plays the role of mutations and causes transitions from one growth regime to another. We show that the system exhibits ‘natural selection’, where a ‘mutant’ protocell in which the autocatalytic set is active arises by chance in a population of inactive protocells, and then takes over the population because of its higher growth rate or ‘fitness’. The work integrates three levels of dynamics: intracellular chemical, single protocell, and population (or ecosystem) of protocells.


Introduction
The simplest life forms existing today and plausibly existing at the origin of life are such complex chemical organizations involving small and large molecules, that it is virtually impossible to imagine their origin except through some process of chemical evolution [1,2].Imagining plausible steps in chemical evolution that resulted in the increase of complexity of prebiotic chemical organization is therefore an important task.
One significant set of prebiotic scenarios is based on the idea of an autocatalytic set (ACS) of chemical reactions [3][4][5], reviewed in [6,7].Here we are concerned about the evolution of ACSs.This has been investigated [8][9][10] (for reviews, see [11,12]) largely in the context of ACSs that reside in static well stirred containers.It is recognized that at some stage autocatalytic networks must have evolved inside a spatial compartment or 'protocell' which propagated through growth and division.Consequently, different models of protocells containing ACSs have been proposed [13][14][15][16][17][18][19][20][21][22] where the compartments are modeled after micelles (autocatalytic aggregates of lipid catalysts), vesicles (lipid bilayers permeable only to food molecules enclosing an aqueous environment containing the ACS) or other structures.
These models have considered how the features of Darwinian evolution [23,24], namely, (i) heredity, (ii) heritable variation, and (iii) differential fitness of the variants, can arise in such protocells.In models of growing-dividing protocells that contain ACSs, daughter protocells inherit the composition of the mother, and this transmission of compositional information is the mechanism of heredity [10,25,26] instead of template replication of an information carrying molecule.The interesting property of 'synchronization' has been shown to arise fairly generically in these models [18,27] whereby the composition of the protocell at successive divisions remains the same, giving the lineage of protocells a stable compositional identity.As a source of variation needed for evolution, models have considered chemical fluctuations due to the chance occurrence of rare reactions which are enhanced in small volumes, or changes in the environment (e.g., addition or removal of molecular species from the food set) [20,[28][29][30].A large network containing multiple ACSs [31,32] causes protocells that contain distinct ACSs to grow with different rates [10,26].This can give rise to differential fitness of protocells.Notwithstanding all the above work, a crisp and convincing theoretical demonstration of the Darwinian evolution of a population of ACS containing protocells remains an unfinished task [12].In this paper we present a new model which explicitly demonstrates the evolution of a population of such protocells in the Darwinian sense (albeit only one step of evolution due to the simplicity of the model).Our work makes use of an interesting feature of certain autocatalytic network topologies: the presence of multi-stability in the dynamics [33][34][35][36][37].Our protocell has just two stable states, one in which no ACS is present (inactive state) and the other in which it is (active state).The protocell has a higher growth rate in the active state compared to the inactive state.The variation in a protocell is just the spontaneous transition, due to chemical fluctuation in a small volume, from the inactive to the active state without any change of environment.The evolution exhibited is the establishment, growth and dominance of the active protocells in a population of protocells.The simplicity of the model allows us to quantify the conditions under which this 'natural selection' can take place, in terms of the various dynamically generated timescales of the model.In future work we hope to generalize this to multiple evolutionary steps of increasing complexity.

The model
The protocell consists of three molecular species, a monomer A(1) (food molecule), a dimer A(2) (assumed to be the enclosure forming molecule) and tetramer A(4) (catalyst); see Fig. 1.The population of A(i) (i = 1, 2, 4) in the protocell is denoted X i ; x i ≡ X i /V is its concentration, where V is the volume of the protocell.The set of reactions these molecules can undergo are: A(1) ext denotes the monomer species outside the cell; its concentration is assumed constant.The membrane formed by the dimers is permeable only to monomers; the rate at which monomers come in is proportional to the number of dimers, α being the proportionality constant.Two monomers can spontaneously ligate to form a dimer and two dimers to form a tetramer, both with the same rate constant k F .The reverse (dissociation) reactions have a spontaneous rate constant k R .These ligation-dissociation reactions are also catalyzed by the tetramer, whose 'catalytic efficiency' is denoted κ (this effectively means that the catalyzed reaction rate is κx 4 times the spontaneous rate).The dimer and tetramer are assumed to degrade with rate constant ϕ into a waste product that quickly diffuses out of the protocell.Note that the catalyzed reactions R1 and R2 together with the transport reaction form an ACS starting from the food set A(1) ext .
In this model the dimer does double duty as both the enclosure forming molecule as well as a reactant for catalyst production.In the equations below, we do not introduce separate population variables for the two roles.This is purely for simplicity and is not a crucial assumption.In the Supplementary Material Section 1 we show that in a model with two monomer species in which these two functions are performed by distinct molecules, similar results arise.
Using mass action kinetics, the deterministic rate equations of the model are given by The V /V terms represent dilution in an expanding volume.Note that when V is not constant, Eqs.(1-3) do not specify the dynamics completely unless the growth rate V /V is specified.Since here we want an endogenous growth rate, we do not specify V /V exogenously.Instead, we write the model in terms of the populations, and assume a certain functional form for V in terms of the populations.In terms of X i , the above equations reduce to For simplicity we take V to be a linear function of the populations X = (X 1 , X 2 , X 4 ): where v is a constant.This choice gives the protocell a constant mass density (as observed in bacterial cells [38]) since V is proportional to the mass of the protocell.This choice is not essential; we have tried other linear functions . The quantitative results depend on the values of v i but the qualitative features presented below hold for all the cases considered.We have also considered other versions of the model with the transport term αX 2 in 5 modified to a gradient term αX 2 (x 1,ext − x 1 ) (where x 1,ext is the constant concentration of A(1) ext ), certain other autocatalytic reaction topologies, etc. (see Supplementary Material Section 1).The qualitative conclusions seem to be robust to these choices.Without loss of generality, the constants k R and v are set to unity by rescaling , which makes time and the other parameters dimensionless.The definition of V (X) and the values of the rescaled parameters k F , ϕ, α, κ completely define Eqs.(5-7), and one can solve for X(t) given any initial condition.In a particular trajectory V may increase or decrease.Protocells larger than a characteristic size may become floppy or unstable and spontaneously break up into smaller entities.We assume that if V increases to a critical value V c the cell divides into two identical daughters each containing half of the three chemicals of the mother protocell at division.The dynamics of a daughter after division is again governed by Eqs.(5)(6)(7)(8).This division rule and Eqs.
(5-8) together completely define the model at the deterministic level.
The dynamics of the ACS consisting of the catalyzed reactions R1 and R2 in a fixed size container but with buffered A(1) as the food set is given by Eqs.(6-7) with V and X 1 constant.This was studied in [36] at the deterministic level where a bistability was observed, and in [37] at the stochastic level where transitions between the attractors was observed.The present model by adding Eqs. ( 5), (8) and the division rule embeds the ACS in a growing-dividing protocell instead of a fixed volume container.It shares the bistability of the fixed volume version, but also possesses qualitatively new properties.These properties (considered along with stochastic dynamics) enable a population of such protocells to mimic (one step of) Darwinian evolution, as will be discussed below.

Deterministic dynamics: Bistability with two distinct growth rates
Since V is a linear function of the populations, V /V can be expressed in terms of the concentrations.Differentiating Eq. (8) w.r.t.t and using Eqs.(5)(6)(7), it follows that Eqn. (9) expresses the instantaneous growth rate of the protocell in terms of its chemical composition, a feature that is missing from previous protocell models.When Eq. ( 9) is substituted in Eqs.(1-3), the concentration dynamics also becomes completely defined.It has fixed points.Fig. 2 shows a bifurcation diagram in which the fixed point concentration of A( 4) is plotted by varying the parameter κ.The model exhibits bistability for κ I < κ < κ II .Note that the catalyst concentration x 4 in the upper stable branch is two orders of magnitude higher than in the lower stable branch.On the lower branch the rates of catalyzed reactions are smaller than the corresponding spontaneous reactions, while on the upper branch they are much higher.We therefore refer to the upper branch as one in which the ACS is active and the lower branch as ACS inactive.Depending on the initial condition, for a given κ in the bistable region, the dynamics will settle into either of the two stable attractors as shown in Fig 3A for one such κ.For κ < κ I there is only one attractor (the inactive one), and for κ > κ II also only one attractor (the active one).
For each fixed point attractor, the r.h.s. of Eq. ( 9) is constant.Hence in the attractor, V grows exponentially, V (t) = V (0)e µt with constant µ.In other words the protocell has a characteristic growth rate in each attractor given by the expression in Eq (9).This is shown in the inset of Fig. 2. Hence in the bistable region, the protocell can grow with two distinct growth rates depending upon which attractor it is in.The growth rate is many times higher in the active state than in the inactive one.
Once the concentrations have reached their fixed point attractor, (9) implies that V grows exponentially, and Eq. ( 8) then implies that each chemical population must also grow exponentially with the same rate µ.(Only if all populations grow at the same rate as V will their concentrations be constant.)Thus in each attractor we have X i (t) = X i (0)e µt .In other words, the protocell naturally exhibits balanced growth in each attractor (growth with ratios of all populations constant [39]).Exponentially growing trajectories in a nonlinear system and this remarkable emergent coordination between the chemicals 5/28 without any explicit regulatory mechanism is a consequence of (a) the fact that the r.h.s. of Eqs.(5-7) are homogeneous degree one functions of the populations (if all three populations are simultaneously scaled by a factor β, X i → βX i , then the r.h.s. of Eqs.(5-7) also scales by the same factor β), and (b) that the ACS structure couples all chemicals to each other.This is discussed in detail in ref. [40] in the context of models of bacterial physiology.
Fig. 3B shows, for a protocell, the trajectories of its chemical populations and volume as functions of time for two very close initial conditions (defined by the population of species A(1), A(2) and A( 4)) that lie in different attractor basins.They converge to different attractors: ACS-active (upper panel) and inactive (lower panel).After a protocell divides we track one of its daughters.The attractor is a fixed point for concentrations (Fig. 3A) but a limit cycle for populations and the volume (Fig. 3B).The growth phase of the limit cycle has the same constant slope for all populations in a given attractor, signifying exponential growth with the same growth rate for all chemicals in the attractor.The slope is larger (and interdivision time shorter) for the active attractor.At division, since populations and the volume both halve, concentrations do not see any discontinuity.
The existence of bistability is robust in parameter space.It may be noted that a nonzero degradation rate ϕ of the dimer and tetramer is essential for bistability (as also found in the model studied in ref. [36]).A degradation term ϕ ′ x 1 for the monomer can also be introduced in Eq. ( 1); however it is found that ϕ ′ 6/28 must be sufficiently smaller than ϕ for bistability to exist.

Stochastic dynamics of a single protocell: transitions between states of different growth rates
We now consider the protocell under the stochastic chemical dynamics framework.The chemical populations are now non-negative integers and each unidirectional reaction occurs with a probability that depends on the populations of the reactants and the values of the rate constants.We simulate the stochastic chemical dynamics of the protocell using the Gillespie algorithm [41].The reaction probabilities are listed in the Appendix A. Whenever a reaction occurs the populations of its reactants and products are updated.For large populations when fluctuations are ignored, the above mentioned probabilities lead to the deterministic Eqs.(1-3) or (5-7).In using the Gillespie algorithm for expanding volumes the rate of increase of volume needs to be taken into account [42,43].In the present work since volume is treated as a function of populations (8), we assume that it is instantaneously updated when the populations are.Fig. 4 shows a simulation run of the stochastic chemical dynamics of a single growing and dividing protocell.At the volume threshold V c , when the protocell divides into two daughter protocells, we implement partitioning stochasticity, namely, each molecule in the mother is given equal probability of going into either daughter.In Fig. 4, at each division we randomly discard one of the two daughters and choose one for further tracking, in order to display a single-cell trajectory over several divisions (effectively it is the trajectory of a single lineage of protocells).
Starting from the initial condition shown where the protocell is composed of only A(1) and A(2), the protocell initially grows and divides in the inactive state.The first A(4) molecule is produced by the chance occurrence of the uncatalyzed reaction 2. Production of a sufficient number of A(4) molecules triggers a transition to the active state, where the population of A( 4) is significantly larger than in the inactive state.As in Fig. 3 for the deterministic case, so also in Fig. 4 it can be seen that the protocell in the active state grows and divides faster than in the inactive state.However, unlike the deterministic case, we also see transitions between the inactive and active states.These transitions occur because for a small protocell (500 ≤ V ≤ 1000 for the protocell in Fig. 4), chance production or depletion of a few molecules of A( 4) is enough to push its concentration into the basin of the other attractor.Note that in Fig. 4  Note that typically a daughter naturally inherits the state of the mother protocell: since the two daughters have roughly half the number of molecules of each type as the mother, and hence also half the volume, they have the same concentration of each chemical as the mother.Partitioning stochasticity occasionally results in a daughter losing the mother's state.

Protocell population dynamics: Dominance of the autocatalytic state
Fig. 5 shows the time evolution of a population of such protocells.At t = 0 we start from a single protocell in the inactive state, whose dynamics was shown in Fig. 4.However, in this simulation, when a protocell divides, instead of discarding a daughter, we keep it in the simulation until the total population of protocells reaches an externally imposed ceiling K.After the total number of cells reaches K, the total population is kept constant.This done by removing one randomly chosen protocell from among the K + 1 protocells whenever any protocell divides.Each protocell in the population is independently simulated by the single cell stochastic dynamics (Gillespie algorithm).Fig. 5 tracks only the number of protocells in each state (active or inactive) as a function of time.
The number of protocells in the inactive state increases whenever one of them divides.Eventually one of them makes a stochastic transition to the active state, whereupon the number of active protocells jumps from zero to one.Active protocells also make stochastic transitions to the non-active state on a certain time scale.However, since active protocells divide faster (as seen in Fig. 4), their number grows faster and their population catches up and overtakes the inactive population in Fig. 5. Eventually the active protocells dominate the population.The curves in Fig. 5 represent the net result of stochastic transitions and proliferation by division.The fraction of protocells in each state is expected to reach a stochastic steady state (see below) that represents a balance between proliferation and transition.In the simulations we find that the fraction of inactive cells declines when the total population hits K (see Fig. 5).It eventually reaches its steady state fraction.A decline is seen at the time the total population hits K, because in this simulation at that time the fraction of inactive cells is higher than its steady state fraction (this is a consequence of the initial condition, the fact that at t = 0 we started from a single protocell in the inactive state).In Supplementary Material Section 2 a similar qualitative behaviour can be seen for other values of κ within the bistability region.
An approximate (mean field) model of the protocell population dynamics (valid for large populations) with no ceiling (K → ∞) is the following: where n 1 (n 2 ) is the population of protocells in the inactive (active) state, µ 1 = ln 2 ⟨τ1⟩ and µ 2 = ln 2 ⟨τ2⟩ are the average growth rates of the protocell in the inactive and active states respectively, and ⟨T2⟩ are the transition rates, respectively, from the inactive to active and active to inactive states.This is a linear dynamical system dn dt = An, where n = (n 1 n 2 ) T is the column vector of protocell populations, and Eqns. (10)(11) for the populations of inactive and active protocells are identical to the model used to describe the populations of persister and normal cells of bacteria [44].
The steady state fraction f of active protocells in the population f ≡ n 2 /(n 1 + n 2 ) can be computed 8/28 from the eigenvector of A corresponding to its largest eigenvalue, e 1 .The result is: where A calculation of f for a finite but large ceiling K is given in the Appendix C.1 and yields the same answer as (13), independent of K.
Using the averages given in the caption of Fig. 4 to determine the components of A, this calculation yields f = 0.925 ± 0.014 (mean ± standard error), with the error arising from the finite sample estimation of the averages.This agrees with the fraction found (over long times) in the stochastic steady state of the simulation of Fig. 5, namely 0.937 ± 0.019 (mean ± standard deviation).The Supplementary Material Section 3 shows the agreement between simulations and the mean field model at other values of κ.
Note in Fig. 5 that even though the active protocells have a higher growth rate than the inactive, a finite fraction of the inactive still survives in the steady state.This is because of the nonzero transition probability λ 2 from the active to the inactive state.If λ 2 had been zero, the eigenvector of A corresponding to its largest eigenvalue would have been (0 1) T implying that the inactive state is extinct in the steady state.When λ 2 ̸ = 0, one can show (see Appendix C.2) that if then f ≃ 1 − λ2 µ2−µ1+λ1 is close to unity.The quantity 1/(µ 2 − µ 1 + λ 1 ) defines a time scale of the single protocell dynamics.The above condition means that if the average lifetime ⟨T 2 ⟩ (= 1 λ2 ) of the active state is much larger than this time scale, ACS active protocells will come to dominate the population.Another way of writing this condition is µ Therefore a sufficient condition for active protocells to dominate is that the active protocell divides many times in its typical lifetime (µ 2 ⟨T 2 ⟩ ≫ 1) and grows much faster than the inactive protocell (µ 2 ≫ µ 1 ).
Note also that a nonzero λ 1 is what ensures that even if we start with a zero population of active protocells, one active protocell will sooner or later be produced by chance, leading eventually to a fraction f of active protocells.

Discussion
In this work we have constructed an example that shows (i) how autocatalytic sets of reactions inside protocells can spontaneously boost themselves into saliency and enhance the populations of their product molecules including catalysts, and (ii) how such protocells (where the ACS is active) can come to dominate in a population of protocells.Encasement within protocells serves two important functions.(i) The small size of a protocell allows a small number fluctuation of the catalyst molecules to take their concentration past the basin boundary of the attractor in which the ACS is inactive into the basin of the active attractor, thereby causing the protocell to transition from an inactive to active state.A large container would require a larger number fluctuation to achieve the same transition, which is more unlikely.(ii) Protocells in the active state grow at a faster rate than the inactive state, thereby eventually dominating in population.The differential growth rate is a consequence of the fact that the protocell size depends upon its internal chemical populations, a possibility that is precluded when we discuss chemical dynamics in a fixed size container.Therefore, in this example, protocells aid both the generation and the amplification of autocatalytic sets.
The differential growth rates of the two states are not posited exogenously, but arise endogenously within the model from the underlying chemical dynamics defined by Eqs (5-8) (and their stochastic version).The additional assumption made is that upon reaching a critical size a protocell divides into two daughters that share its contents.This property can arise naturally due to some physical instability.Collectively these assumptions lead to the properties of heredity, heritable variation (the variation is heritable because once the fluctuation pushes it into a new basin of attraction a protocell typically descends into its new attractor in a short time), and differential fitness in a purely physico-chemical system.This leads to the dynamics of the two subpopulations of protocells shown in Fig 5 which is similar to that of natural selection.(A difference is that the slower growing subpopulation never goes completely extinct, due to the non-zero probability of transition of a faster growing protocell into a slower growing one.) The process of going from an initial state with no ACS to its establishment in a population of protocells, discussed here, might be considered the first step in the evolution of the ACS.One might wonder how the ACS would evolve further from there.It has been shown that chemistries containing ACSs exhibit multistability in fixed sized containers.In some of these chemistries simpler ACSs involving small catalyst molecules are nested inside more complex ACSs having larger and more efficient catalyst molecules [36].The multiple attractor states correspond to ACSs with progressively larger molecules and higher level of complexity being active.It is possible that by embedding such chemistries within protocells, the mechanism discussed here could allow one to realize a punctuated evolutionary path through sequentially more complex ACS attractors to a state of high chemical complexity from an initial state that only contains small molecules and no ACS.This is a task for the future.
The specific artificial chemistry and protocell properties studied here are highly idealized ones.The object was to demonstrate a mechanism in principle.However, we believe the mechanism is quite general and it should be possible to demonstrate it in other models (e.g., [10,26]) provided multistability in a fixed environment and the emergence of distinct timescales as discussed in the present work can be established.We remark that though we have been primarily thinking of protocells as vesicles (motivated by similar models of bacterial physiology), some of our methods might also be useful in the context of micelles.Recently Kahana et al [30] presented a model of the stochastic dynamics of lipid micelles which had multiple attractors corresponding to distinct composomes.It would be interesting to compare the growth rates of micelles in different attractors as well as the transition rates between the attractors in their model.
We note that there have been independent experimental developments in constructing bistable autocatalytic chemistries [45] and self-replicating protocells [21].It is also established that small peptides exhibit catalytic properties [46] and they can be encapsulated within protocells to promote protocellular growth [47].A recent paper also shows the coupling of a simple autocatalytic reaction with the compartment growth and division [48].A synthesis of these approaches might result in the experimental realization of the mechanism described in the present work.
We have considered dynamics at three levels: One is the chemical dynamics of molecules within a single protocell.This depends upon molecular parameters such as rate constants, efficiency of the catalyst molecule, etc. From this we extracted effective parameters at the second level: that of a single protocell (growth rates of the two protocell states, residence times, etc.).These were then used to derive the dynamics at the third level consisting of the population of protocells.This enabled an understanding of the conditions under which active protocells would dominate.Such an approach might be useful in other settings, for example in understanding certain aspects of bacterial ecology from molecular models of single bacterial cells.sequence of ones and zeros.A contiguous subsequence consisting of only ones bordered by zeros (only zeros bordered by ones) at both ends of the subsequence was declared to be an instance of residence in the active state (inactive state).The duration of such a subsequence (equal to the difference between the ending and starting times of the subsequence as measured by the corresponding division times) was taken to be the lifetime of the state.Within an active or inactive subsequence, the difference between two consecutive division times was taken to be an instance of an interdivision time in that state.
Fig 6 shows the histograms for the residence times and interdivision times in the active and inactive states using the above definitions, for one set of parameter values.

Appendix C The steady state fraction of ACS Active protocells (f ) in the protocell population
An expression was derived for the asymptotic fraction of active protocells in the protocell population dynamics (Eq.( 13) of the main paper).The derivation used mean field equations for the populations of the active and inactive protocells and assumed indefinite growth of the two populations.Here we show that the same expression follows if we truncate the total population of protocells at a large ceiling K.We analyze the conditions under which this fraction is close to unity.We also present numerical evidence that the fraction so obtained agrees with the actual stochastic simulations of protocell population dynamics at different values of κ.
C.1 Calculation of f for a system with finite ceiling K on the total population In our stochastic simulations of protocell population dynamics, the total number of protocells increases until it reaches the ceiling K.After that it becomes constant because whenever a protocell divides one protocell chosen at random is removed from the population.Consider the dynamics of n 1 and n 2 (populations of the inactive and active protocells respectively) after the total population n 1 + n 2 has reached this constant value K.If K is sufficiently large, we can use the same equations as before (namely, Eqs. ( 10) and ( 11) of the main text) modified by the addition of a death term on the right hand side.In other words, where the last term in both equations accounts for the removal of active or inactive protocells in proportion to their existing population (the average effect of the random removal of a protocell from the population).
β is chosen so that the total population is constant, i.e., n 1 + n 2 = K.Then, using ṅ1 + ṅ2 = 0, we get Eliminating n 1 = K − n 2 from the ṅ2 equation, and setting ṅ2 = 0 to obtain a fixed point, we obtain a quadratic equation for the fixed-point value of n 2 : This has the solution When µ 2 − µ 1 is positive (as is the case in our simulations), the positive root must be chosen to get a physical solution (non-negative value of n 2 ).This yields The expression of f is independent of K.A bit of algebra shows that this expression is identical to that in Eq. ( 13) of the main paper.

C.2 Condition for active protocells to dominate the population
The above expression for f can be written as where s = λ1+λ2 µ2−µ1 and d = λ1−λ2 µ2−µ1 .This shows that f only depends upon the two dimensionless combinations s and d of the four parameters.
From the above expression it immediately follows that f = 1 when λ 2 = 0, as already mentioned in the main text.We can also ask: How small should λ 2 be for f to be close to unity?To see this it is useful to introduce the combinations z = λ1 µ2−µ1 and x = λ2 µ2−µ1 .Then When x z+1 ≪ 1, the second term inside the square root is much smaller than unity.Performing a Taylor expansion, we get f ≃ 1 − x z+1 to leading order in x z+1 .This shows that the condition for the active protocells to dominate in the steady state of the population dynamics is We remark that as a special case if x ≡ λ2 µ2−µ1 ≪ 1, then the above condition will hold (since λ 1 ≥ 0), and active protocells will dominate.However the inequality (9) gives a more general condition for active protocell domination.
In this section we present results for a protocell model with five chemical species that relaxes certain constraints and assumptions of the model presented in the main paper in order to show the robustness of the results of the main paper.The five species include two monomers A(1) and B(1), two dimers A(2) and B(2), and one tetramer A(4).Their respective populations in the protocell are denoted X 1 , Y 1 , X 2 , Y 2 and X 4 .The main differences are as follows: • The rate of intake of food molecules is proportional to their difference in concentration between the outside and inside of the protocell.
• There are two types of monomers, A(1) and B(1), both treated as food molecules, instead of just one.
• In the model presented in the main paper, the dimer A(2) was doing double duty as the enclosure forming molecule as well as a reactant to form the catalyst A(4).Here the two roles are performed by different molecules, the enclosure forming molecule being the dimer B(2).
• The definition of the protocell volume excludes the population of the enclosure forming molecule (only includes populations of molecules in the bulk of the protocell), as an example of an alternate linear combination of chemical populations.
While the quantitative outcomes depend upon the details, the qualitative results remain the same.These include the presence of bistability in a robust parameter region, two distinct growth rates for the two attractors, and selection of the state where the ACS is active.The reaction scheme is as follows: ϕ −→ ∅.
In this model, the enclosure is formed by the dimers of the type B(2) and is permeable only to the monomers A(1) and B (1).The rates at which monomers diffuse into the interior of the protocell is taken to be proportional to the number of B(2) and the difference in the monomer concentrations inside and outside, α being the proportionality constant.The three catalyzed reactions R1, R2, R3, all catalyzed In the upper branch (ACS active) the catalyst has a concentration that is about three orders of magnitude higher than the lower branch (ACS inactive).The inset shows that the growth rate of the protocell in the ACS active state is about one order of magnitude higher than in the ACS inactive state.As κ increases within the bistable region, the lifetime of the inactive state decreases and that of the active state increases.This is expected since the basin size of the inactive attractor declines and that of the active attractor grows as κ increases from κ I to κ II (see, e.g., the difference between the unstable branch and the two stable branches in Fig. 2 of the main paper).This increases the steady state fraction of the active protocells in the dynamics of protocell populations.However the qualitative behaviour of the model is unchanged.
For calculating f from the mean field model, the parameters µ 1 , µ 2 , λ 1 , and λ 2 are estimated as discussed in the main paper as well as in Appendix B of the main paper.f sim data was generated from stochastic simulations of protocell dynamics in which the ceiling of the total number of protocells was taken to be K = 250.

Comparison of f from mean field model with f obtained by simulations
Table 2 and Fig. 12 of the supplementary material compare the value of f obtained from the analytic expression given in Eq. ( 13) of main paper and in Appendix C with the value in stochastic simulations of the protocell dynamics discussed in the main paper (denoted f sim ), at different values of the catalytic efficiency κ.

26/28
The analytic value of f obtained from the mean field model agrees with f sim within error bars.Note that the individual parameters µ 1 , µ 2 , λ 1 , and λ 2 in the analytic expression for f are obtained from average values of τ 1 , τ 2 , T 1 and T 2 calculated from the respective histograms (such as those displayed in Fig. A1 of Appendix B in the main paper) generated from the stochastic simulation of a single growing and dividing protocell.Therefore, all of the parameters have errors (given in Table 2) arising from the standard errors of the means.The error ∆f in f is computed from the analytical expression of f using the above mentioned standard errors in each of the four quantities.The error ∆f sim in f sim is just the standard deviation of f sim in the stochastic steady state.
4 Data structure and cleaning methodology Data analysis was primarily performed using data generated from two stochastic simulations: Stochastic single cell growth-division and Population of protocells.

Stochastic single cell growth-division
Following is the format of data prepared for analysing various aspects of the model: 1. Raw Data Level 0: Raw data is first stored in the structure given in Table 3.The raw data consists of the copy number of species (X i ), time at which the reaction occurred, volume of the protocell, and the generation at which the cell is when the internal reactions are happening.The generation count 1 is set to 0 at the start of the simulation.
Generation count Time of reaction V X 1 X 2 X 4 Table 3. Representation of data generated for Raw Data Level 0.
2. Raw Data Level 1: From the Raw Data Level 0, data ONLY at the time of division is extracted and stored separately in the format given in Table 4.This data has the values of species copy number and the state (in terms of binary string '0' for inactive or '1' for active) of the mother protocell at the time of division.
Generation count State (0/1) Time at division V X 1 X 2 X 4 Table 4. Representation of data generated for Raw Data Level 1.
Note that the time is recorded only at the point when the cell divides.This time marks the end of previous generation or start of the next generation.A transition can occur at any point within a single cell cycle.However, the state of the cell (0 or 1) is noted only at the time of division in the data above.The state of the cell is decided as per the criteria defined in Appendix B in the paper.Transient from this data is removed as per the guidlines given next.

Removing the transient: The transient trajectory of the cell is defined as the initial phase
where the concentrations of the species inside the cell have not reached a stochastic steady state.
To generate the data set to extract the parameters of the mean field model (residence times and interdivision times) for the two states from a run, the transient in the beginning of the run needs to be removed from data stored in Raw Data Level 1.It is typically observed that the stochastic steady state is reached within the first few division cycles.After 15 division cycles we ask: Has a transition occurred yet?There could be two possibilities.
1 Generation count is defined as the number of divisions the cell has undergone during the course of the simulation.

27/28
State (0/1) Time of reaction V X 1 X 2 X 4 Table 5. Representation of data generated for a particular cell in the protocell population simulation.
(a) No transition has taken place in the first 15 division cycles: Then the point where the first transition occurs is the start of data recording.This point is the first instance where the cell has changed its state (from 0 to 1 or 1 to 0).
(b) A transition has taken place in the first 15 division cycles: In this case, the first transition is ignored.Data collection starts from the second transition point irrespective of whether it is within the first 15 division cycles or not.

4.
Truncating data collection: Data collection stops at the last transition point in the run.This point is the last instance where the cell changes its state (from 0 to 1 or 1 to 0).
The Level 1 data modified by the removal of the initial transient and truncation of the end of the run is stored separately in same format as given in Table 4, and used to construct the histograms of single protocell parameters τ 1 , τ 2 , T 1 , T 2 .

Population of protocells
In this simulation, data is generated in two formats: 1. Data of individual protocell: Each time a protocell divides, a new file is generated storing the data of one of the daughter protocells starting with the population of chemical species at birth in the format shown in Table 5 while the data of the other daughter cell is appended to the mother cell file.Each time a reaction occurs in any cell, the revised species count (X i ) is appended to the file corresponding to that cell.The total number of files is equal to the number of division cycles plus number of cells the simulation began with2 .
2. Summary Data: A summary file is created that stores the number of active/inactive protocells at every division by extracting data of all the existing protocells at the division time points.This information is stored in the format given in Table 6.This data is used to generate Fig. 5 of the main paper.
Total no. of cells Division Time No. of Active cells No. of Inactive Cells Table 6.Format of the data stored in file containing the number of active/inactive protocells in the population.

Figure 1 .
Figure 1.An illustration of a protocell inside an aqueous medium buffered with monomeric food molecules, A(1)ext.The protocell membrane is composed of dimer molecules A(2).

Figure 2 .
Figure 2. Bifurcation diagram for the model: Steady state concentration, x4, of the catalyst versus catalytic efficiency, κ.The region between κ I (= 1840) and κ II (= 3580) is the region having three fixed points, two of which are stable (solid black curves) and one is unstable (red dotted curve).Inset: Growth rate, µ, of the protocell, versus κ.Parameters: Hereafter, kR and v have been set to unity without loss of generality after non-dimensionalizing the model.kF = 1, ϕ = 20, α = 100.

Figure 3 .
Deterministic trajectories in the bistable region of the model.κ = 2400, other parameters are as in Fig. 2. A: Phase portrait projected onto the x 2 − x 4 plane.Several trajectories starting with different initial conditions are shown; they reach one of two stable fixed points denoted by blue closed dots.All the solid curve trajectories end at the stable fixed point on the top right (ACS active) while all the dotted trajectories end at the stable fixed point on bottom left of the plot (ACS inactive).The red open dot represents an unstable fixed point.The dashed curve is a schematic of the basin boundary between the two stable fixed point attractors.B: Deterministic trajectories of populations (in log scale) of species A(1), A(2), A(4) and the protocell volume as functions of time for two initial conditions.V c = 1000.Initial conditions: IC1 (lower panel; dotted curves): X 1 = 952, X 2 = 20, X 4 = 2. IC2 (upper panel; solid curves): X 1 = 944, X 2 = 20, X 4 = 4. Protocell starting with IC1 ends up in the inactive state in which the population of the catalyst A(4) is less than one as seen in dotted red curve in the lower panel.Protocell starting with IC2 ends up in the active state in which the population of the catalyst is high (approximately between 10 and 20).The interdivision times in the inactive and active states are, respectively, τ 1 = 0.269, τ 2 = 0.075.
the protocell lineage spends more time in the inactive state than the active.The residence times of a protocell lineage in the two attractors (T 1 , T 2 ) have distributions (see Fig 6 in Appendix B) that vary with parameters.

2 Figure 4 .
Figure 4. Stochastic simulation of the populations of species A(1), A(2) and A(4) for a single protocell lineage in the model.Parameter values are as in Fig. 3, V c = 1000.Initial condition: X 1 = 480, X 2 = 10, X 4 = 0. Note the transitions of the protocell between the inactive and active states.From a long such simulation we find that the average interdivision times in the inactive and active states are, respectively, ⟨τ 1 ⟩ = 0.295, ⟨τ 2 ⟩ = 0.077, while the average residence times in the two states are ⟨T 1 ⟩ = 3.413, ⟨T 2 ⟩ = 1.916.

Figure 5 .
Figure 5.Time evolution of a population of protocells starting from a single protocell in the inactive state.Shown is the number of protocells in the inactive state (green), active state (orange), and their sum (blue).As inactive protocells grow and divide, their population increases.The orange curve departs from zero when one of the inactive protocells makes a stochastic transition to the active state.The two populations have different growth rates.After the total population reaches an externally imposed ceiling K (=100 in this figure), upon each cell division a randomly chosen protocell is removed from the population.The population eventually settles down in a stochastic steady state dominated by the active protocells.This is the natural selection of an autocatalytic state.Parameter values are as in Fig.4,K = 100.

Figure 7 . 1 = y ext 1 = 1 .
Figure 7. Bifurcation diagram for the 5-chemical-species model.Parameters:k F = k R = v = 1, ϕ = 20, α = 100.External concentrations: x ext 1 = y ext 1 = 1.In the upper branch (ACS active) the catalyst has a concentration that is about three orders of magnitude higher than the lower branch (ACS inactive).The inset shows that the growth rate of the protocell in the ACS active state is about one order of magnitude higher than in the ACS inactive state.

Figure 8 .
Figure 8. Stochastic simulation of population of species A(1), B(1), A(2), B(2) and A(4) for the model using the Gillespie algorithm.Each reaction has a probability of occurrence per unit time that is related to the deterministic reaction rate along the same lines as given in Table A1 of Appendix A for the model in the main paper.Parameter values: κ = 15000; rest same as in Fig. 7 of Supplementary Material.Initial condition: X1 = Y1 = 200, X2 = Y2 = 10, X4 = 0. From a long such simulation we find that the average interdivision times in the inactive and active states are, respectively, ⟨τ1⟩ = 2.768, ⟨τ2⟩ = 0.164, while the average residence times in the two states are ⟨T1⟩ = 20.728,⟨T2⟩ = 3.058.

Figure 9 . 28 2
Figure 9.Time evolution of a population of protocells in the 5-chemical-species model starting from a single protocell in the inactive state.Each individual protocell is simulated by the Gillespie algorithm for its internal chemical dynamics.Shown is the number of protocells in the inactive state (green), active state (orange), and their sum (blue).After the total population reaches an externally imposed ceiling (100 in this figure), upon each further cell division a randomly chosen protocell is removed from the population.Parameters: κ = 15000, kF = 1, ϕ = 20, α = 100, Vc = 1000.Note the domination of the active protocell population in the stochastic steady state of the protocell population dynamics, starting from an initial state with only one protocell in the inactive state.

Figure 10 . 28 2.2 κ = 3400 Figure 11 .
Figure 10.Simulations of the model at κ = 2000.The same panels as in Figs. 3, 4 and 5 of the main paper are shown, but at κ = 2000.For the deterministic case, concentrations of the molecules as a function of time are also shown.

Figure 12 .
Figure 12.Fraction f of 'ACS active' protocells in the stochastic steady state of the protocell population dynamics versus κ, obtained from simulation (black hollow circles) and the mean field model (red solid dots).The error bars in f and f sim are ±∆f and ±∆f sim respectively, whose calculation is discussed in Section 3 of the Supplementary Material.Data taken from Table 2 of Supplementary Material.k F = 1, ϕ = 20, α = 100, V c = 1000.