Growth and Division in a Dynamic Protocell Model

In this paper a new model of growing and dividing protocells is described, whose main features are (i) a lipid container that grows according to the composition of the molecular milieu (ii) a set of “genetic memory molecules” (GMMs) that undergo catalytic reactions in the internal aqueous phase and (iii) a set of stochastic kinetic equations for the GMMs. The mass exchange between the external environment and the internal phase is described by simulating a semipermeable membrane and a flow driven by the differences in chemical potentials, thereby avoiding to resort to sometimes misleading simplifications, e.g., that of a flow reactor. Under simple assumptions, it is shown that synchronization takes place between the rate of replication of the GMMs and that of the container, provided that the set of reactions hosts a so-called RAF (Reflexive Autocatalytic, Food-generated) set whose influence on synchronization is hereafter discussed. It is also shown that a slight modification of the basic model that takes into account a rate-limiting term, makes possible the growth of novelties, allowing in such a way suitable evolution: so the model represents an effective basis for understanding the main abstract properties of populations of protocells.


Introduction
Protocells are cell-like structures that are simpler than present-day biological cells, but that are nonetheless able (i) to grow by some form of rudimentary metabolism, (ii) to reproduce giving rise to new protocells that are similar to their parents and (iii) to undergo evolution.
While viable protocells do not yet exist, these structures are supposed to have played a key role in the self-organizing processes leading to the emergence of life; moreover their study could be important in order to create new "protolife" forms which are able to adapt and evolve [1]. This endeavor has an obvious theoretical interest, and deep epistemological implications, but it might also lead to an entirely new living technology [2], definitely different from conventional biotechnology. Different protocell architectures have been proposed [3][4][5][6][7][8][9][10][11][12][13][14][15]. A prominent candidate structure is that of lipid vesicles, that have been observed to fission, under suitable physico-chemical conditions, giving rise to new vesicles [16]. In order to undergo Darwinian evolution, a necessary condition is that the rate of generation of new protocells depends on the chemical composition and physical properties of the parent vesicles. Again, different mechanisms have been proposed [3]; most of them identify a set of molecules that are able to collectively self-replicate, and that affect the growth and fission rates. Once a coupling has been established between the self-replicating molecules (i.e., the "protogenetic material") and the fission rate, evolution could in principle take place, so it would be appropriate to speak of a true protocell [17].
Theoretical models can be extremely useful to identify the most promising architectures, the most effective ingredients that can lead to their actual buildup [3,18] and to reject improbable hypotheses. Therefore, detailed models of the molecular and supramolecular structures and processes that may be involved are of the utmost importance. On the other hand, it is also interesting to consider the behavior of a protocell as an integrated "individual agent", using an approach similar to that pioneered by Ganti with his Chemoton model [6]. There is of course a tradeoff, in that such models usually need to resort to a less detailed account of the behavior of their components and therefore to a more abstract and more general description.
We believe that protocell research will benefit from both kinds of approaches, and we show here the outcomes of a new model of the "abstract" kind, that captures some key features of true physical processes. We will consider the case of a container, which can be tentatively identified with a vesicle formed by amphiphilic molecules in water; however, the model is abstract and it can describe also different physico-chemical scenarios. Other molecules, besides those that form the container, may be present in the cell. It is supposed that, when the container reaches a certain size, it becomes unstable and it halves into two approximately equal daughter cells [19].
Note also that being able to synthesize a single protocell capable of fission is not enough; one rather needs a population of protocells capable of sustained increase and fission through successive generations. In order to achieve this goal, the two key processes of membrane growth by means of the uptake of amphiphiles on the surface and that of duplication of the genetic material must synchronize. It is worthwhile to remark that the synchronization is a general requisite not depending on the selected scenario, RNA world, metabolism or whatever. If the genetic material grows faster than the protocell, it would overfill the space, eventually causing the explosion of the membrane. On the contrary, if the protocell grows faster than the genetic material, the latter would be diluted after several divisions (death by dilution). There will be in any case a core group of molecules (or even perhaps a single one) whose duplication must necessarily happen at the same pace as that of the lipid container. We call these molecules (proto-)genetic memory molecules (GMMs).
In previous works [4,5,20] we showed that the synchronization between the replication of the GMMs and the fission of the lipid container is a robust emergent property under a broad set of hypotheses, provided that there is a coupling between the concentration of the GMMs and the rate of generation of membrane molecules. This is indeed the key present-day bottleneck in creating a protocell: there are systems where the vesicle grows thanks to the continuous feeding of lipids from the outside [16], and there are systems where duplication of a set of molecules can be observed [21,22] (the easiest case being that of a single autocatalytic species) but it has so far been impossible to couple them in a single system. For modeling purposes we will assume here that such a coupling actually exists, so the growth rate of the container mass depends upon the concentration of some GMMs. We will also assume that the protocell is turgid, i.e., that it remains spherical, with constant membrane thickness, during growth; we also assume constant lipid concentration in the membrane. This means that during the division and duplication process some material losses should happen, in the case in which the concentration of internal chemicals do not change substantially (a plausible hypothesis in case of fast processes). Under these hypotheses a straightforward geometrical calculation, shown in Section 2.2, allows one to compute the increase in volume of the protocell, given its increase in membrane mass (that is affected by the GMMs).
Different protocell architectures have been proposed, that differ not only for the hypotheses about the chemical make-up of the protocell and of its key processes, but also for their location in the different compartments [1,[3][4][5][6][7][8][9][10][11][12][13][14][15]23]. There are models like e.g., the Los Alamos bug [11] where all the key processes (i.e., duplication of GMMs and increase of the container) take place in the lipid membrane or close to its surface. An even more extreme view is taken in so-called Gard models [24], where there is no strict chemical distinction between the two, the lipids also being the "genetic" molecules. We will consider here the most common architecture where the two key processes (formation of GMMs, formation of amphiphiles) are assumed to take place in the internal aqueous phase of the protocell, leaving to future works the discussion of the extension of our results to different architectures.
In this paper the dynamics of the GMMs will be described by resorting to a model, introduced in the 1980s by Kauffman, that has been often used to describe collectively autocatalytic sets of molecules [25]. The various molecular species are assumed to be polymers composed of monomers of various kinds (typically, two), and the reactions are of two types: condensation, where the ends of two polymers join to generate a single one, and cleavage, where a polymer is cut in two parts. It is also assumed that these reactions take place in the internal phase of the cell, only if catalyzed by another polymer, and that polymers are selected to be reaction catalysts at random. In the basic version of the model, the possibility of spontaneous and backwards reactions is ignored.
In the past, we have studied the behavior of this kind of systems using a stochastic approach [26,27], based on the well-known Gillespie algorithm [28]. This choice is motivated by the fact that, in studies on the prebiotic scenarios, it is possible that some molecules are formed at very low concentrations, where resorting to a continuous approach could be misleading. Moreover, we have also explored possible extensions of the basic model (i.e., introducing backward reactions, energy constraints, etc. [29,30]). Those initial studies were made in a flow reactor, as it is often done. Surprisingly enough, indeed, the study of self-replication has (almost) always been decoupled from that of protocell dynamics. Self-replication has been considered in a closed vessel or in an open-flow reactor, but none of them are a good model for a protocell. Closed systems show major limitations in describing "living" systems (e.g., depletion of reactants, accumulation of wastes), and the way in which a flow reactor is coupled with the environment is very different from that of a vesicle with a semipermeable membrane: the latter allows exchange of some but not all the molecules, while the former intakes all that is included in the input flow, and ejects all the solutes. In a recent paper [31] it was indeed shown that the dynamics of collective self-replication in a vesicle with a semipermeable membrane can be very different from that of a flow reactor [32], and that this property has interesting implications for the study of populations of vesicles. However, in that paper we limited our analysis to a container of fixed size, while in the present work we explore the consequences of coupling the dynamics of the GMMs to that of a vesicle that grows and divides.
We will show in the following that also in this case synchronization can indeed take place, thus allowing sustained growth of a population of protocells, provided that the set of chemical reactions hosts a RAF [33] (Reflexive Autocatalytic Food-generated) set [34], coupled with the growth of the container. A precise definition will be given in Section 3; for the time being, let it suffice to say that its presence guarantees that its molecules are continuously generated [35]. RAF sets can have quite complicated structure, where a fast subpart can lead to the extinction of slower ones; these phenomena will be briefly outlined in Section 4, but it is important to remark that synchronization takes place when a RAF set exists. Therefore, the model accounts for the sustained growth of a population of protocells. Heredity is also guaranteed by the duplication mechanism, whereby the newborn cells' compositions are similar to that of the mother one. But in order to assure evolvability, it is also necessary that (i) the population of protocells can be heterogeneous and that (ii) novelties can appear and sometimes spread out. As far as heterogeneity is concerned, it has already been observed in [31] that, if the concentration levels of some molecules are low, e.g. in the micromolar range, and if the protocells are small (say 100 nm), then the internal compositions will be different for simple statistical reasons [36]. Therefore the internal milieu can be different from the bulk, and it can give rise to different growth rates, and therefore to competition at the level of the protocells.
This does not hold for large protocells, whose initial internal compositions should be similar to that of the bulk. In this case, the diversity of the protocell population will be too small. Evolution might be based on the random appearance of new species, that might come from outside, or be generated by non-catalyzed reactions. In this case, in order to ensure evolvability, it is necessary that some new molecules have a chance to survive. As it will be seen, and as it can be intuitively grasped, life for a newcomer can be hard when RAFs are already in place. Indeed, it will be shown that the basic model does not allow these novelties to set in. However, the basic model incorporates some unrealistic assumptions (e.g., instantaneous diffusion), which lead to an exponential growth of RAFs. If these assumptions are relaxed, then it will be shown that novelties can have a finite chance to succeed.
The outline of the paper is the following. In Section 2 we precisely describe the main features of the basic protocell model. In Section 3 we discuss RAF sets and their properties, while in Section 4 we describe and analyze the results of some key simulations. In that section we show that synchronization is achieved if at least a RAF set exists, which is coupled with the container growth, and we analyze the dynamics of several RAFs coexisting in the same protocell. We also speculate on the effects of such competition in a population of protocells. In Section 5 we discuss the appearance of random molecules, and show that they tend to be eliminated in the basic model; as discussed above, we show that at least one modification of the basic model, that takes into account the finite transmembrane diffusion rate of permeating molecules, allows instead the success of some random newcomers. Finally, in Section 6 we will summarize the major outcomes of the study. We believe that not only a new model has been presented, tested and discussed, but that it sheds light on important mechanisms and on generic properties of replicating protocells, that might be valid even beyond the limitations of the present model. We also think that a convincing research agenda can be defined on the basis of these results.

Model Outline
We here briefly describe the main features of the stochastic model of catalytic reaction network [25], we discuss its behavior in a vesicle [31] and the coupling of the kinetic equations with the growth of the container. Several extensions of the model can be and have been considered (e.g., reverse reactions, spontaneous reactions, etc. [29,30]), but for the sake of clarity we will stick here to the basic model.

Entities and Their Interactions
The model describes a system in which simplified chemical species, i.e., monomers and polymers, interact with each other uniquely, via catalyzed reactions. Monomers and polymers are identified by ordered strings of symbols (or "bricks") from an arbitrary alphabet (e.g., A, B, C …). The length of a polymer is of course the number of its symbols. Each molecular type i = 1, …, N is characterized by the string that defines it and by its total number of copies (i.e., total amount) xi, or by its volume concentration [xi]. The entire set of number of copies of the different species of the system is denoted by Only two kinds of reactions are allowed in the system, namely: (i) cleavage, the cutting of a species made of more than one brick into two shorter species (e.g., AAAB → A+AAB) and (ii) condensation, the concatenation of two species in a longer polymer (e.g., ABA + BA → ABABA). Reactions can occur only if catalyzed by reaction-specific catalysts, so no spontaneous reactions are allowed (by assuming a sufficiently high activation energy for each reaction). Condensation requires an intermediate step with the creation of a temporary complex between one of the substrates and the catalyst. Finally, the presence of backward reactions is also neglected, by assuming a sufficiently high Gibbs energy ∆G for each reaction (catalysts lower both the forward and the backward activation energies, but if the latter is much larger than the former its rate remains very small). Thus, the reaction scheme can be summarized as follows: where A and B are two substrates involved in a specific reaction, C is the specific catalyst for that reaction and A:C represents the transient complex. The kinetic equations can be obtained by applying the law of mass action to this reaction scheme, as detailed in Appendix A.
The overall set of possible reactions in the system defines a so-called "chemistry". Defining a "chemistry", essentially amounts to identifying the set of reactions that are allowed and to associate a catalyst to a certain set of reactions (a molecular type can catalyze different reactions, and a reaction can be catalyzed by different molecular types). Following Kauffman [25], catalysts are, in fact, species belonging to X and, in particular, each species i has a certain (fixed) probability pi of being the catalyst for a reaction chosen at random (this is of course a strong hypothesis, for a detailed discussion see [17]). In the current version of the model we impose that only species composed of at least three symbols can be catalysts. The allowed reactions are all cleavage reactions, and only the condensation reactions that give rise to species up to a certain maximum allowed length. In the real world natural laws determine the chemistry, in artificial systems like ours each "chemistry" corresponds to a possible artificial world. In Figures 3, 4 and 9 we show some very simple examples of the artificial chemistries we used, but the effects we are showing in this article are general (provided the presence/absence of RAF structures within the involved chemistry) [37].
The dynamics of the system is stochastic and simulated by means of an extension of the Gillespie algorithm [28,38], described in more detail in Appendix A. The reasons why a stochastic approach is necessary in this case are related to the fact that when a new species appears it can be represented by a small number of exemplars, so the effects of randomness may be very relevant [31]. More details on the models we use can be found in Appendix A.

The Protocell Model
The dynamic behavior of catalytic reaction networks has been classically investigated in the case of a continuous stirred-tank reactor (CSTR) [26,29,[39][40][41][42][43][44][45]. However, a CSTR is very different from a protocell, and in [31] we introduced a more physically sound model, by describing a semi-permeable membrane separating the reaction network from the environment. Yet, the volume of the protocell was assumed to be constant and, accordingly, no association between the dynamics of the reaction network and that of the container growth was present. On the contrary, in this work we investigate the role of the both the variation of the volume and the division process, when the dynamics of the reaction network is associated to the growth of the container. Similarly to the model described in [31], here we assume a semi-permeable membrane that separates the reaction network from the environment (whose chemical composition is supposed to already be at its equilibrium). The membrane is modeled by allowing only some species to enter and leave the protocell, i.e., those species that are shorter than an arbitrary length Lperm . Conversely, species longer that Lperm remain entrapped inside or outside the protocell. Another key modeling choice is that the concentration of the permeable molecules is assumed to be homogeneous inside and outside the protocell, by hypothesizing infinitely fast diffusion, both in the bulk phases and across the membrane (This latter hypothesis will be relaxed in Section 5 below). In this way the chemical potentials of the permeable species are the same inside and outside. Since the external environment is supposed to be large and constant, the species that can cross the membrane are buffered, i.e., their concentrations are constant.
We assume that certain species belonging to X are coupled with the growth of the container. These species act as specific catalysts for the production of membrane lipids, assuming abundant and buffered lipid precursors. As for the case of the choice of cleavage and condensation catalysts, the species to be associated to the container growth are chosen randomly with a certain probability.
Let C be a measure of the mass of the container, i.e., the total number of lipid molecules in the membrane (it is proportional to the mass of the membrane under the assumptions made). Then the equation for the growth rate of the container takes the form: where Vr is the internal volume of the protocell (where reactions occur) and [xi] is the concentration of catalysts in the internal aqueous phase; the kinetic coefficients ki are zero for all those species that do not contribute to the container growth. The kinetics of lipid formation are first-order with respect to the concentration of catalyst, because an infinite supply of lipid precursors is considered available inside the protocell; any lipids produced inside the protocell are assumed to go instantaneously to the membrane and not occupy the internal volume.
Protocells can grow and divide: during these processes their form and shape can change [46]. However, although this is an interesting biological fact, we have already shown that it does not affect the synchronization between the internal reactions and the container growth [4], so-for modeling purposes-we can make the useful simplifications that protocells are spherical and turgid with constant membrane thickness.
This hypothesis allows one to determine the relationship between C and the volume, so that Equation (1) provides a rule to determine the volume growth rate. Under the assumptions made, one can show (see Appendix B) that the relation between the amount of lipids and the internal volume of the protocell Vr is equal to: As discussed in Section 1, we take here the simplifying approach to the process of cell division by supposing that there is a fixed threshold value of the volume of the protocell, so we assume that when C = θ, the protocell divides into two identical daughters [47]. In the division process the molecules present within the protocell at the division moment will be shared in identical proportion in the two daughters protocells. Under these assumptions, one can show (see Appendix B) that the ratio between the daughter and the mother protocells' volumes is: Since the volume of the daughter protocell is approximately , if the concentration of internal chemicals does not change (a plausible hypothesis), at each duplication around 30% of the internal material will be lost to the external environment. In Section 4.1 we will discuss the disappearance of some not internally produced chemical species during the succession of protocell generations: the material loss just presented increases this tendency but-as explained in detail in Section 4.1-this phenomenon is mainly due to the process of protocell growth and division. The synchronization among internal replicating materials and container indeed is a robust phenomenon, occurring also in duplication processes without material losses [4,5,20].

Collective Autocatalysis and RAF Sets
Different researchers have suggested several interesting models [17,24,25,[39][40][41][42][48][49][50][51][52] thatbesides their differences-all share the abstract idea that "collectively replicating chemical systems" can support both genetic and metabolic functions. In order for a protocell to duplicate, it is necessary that the molecules grow in number; very often (both in models and in present-day living beings) this growth is based upon the catalytic activities of some of the involved species, thus giving birth to the so-called "collectively autocatalytic reaction sets". Identification of such collectively autocatalytic sets is therefore a major issue in studying these systems.
A nice way of identifying the sets of chemical species able to catalyze their own growth is that of using a digraph representation [53] ("complete representation" in the following-see Figure 1) where circles stand for molecular species and squares depict reactions; species can participate to reactions as substrates (continuous links pointing to reactions in Figure 1) or products (continuous links starting from the reactions), or as catalysts (dotted links pointing to reactions in Figure 1). Another useful representation, besides this complete picture, is the simple directed graph ("catalyst-product graph" in the following), where there is an edge from a molecular species to all those species whose production reactions are catalyzed by that species. By means of this last scheme, the identification of subparts (subgraphs) potentially able to collectively replicate is a relatively simple task: these subparts are composed of the so-called Strongly Connected Components (or SCC), subgraphs where it is possible to reach every single node starting from any other node. This means that the presence of only one chemical species (one node) can enable the production of other species (other nodes), that in turn can enable the production of other species, until all species belonging to SCC are produced. This structure is therefore a possible component of a dynamic engine able to grow and reproduce [26,29,30]. However, in order to grow SCCs need the presence of the substrates required for their construction: so, real self-reproducing objects should be searched on the more complete digraph 1/ (2 2) where reactions, substrates, products and catalysts are all represented. In this digraph these structures are called RAFs (Reflexively Autocatalytic, Food generated sets), where the stress on the F (food) part highlights the requirement that the continuous presence of at least a subset of the chemical species is necessary and required [53,54] (see note [55]). In the model described here, the food molecules are supplied from outside: therefore they must be present in the external environment and be able to cross the membrane. A RAF is the union of all the reactions allowing the complete self-reproduction of the involved chemical species, so it may be composed of several separate subparts each one able to separately self-replicate: if these subparts cannot be further decomposed they are called irreducible RAFs (irrRAFs) [54]. IrrRAFs therefore are composed of SCC (in the catalyst-product graph) able to catalyze the production of their own substrates (if not already available) or linear chains of reactions whose root is catalyzed by some chemical species whose presence is guaranteed (the "Food" of the RAF sets). These structures can constitute the root of other linear chains of reactions, whose existence depends on these objects [17]; note that also these "tentacles" are included in the irrRAF set.
A very interesting and much studied case is that of random reaction systems, where the molecules are generated at random as well as their chemistry; in this case, major questions concern the possibility of observing collectively autocatalytic systems, i.e., RAF sets, or strongly connected components, i.e., SCCs. Indeed, both depend upon the probability p that a molecule chosen at random catalyzes a reaction chosen at random: if this probability is very low, neither are observed, while, if it is high, the system is full of both. Apart from these trivial regimes, there is an interesting region of intermediate values. It has been observed [31] that the probability of observing SCCs becomes non-vanishing for smaller values than those of observing RAF sets. Therefore there exists a region of p values where SCCs are found in the catalyst-product graph but, since there are no RAFs, they are not continuously supplied with the necessary substrates and are therefore unable to sustain a continuous growth of the number of their molecules. This phenomenon had been described before as a "fragility" of the cycles that are observed in the catalyst-product representation [26,45].

Synchronization
As anticipated in Section 1, in order to achieve a viable protocell evolution, the two key processes of duplication of the internal material and of membrane growth by means of the uptake of amphiphiles on the surface must take place at the same rate, i.e., they must synchronize [56]. It is worthwhile to remark that the synchronization is a general requisite not depending on the selected scenario, RNA world, metabolism first or whatever. If the internal material grows faster than the "container", it would overfill the space. On the contrary, if the container grows faster than the internal material, the latter would be diluted after several divisions. In previous works [4,5,20] we have shown that the synchronization between the aforementioned processes is a robust asymptotic emergent property, which is approached in successive generations.
If the kinetic equations for the GMMs are linear, and if some of them are linearly coupled to the growth of the container, then the model can be tackled with analytical methods. Synchronization can be analytically proven and, in the case where the kinetic coefficients are non-negative, the asymptotic cell division time can be also analytically computed as well as the total amount of molecules close to the division point [4,5,20]. Some nonlinear cases can also be shown to be amenable to an analytical approach, moreover several numerical investigation of nonlinear interactions show that synchronization is indeed a robust property, that holds under various hypotheses about the kinetic equations and also under various hypotheses about the protocell architecture. It is interesting to observe that synchronization can be achieved even in the case of chaotic equations for the GMMs, and that it turns out to be robust also when stochastic division is considered. We refer the interested reader for more details and for a thorough discussion to [4,5,20,57], while we limit ourselves here to observe that synchronization, although widespread, cannot be taken for granted, but that it has to be verified on the different kinds of dynamical equations. It is therefore interesting to ascertain under which conditions the model described in Section 2 can show synchronization.

Synchronization and the Role of RAF Sets
Let us observe that the model has no intrinsic distinction between catalysts and substrates, so the same molecular type can play either role in different reactions; if it is consumed as a substrate in a fast reaction, the system will be depleted of that type, and the reactions that need it as a catalyst will be slowed down and eventually stopped. Note also that molecular types that are neither produced nor consumed by catalyzed reactions will have, just before division, the same number of exemplars as they had in the beginning, so their concentration will be reduced until eventually a protocell with a single copy of that species will appear. At division time, only one of the daughter cells will inherit that copy, the other one having none: thus this event is the starting point of a protocell lineage without this particular species. Therefore, we arrive to the conclusion that only those molecular species that are produced by the system reactions have a chance to survive the division process. We will not consider below some cases, where synchronization occurs in quite "trivial" ways; for example, we will not consider the case where a molecular type in the food set directly contributes to the growth of the container. Other similarly trivial cases will be ignored, such as the case where A and AB are part of the food, and (i) another food molecule catalyzes their condensation to AAB; (ii) AAB is not the substrate of any reaction in the given chemistry; and (iii) AAB contributes to the container growth. In this case continuous production of a molecule coupled to the container is assured. These cases can be tackled with the technique used in our present works and they lead to synchronization. We will also suppose below that no food molecule is also a catalyst, therefore RAF sets need to include a SCC.
Species can be generated by several reactions but, as it has been observed in previous works, collective autocatalysis is fragile unless a RAF set is present. Therefore, it is reasonable to guess that the presence of a RAF set, coupled with the growth of the container, is a necessary condition for robust synchronization. This conjecture has been tested and confirmed in many different simulations, like the one shown in Figure 2, where one can see that species not belonging to any RAF ("non RAF species") undergo dilution while those belonging to the RAF survive and undergo synchronization in successive generations (Figure 2, as all following figures, shows typical behaviors that do not depend on the details of the particular artificial chemistry used). Suppose now that the protocell hosts a single irrRAF: if there were no cell divisions (and no resource limitations), its growth would be exponential [58]. The synchronized cell division leads to an approximately constant number of molecules at the end of each growth period, but the misleadingly "quiet" appearance of Figure 2 indeed hides the fact that at each division the number of protocells doubles and so do the quantities of chemicals. This exponential growth may be an approximately correct description of the phenomena on a limited time scale, but rapidly this fast increase leads to a condition where some nonlinear phenomena (e.g., resource limitation) become important: in this situation the dynamics of growth changes, leading to significant effects on the protocells themselves, as we shall see in Section 5.
Note that there exist different kinds of irrRAFs. An interesting case, shown in Figure 3, is that of a protocell hosting a single irrRAF that can be considered as "heterogeneous", as it is composed of two coupled SCCs, one of them being coupled also to the growth of the container. As expected, synchronization is observed also in this case.  Moreover, a single protocell may host different irrRAFs. In the case of independent (i.e., non-directly coupled) irrRAFs in the same protocell, it has been observed (and it can be easily understood) that if different irrRAFs have the same growth rate, then they all survive, even if they have different coupling coefficients with the container, or even if some of them are not coupled at all. In this case they are like parasites of the irrRAFs that contribute to the container growth.
However, having exactly the same kinetic parameters may be an unusual situation. Suppose for example that there are two irrRAFs, and that the fast one is coupled with the container growth while the other is not. In this case one observes the dilution of the slow one, while the other synchronizes with the cell duplication. There are however some strange phenomena that may be observed when an irrRAF is "dying", and they will be discussed below. The same behavior (i.e., dominance of the fastest) is observed when there are more than two irrRAFs growing at different rates, all coupled to the growth of the container.
A dual case is that of two irrRAFs, where only the slowest one is coupled to the growth of the container while the fast one is not. In this case the molecules of fastest irrRAFs increase their concentrations continuously in the protocell at division time, so this might lead to a breakup of the protocell. This is in any case an event outside the scope of the presented models. Also this behavior can be generalized to more than two irrRAFs.
Note also the intensity of the coupling among irrRAFs and the membrane has a major role: very high (but physically implausible) coupling values can change some of the previous observations. Indeed, the faster the container growth rate, the smaller the number of molecules at each division: for example, also in the case of irrRAFs having equal growth rate, once the species concentrations approach very low values at division time all the molecules belonging to one irrRAF could fall into just one of the daughters, leading to the birth of protocells without its presence.

The Fate of Declining irrRAFs
It is worthwhile to examine the fate of those irrRAFs that tend to be eliminated due to the presence of other, faster ones that are coupled with the growth of the container: in this case we can observe interesting phenomena due to the role of stochasticity when just one or a few exemplars of a given molecular species survives. Note that, while it is possible to rule out some phenomena related to a physically implausibly high contribution of a single or a few molecules to the growth of the container (as discussed at the end of the previous section) it is impossible to simply rule out as irrelevant the case in which just one copy of GMM survives. Indeed, even in present-day evolved cells, e.g. bacteria, there are genes that are present in single copies.
Let us consider again the case of two independent irrRAFs having different growth rates, no matter the intensity of the coupling with the membrane: the concentration of species belonging to the slowest one slowly decreases, reaching so low numbers of molecules that stochastic effects start to play a major role (Figure 4). It should be noticed that, as long as an irrRAF in a protocell has at least one molecule of one of its member species, it has the possibility of replicating the other species and of restarting its growth. The only way to remove an irrRAF is that of removing from the protocell all the molecules of all of its species: so the higher the number of species belonging to the irrRAF, the more difficult its removal. Therefore once an irrRAF has reached very low concentration levels its removal may require very long times, but eventually it is likely to occur (remember that the model is stochastic, so here we have a stochastic process with an absorbing boundary at zero concentration).
Since this process requires some generations to show up, in order to understand what may happen it is necessary to consider a population of protocells. Due to the stochastic character of the removal of the slow irrRAF, it is highly unlikely that it takes place at the same time in all the protocells. Therefore, after some time, there will be some protocells with two irrRAFs and some with only one, i.e., the fast one (ignoring the possibility that new irrRAFs are generated).
Let us suppose that at a certain time point t there are just two kinds of vesicles: those that have both irrRAFs and those that have only one, and let Yt and Xt respectively be their numbers. The fast irrRAF is present in both population, and it synchronizes with the container, so the evolution time can be described by a discrete map with constant Δt across the various generations. In the division process described in Section 2 it has been observed that, since the total volume of the two daughter cells is less than that of the parent one, some internal molecules are lost at each duplication; however we suppose that the numbers of copies of the molecules of the fast irrRAF are high enough, so that the probability that a protocell appears without any RAF, because both have been lost, is negligible (the reasoning below could be modified to take also a small loss term into account). For the sake of simplicity, we can suppose that each of the N species belonging to the slow irrRAF has only one molecule: so, the probability α of giving birth to a protocell with only one irrRAF is constant; moreover, for the time being, we will suppose that the probability of losing all the molecules of the slow irrRAF in the lost volume at division process is also negligible. Under these assumptions,  α = 1/2 N . If the damaged but not empty irrRAFs are able to recover one copy of all their molecules, then after duplication a population of M protocells, which have initially had both irrRAFs, will be composed of M + (1−α)·M protocells owning both irrRAFs (we neglect the chance that both irrRAFs are eliminated in the lost volume in duplication, that is much smaller than the chance that only one is), and by αM protocells owning a single irrRAFs. Note that the duplications of protocells having only one irrRAF surely gives birth to two protocells having only one irrRAF, and that this population may be increased by the one-irrRAF offspring of the other one.
So it is possible to define the simple iteration map: that describes the time change of the two protocell subpopulations: both actually increase, as is typical of linear systems, but the ''s rate is higher than that of the Y's, so the ratio of Y's to X's vanishes (see Figure 5). Under more realistic conditions, where nonlinear growth limiting terms do appear, this would imply extinctions of the Y's: within the protocell population the fraction of protocells having both RAFs eventually vanishes, leaving only those with the fast one. But this result may be a consequence of the linearity of Equation (4). One can introduce growth-limiting terms, due to e.g., overcrowding or resource limitations, like those of Equation (5), that are frequently used in population dynamics; however, also in this the fraction of protocells having both RAFs eventually vanishes, like in the previous case. A different behavior can be observed if the β parameter (that rules the growth-limiting term) takes different values for X and Y populations: if the presence of the losing irrRAF provides a positive contribution to the protocell dynamics, for example by giving increased resistance to overcrowding or improvement of the resource exploitation, then the fraction of protocells having this irrRAF does not vanish, and the two can coexist in the asymptotic population. This case constitutes an interesting example of interaction of spontaneous processes occurring at different levels of organization.

Novelties
There has been a lively discussion in the literature about the emergence of novelties in living systems (please refer to [17,59]). In the models presented in this paper, novelties could be identified with the unexpected appearance of few molecules of one or more chemicals species, due to spontaneous reactions (that may be slow but not completely absent) or to intake from the environment [60]. A major issue concerns the evolvability of these models: if every new molecule would always be eliminated, then the possibility to undergo radical changes would be limited by the chosen "chemistry". The issue is far from obvious: as we have seen, exponential growth of some molecular types can take place in the protocell, and newcomers might easily be eliminated in the "survival of the fittest" competition (as discussed in Section 4 with reference to competition among RAFs). The fastest process is the only one surviving, whereas the slower ones activities have to vanish. Therefore, in the original model with exponential growth a "truly new" molecule (i.e., one that has not and cannot be generated by the existing chemistry, given the initial conditions) could survive and replicate only if it were able to interact with an existing RAF, for example being recruited by it [61].
However, as has already been proposed in Section 4.2, exponential growth can be a first approximation to more realistic models with limitations to growth. Let us consider for example one of these possible rate-limiting steps, i.e., the finite diffusion rate of chemicals through the cell membrane. In the following part of this section, we will consider a model that differs from that of Section 2 in that the chemical equilibrium of the permeable species through the membrane is not instantaneous, but is ruled by Fick's law [62] (while the impermeable species cannot cross the membrane, as in the case of Section 2). Therefore: where dMi/dt is the rate of intake of the chemical i (number of molecules), Di is proportional to its diffusion coefficient divided by the (constant) membrane thickness; A is the surface of the protocell and [Mi out ] and [Mi int ] are the concentrations of the chemical i outside and inside the protocell, respectively. In this case, irrRAFs having growth rates equal to that of the already present ones but starting from very low concentrations have the possibility of increasing their presence and reach the already present irrRAFs (Figures 6 and 8a). Moreover, depending on the relative substrate intakes, irrRAFs having different growth rates can coexist (Figures 7 and 8b). While the simulations refer to the RAFs of Figure 9, their general properties are shared by several other cases that have been examined and therefore provide some indications on how advantageous (or at least not too disadvantaged) true novelties can indeed develop within the system.

Conclusions: Remarks and Indications for Further Works
We believe that the model presented here could provide a convincing picture, at an abstract level, of the main features of the relevant processes involved in protocell growth and division. The model can be improved in different ways, some of which will be summarized below, but it can also be used in its present version to address some important issues in protocell research.
First of all, let us recall that a population of units of evolution is "any population of entities with the properties of multiplication (one entity can give rise to many), variation (entities are not all alike, and some kinds are more likely to survive and multiply than others), and heredity (like begets like)" [63]. The protocells considered here do indeed have these properties: multiplication and heredity are fairly obvious and they are "built in" the model, while the necessary heterogeneity of protocells can be provided by some randomness at division time (not explicitly described here) and/or also by the smallness of the protocell itself (individual protocells, if small enough, can host different subsets of the complete set of chemicals). It has been observed [31] that a small vesicle can host on average very few exemplars of a given molecular species (e.g., just one in a 100 nm vesicle, if the concentration is 10 μM; if the concentration is 1 μM then there will be just an exemplar of that type in ten protocells).
It is interesting to observe that the results shown above prove that not only protocells can be different, but that they may also host new types of molecules (i.e., those that cannot be generated by the catalysts that are included in the "chemistry", and that may come from outside or from rare spontaneous reactions) and make them grow-if the growth limiting terms are taken into account as it has been proposed in Section 5. Therefore a population of this kind of protocells has the features required to be truly evolvable [17,63].
The previous observation on small protocells opens an interesting question: why are there protocells at all? They tend to be taken for granted, or to be given some superficial explanations. But, if the key reactions all take place in the internal water phase, what makes a protocell different from a similarly small portion of the bulk? The latter would contain, on average, the same substances at the same concentrations as the protocell. Confinement of the reaction products and substrates would not be an answer, since the concentrations are the same.
There are at least two different (and not conflicting) possible answers to the question concerning the necessity of compartments: (a) the smallness of the vesicle; or (b) a possible active role of the membrane. The first one has already been discussed above: small vesicles can have an internal composition that is very different, not only from that of other protocells (thus ensuring heterogeneity), but also from that of the bulk, thus allowing different reaction pathways to dominate their internal environments.
As far as alternative (b) is concerned, it is indeed highly plausible that the lipid membrane creates a local environment close to its boundaries that is markedly different from the bulk and that these surface effects modify the chemistry, by favoring some reactions and inhibiting others. In this case the membrane would itself act like a kind of catalyst, and would create a local environment different from the bulk. This would happen on both sides, but here the advantage of a closed vesicle is apparent: the molecules generated inside it will be trapped (if non permeable) in a small volume, so their concentrations will increase, while those that are generated on the outer side will freely diffuse in the bulk, without significantly affecting the external concentrations (at least until there are not very many protocells). It has also been shown that some superconcentration phenomena can appear in these cases [64]. The active role of the membrane has been advocated in models, like the Los Alamos Bug or GARD, where the key reactions take place in the membrane itself; the above remarks show that it can have a major active role also in the case where the key reactions take place in the water phase, but are affected by surface phenomena.
Finally, note that the presence of high protein concentrations in small vesicles are also experimentally reported, as for example in [65]; these differences between internal and external chemical concentrations could induce the presence of interesting osmotic processes [66].
As far as model improvements are concerned, there are of course many important directions. Indeed, we think that the picture that has been outlined above represents a rich class of models and opens up an interesting agenda for further research.
A major topic concerns the robustness of our analysis (concerning synchronization, role of different irrRAFs, novelties) in different protocell architectures, like the Los Alamos Bug or the GARD models. Previous works on synchronization with different equations for the replicators makes us confident that the results can actually be robust, but this has of course to be proven.
The appearance of novelties represents a major research topic: it would be important to better determine under which conditions the novelties are actually likely to survive and spread.
The model of the reactions could also be significantly improved, e.g. by considering a structure-dependent catalytic activity, instead of using a uniform probability of catalysis p. Other improvements (that have already been considered in previous works on flow reactors) involve considerations of backward reactions and the introduction of energy constraints.
Let us recall, from the above discussion, that there are at least two different (although not mutually exclusive) explanations of the importance of closed containers: heterogeneity due to smallness, and an active role of the membrane. Both should be explored with appropriate versions of the model in order to understand their features.
Other improvements are of course also possible. The model presented here can be the basis for a set of explorations of different alternatives, and it can be useful to cast basic questions about protocells in a precise form, amenable to analysis and simulation.
all the authors). The other authors contributed to the development of a series of concepts and models that were incorporated in the present work, concerning in particular the stochastic dynamics of catalytic reaction networks (Alex Graudenzi and Chiara Damiani) and the synchronization process (Timoteo Carletti). All authors have read and approved the final manuscript.
In this case the integration with the container (CSTR or permeable membrane) is straightforward. This schema is very effective in case of relatively high chemical concentrations; on the other hand it present difficulties when the concentrations are very low and the granularity of the system becomes significant.
So, the second schema directly simulates the disappearance and the formation of each single molecule by means of a Gillespie algorithm [28,38], and replicates the reaction schema of Section 2.1. The Gillespie reaction constants can be derived from the kinetic constant by means of the relations: In this case the integration with the container module (modeled by means of differential equation) is less obvious; nevertheless, the integration of these equations by means of the Euler schema can allow a very simple integration. Being a first order method, this schema implies variable changes directly linked to a finite time interval: in our framework this time interval is directly derived from the Gillespie framework, which therefore becomes the main systems' engine. So the general schema is: Int(X) and f(rules, LCi) indicate respectively the not fractional part of X and the dynamical rules (derived from of the differential equations description) that drive the chemical i mass exchange with the external environment. The time interval Δtg is very tiny (it correspond to the formation/disappearance of few molecules) and therefore is compatible with the Euler schema.
For simplicity reasons, if not differently indicated in this article the reactions' kinetic constants are set to Kclr = 10 (r), Kcfd = 5000 (d), Kcdd = 250 (d), Kfcd = 500 (d) (arbitrary units); note, however, that the structure of the relationships among RAFs, membranes and protocell growth does not depend on the particular values of these parameters (whereas, of course, these parameters determine the rates of the phenomena).
By the way, note that if the volume is time dependent (V = V(t)) and if the process is quasi-static (the volume variation is not too fast) the relations in A3 can allow the Gillespie simulation of systems where the container volume can vary in time (it is enough to take into consideration the volume changes and recalculate the Gillespie reaction stochastic constants during the simulation). See also [27] for further details.