Thermodynamics of Duplication Thresholds in Synthetic Protocell Systems

Understanding the thermodynamics of the duplication process is a fundamental step towards a comprehensive physical theory of biological systems. However, the immense complexity of real cells obscures the fundamental tensions between energy gradients and entropic contributions that underlie duplication. The study of synthetic, feasible systems reproducing part of the key ingredients of living entities but overcoming major sources of biological complexity is of great relevance to deepen the comprehension of the fundamental thermodynamic processes underlying life and its prevalence. In this paper an abstract—yet realistic—synthetic system made of small synthetic protocell aggregates is studied in detail. A fundamental relation between free energy and entropic gradients is derived for a general, non-equilibrium scenario, setting the thermodynamic conditions for the occurrence and prevalence of duplication phenomena. This relation sets explicitly how the energy gradients invested in creating and maintaining structural—and eventually, functional—elements of the system must always compensate the entropic gradients, whose contributions come from changes in the translational, configurational, and macrostate entropies, as well as from dissipation due to irreversible transitions. Work/energy relations are also derived, defining lower bounds on the energy required for the duplication event to take place. A specific example including real ternary emulsions is provided in order to grasp the orders of magnitude involved in the problem. It is found that the minimal work invested over the system to trigger a duplication event is around ~10−13J, which results, in the case of duplication of all the vesicles contained in a liter of emulsion, in an amount of energy around ~1kJ. Without aiming to describe a truly biological process of duplication, this theoretical contribution seeks to explicitly define and identify the key actors that participate in it.


I. INTRODUCTION
How living beings have been able to overcome the entropic forces to develop increasingly complex individuals which, in turn, maintain their functionality is an open question and one of the hardest problems of modern science [1][2][3][4][5][6][7][8][9][10][11][12].The exhaustive analysis of the energy flows in real living entities collides with the extreme complexity of even the simplest bacteria.Therefore, one must first set what are the physical defining properties of living beings and, then, try to attack the problem by cutting it into pieces.Each piece should incorporate a key or several key features, simple enough to accept a rigorous analysis, but complex enough to shed light to certain facets of the problem.The later integration of all pieces, however, will likely be much more than building a puzzle, for it is clear that the cross dependencies between all the building blocks will introduce an additional layer of complexity.
Following this philosophy, we will focus here on two crucial properties of living beings, according to accepted definitions of life discussed among scholars [13][14][15][16][17]. Specifically, we will concentrate in systems able to: i) Capture material resources and turn them into building blocks by the use of externally provided free energyand eventually undergo a duplication cycle and ii) Keep its components together and distinguish itself from the environment.It is assumed that the compartment contains the metabolic and information system -if any.Our * bernat.corominas-murtra@ist.ac.at simplified system, thus, will lack two crucial features of living beings, namely, iii) To process and transmit inheritable information to progeny and iv) To undergo Darwinian evolution through variation of the copied inheritable information and a successive selection of the better progeny.We will thus focus on the thermodynamic properties of the duplication process, and we will skip all the complexity arising from other phenomena.It is worth to recall here that this kind of approach, where the essential physics of the duplication problem is addressed has a long history, dating back to the late thirties of the 20th century, with the highly influential works of N. Rashevsky [1].
In contrast to the usual top-down approaches followed in biology, we will address this problem using a bottomup approach.In such kind of approaches to life-related phenomena, physical building blocks and chemical processes are externally assembled and triggered, creating artificial, synthetic entities that mimic some of the crucial properties of living beings.Consistently, this approach has been named Artificial Life [16,[18][19][20].Artificial cells, or, protocells are usually composed by emulsions [21] made of mixtures of lipids, precursors and water [15,16,19,20,[22][23][24][25][26][27][28].The foundation of this approach is based on three main starting points: First, it provides a framework where energy imbalances trigger the emergence of cell like aggregates [21], second, it is possible to externally drive simplified metabolic reactions [15,16,26,28], and, third, it uses the same type of building blocks -mainly lipids-that compose an important part of the structure of most of the living organisms [29].Crucial to our aims, it is worth to remark a couple of recent results: First, numerical approaches have shown that duplication dynamics as a consequence of energy imbalances due to geometrical frustration is expected in those systems, if properly driven out of equilibrium [30].Second, Cronin et al have been able to duplicate real artificial protocells through a specific oil-in-water droplet system with replicating dynamics [31].This result is certainly remarkable, but our approach does exclude the role of any information/replication dynamics.In doing so, we explore how far can we go by just taking into account general stability properties and energy imbalances to explain and characterize the duplication process.The work presented here runs in parallel to an interesting complementary approach taken in [32], where the kinetics involved in the duplication events of synthetic systems was studied in detail.
In this paper we will work with a generic emulsion system [21].We will make use of the well understood free energy landscape of such systems, where the contributions coming from aggregate geometry and size have been long studied [21,33,34], as well as the non-trivial contributions of the entropic terms [35,36].The impact of a changing energy landscape -which eventually can favour a duplication event-will be studied from a generic non-equilibrium situation making use of modern methods arising from the emerging field of Stochastic Thermodynamics [37][38][39][40][41][42].Within this framework, the evolution of the system can be studied following the individual trajectories in the phase space and, importantly, exact relations between energy and work can be obtained, even in out of equilibrium cases.In addition, relations between energy, entropy and information arise naturally [43,44].
The remaining of the paper is organized as follows: In the next section, we describe the thermodynamics of the abstract emulsion system in detail.We derive its free energy landscape, section II A, the equilibrium distributions, section II B, and the detailed balance condition over transitions, section II C. Next, in section II D, we expose the generic protocol that drives the system towards the occurrence of a duplication event.We end the section where the system is presented by exploring the orders of magnitude involved in these kind of systems, section II E, where we analyze the quantitative values of the thermodynamic functionals presented generically in the previous sections for a real microemulsion system.The thermodynamic analysis of the duplication thresholds is the core of section III.First, we derive a general relation for duplication probabilities, section III A. Then, in section III B, we explore the consequences of this result for a system evolving in a quasi-static fashion.Section III C generalizes the previous equilibrium approach by providing an exact equality between probabilities of duplication thresholds in a speicifc non-equilibrium scenario, in which the relaxation process that may eventually lead to duplication happens between two states which may not be in equailibrium.This equivalence leads us to define general duplication scenarios and derive the general conditions of duplication, as well as the amount of work invested over the system to trigger a duplication event, and the conditions for the perpetuation of the duplication cycle.Section III D refers to the free energy/entropy relations for the perpetuation of the duplication cycle in time.The final section is devoted to discuss the implications of the presented results.The whole paper is aimed to be self-contained and details of the derivations are provided in the appendix to make it understandable to non-specialized audience.

II. THE SYSTEM
Our system is conceived as being an abstract emulsion in a kind of reaction tank of volume V syst connected to a heat reservoir at inverse temperature β = 1  Twe set k B = 1.Let X = (X 1 , ..., X L ), where X i is a specific kind of lipid species populating the system and Y = (Y 1 , ..., Y P ), where Y i is a specific kind of precursor/surfactant species populating the system.Let X tot = (X 1,tot , ..., X L,tot ), Y tot = (Y 1,tot , ..., Y P,tot ) the total amount of molecules of the different species of lipids and precursors that lie in aqueous solution inside our volume.We refer to X tot , Y tot as the boundary conditions.As we shall see, they may change in time, under the action of an external protocol.
Due to the hydrophobic/hydrophilic nature of the surfactant molecules, we assume that (part of them) tend to aggregate in spheroidal compartments.Surfactants are supposed to populate the surface of the aggregates.No assumptions are made on the specific nature of the membranes or the interior of the aggregates, leaving the discussion always in a general plane.A state σ n of our system is described by a 3−tuple: where X a = (X 1,a , ..., X L,a ) and Y a = (Y 1,a , ..., Y P,a ) are the amount of lipids and precursors forming aggregates, respectively, and n the number of aggregates present in the volume.In general, and if no confusion can arise, we refer to a given state as σ n instead of σ n ( X a , Y a , n) for notational simplicity.We keep the label subscript " n " accounting for the number of aggregates only for notational convenience.When we introduce time dependence, we write σ t n ≡ σ n ( X a (t), Y a (t), n(t)).Not all molecules will be part of the aggregates.Therefore, we must account for these molecules in bulk.Consistently, given a state σ n ( X a , Y a , n) occuring under the boundary conditions X tot , Y tot , we wil have that X ) are the amount of lipids and precursors in bulk, respectively.
A macrostate or coarse-grained state σn is defined as the 4-tuple: where p(σ n |σ n ) is the probability distribution of finding σ n as a particular realization of this macrostate.This macrostate can be realized through any state containing X tot , Y tot and n protocellular aggregates following the distribution p(σ n |σ n ).In case of time dependence we write σt n ≡ σt n ( X tot (t), Y tot (t), n, p(σ t n |σ t n )).

A. Gibbs free energy landscape
The thermodynamic landscape of our system is given by the Gibbs free energy of the state σ n , The Gibbs free energy is always defined over states of the system and depends on both the state σ n and the boundary conditions X tot , Y tot .Therefore, the same state will have energy changes if the boundary conditions change.Each macrostate has a uniquely defined free energy functional.For notational simplicity, we drop the subscript Xtot, Ytot , if no confusion arises.
The complex nature of these type of emulsions results in a free energy functional with several blocks, which we construct step by step.First, we focus on the free energy contribution of a single protocellular aggregate, containing X lipids, Y and precursors, G a : where ∆µ Xi and ∆µ Yi are the changes in chemical potential when moving lipids and surfactants from bulk into the i-th aggregate, and G geo a geometric term expressing shape and surface contributions of the aggregate.This geometric term accounts for the membrane properties of the system, and is computed according to the existence of a minimum energy configuration or perfect protocellular aggregate, which can be directly computed as the optimal packing from the knowledge to the sizes and geometries of the precursor molecules.The geometrical term thus reads: where γ is the surface tension, α the compressibility coefficient, and κ the elastic bending modulus of the lipid membrane.The integral is the second order expansion of the contribution of the Helfrisch Hamiltonian to the overall free energy, being H the curvature of the membrane -as a function of some coordinates parametrizing the membrane surface-of the current aggregate and H 0 the curvature of the perfect aggregate.The integral is computed over the whole area of the membrane, A [33,34].
Once we have properly characterized the free energies of a single aggregate, we proceed to construct the free energy of the whole state σ n .The next task will be to compute the entropy for an system in the state σ n = σ n ( X a , Y a , n) under the boundary conditions X tot , Y tot .To compute the entropy of such state, we apply directly Boltzmann's definition over the amount of configurations the state σ n can adopt, Ω Xtot, Ytot (σ n ) [45]: Clearly, S(σ n ) ≡ S Xtot, Ytot (σ n ).However, we do not write this dependence explicitly for the sake of readability, if no confusion can arise.This entropic term has two contributions, the translational entropy and the configurational entropy.We start with the translational contribution.We consider that the system of n indistinguishable aggregates has 3n degrees of freedom and that each aggregate diffuse around within a volume V syst = nV a and that m is an appropriate length scale for such a diffusive process, one has that the amount of configurations provided by the translational term is: We emphasize that, in the approach take here, m has been chosen as a typical volume unit whose purpose is to render the argument of the logarithm dimensionless -for a deeper discussion on the choice of the right length scale see [35,36].For each configuration described above, we must account for the potential degeneracy of states, or, in other words, the amount of configurations given by the amount of molecules in bulk and forming the aggregates.
For each chemical species, e.g, the i-th lipid, this amount of configurations is Therefore, assuming again that there are no cross dependencies among the different configurations, one has that the amount of configurations of molecules in bulk and aggregates is: Considering these two contributions, the entropy term reads: And the overall entropy of the state σ n = σ n ( X a , Y a , n), under the boundary conditions given by X tot , Y tot , S(σ n ) = log Ω Xtot, Ytot (σ n ), is: where we used the fact the log(ab) = log a + log b and the Stirling approximation for the factorial for the first term, namely log n! ≈ n log n − n.Collecting all the above ingredients, we have that the Gibbs free energy of the system in the state σ n = σ n ( X a , Y a , n) under boundary conditions X tot , Y tot becomes: with the standard chemical potentials µ • Xi and µ • Yi of lipids and precursors, respectively.

B. Helmholtz free energy
Let the system be subject to the boundary conditions X tot , Y tot .In equilibrium, the probability that the system is in the particular state σ n , belonging to the macrostate σn is given by the Boltzmann distribution, p(σ n |σ n ) [45]: being Z(σ n ) the partition function, namely: Accordingly, the Helmholtz free energy of the macrostate σn , F (σ n ) is: being ... σn the average over all states of the microstate and H(σ n ) the entropy of the macrostate, namely: where p(σ n | σn ) is now defined as: We point out that we will refer to a given probability distribution associated to a macrostate σn either as p(σ n |σ n ) or p |σn , indistinctly.We finally recall that we assume that the equilibrium distribution macrostate σn is such that: where we emphasized the dependency on the boundary conditions X tot , Y tot only for clarity.In words, we assume that the equilibrium distribution is defined around the absolute minimum of Gibbs free energies, and that such a minimum is unique.

C. Detailed balance condition in duplication
The process of duplication/fusion of aggregates is of special interest for us, since it is the basis of duplication.It is assumed to satisfy the following transition rates between states: where the kinetic constants relate as: where ance condition is also assumed for any other transition between states.Therefore, for any two states σ n ∈ σn and σ n+1 ∈ σn+1 , thanks to the detailed balance condition given in equation ( 8) and assumed for all transitions, one has that, between two arbitrary states σ, σ : Importantly, we recall that the functional G must be computed under the same boundary conditions X tot , Y tot in any evaluation of the difference, i.e.:

D. The driving protocol
Let us assume that at time t = 0 the system is in contact to a thermal reservoir at inverse temperature β, and in an equilibrium macrostate σ0 n , that is -see figure (1a): .
From this moment on, we run a protocol that changes the energy landscape, without separating the system from the heath bath neither changing the whole system's volume, V syst .This protocol runs from t = 0 to t = τ -see figure (1b,c).For example, suppose that we add new lipids and that we switch on a light that triggers a metabolic reaction that transforms lipids into precursors, thereby creating new surfactants.We call this protocol ψ(t).In general, it will affect the L + P variables of our system.Therefore, the protocol ψ(t) consists on a list of -maybe interdependent-protocols: where the first L elements φ 1 (t), ..., φ L (t) explicit the action of the protocol on the lipids X 1 , ..., X L abundance and the last P elements φ L+1 (t), ..., φ L+P (t) explicit the change protocol on the precursors Y 1 , ..., X P abundance.Let us be more specific on the role of the protocol.Assume that at time t the boundary conditions of our system are given by X tot (t), Y tot (t).The application of the protocol a for short time interval [t, t+δ] to the boundary conditions, denoted by δψ( X tot (t), Y tot (t)) will lead the boundary conditions to change as: The above transformation of the boundary conditions will lead the system to change its macrostate, from σt to σt+δ .This transition can be done through a set of stochastic trajectories, which will be referred to as Σ[t, t + δ].At τ the system will be at the macrostate στ n+1 and we will stop the protocol -see figure (1d)-, letting the system relax towards an equilibrium state, achieved at time τ ∞ -see figure (1d).The distribution of states p |σ τ∞ n+1 is assumed to obey the standard equilibrium Boltzmann statistics: .
We assume that a duplication event has taken place in the time interval [τ − δ, τ ] and that the relaxation process happening at the interval [τ, τ ∞ ] does not imply a change in the number of protocell aggregates.We remind that the whole process takes place in contact to a heat reservoir with inverse temperature β and at a constant volume V syst .

E. Orders of magnitude
To grasp the orders of magnitude involved in our problem, we take a particular example of the above general system, in line to the one described in [26,27].From this example, we perform a rough estimation of the orders of magnitude involved in the computation of the free energies of a single aggregate.For the sake of readability and extension, the computations provided here are not as detailed as in the other parts of the paper.We refer the interested reader to [21,26,27,[46][47][48] for the detailed discussions on the orders of magnitude and potential experimental set ups. Suppose that we have a Windsor type IV ternary emulsion made of a single lipid, X ≡ decanoic acid anhydride, (C 9 ∆µ Y can be calculated from their partition coefficient -i.e. the fraction of lipids found in bulk solution as opposed to the aggregates.Estimations give this value to be around 14% [46].If k + Y /k − Y is the ratio between precursor molecules going from bulk to aggregates and precursor molecules going from aggregates to bulk, this reads: Therefore, using equation ( 8), and setting β = 1/k B T , one can approximate the energy gain of moving a decanoic acid molecule from bulk to aggregate, ∆µ Y , as where k B is the Boltzmann constant, k B ≈ 1.38 × 10 −23 J/K At T = 300K, and being N A the Avogadro number, the above equation leads to: which in turn evaluates to a partition coefficient of ∼2%.For the geometric term given by equation ( 2), we make the assumption that γ, α κ, therefore the contribution of the Helfrisch hamiltonian will not be taken into account.The surface tension and the compressibility parameters, γ, α can be estimated as γ ≈ 45.9mN/m and α ≈ 5.80 × 10 −45 Nm 3 [26].We assume a spherical lipid core of X c precursor molecules, whose individual molecular volume V X = 0.54nm 3 .Thus, the spherical core of the aggregate has a radius, R core (X c ), of: .
And the whole aggregate, including the surface molecules displays a radius, where t is the length of the tail of the surfactant molecules, which is considered constant.The optimal number of surfactant molecules, Y (X c ) for this amount of molecules in the core of the aggregate is then computed as: We assumet that the tail length of the surfactants is around t = 1.4nm and that their effective head area a 0 = 25 Å2 [21].The typical radius of oil droplets is around 100nm leading to a volume of ≈ 4.1 × 10 6 nm 3 , i.e., ∼ 0.004 femtoliter, which -assuming a typical waterto-oil ratio of 10:1-gives a system volume of 0.044 femtoliter per droplet.Therefore, a milliliter of emulsion has an order of magnitude of 10 13 oil droplets.From the ratio of precursor to droplet volume, it follows that X c ≈ 7.62 × 10 6 .With an optimal packing number of surfactants Y (X c ) computed from equation (10) and a partition coefficient of 14%, one can estimate a total of Y c = 5.7 × 10 5 surfactant molecules.With these values, a rough estimation of the orders of magnitude of the free energy of a single aggregate whose packing is optimal, G a (X, Y ), is given by: This example serves us as an orientation of the energy scales involved in our problem.

III. DUPLICATION THRESHOLDS
We proceed now to explore under which circumstances the application of the protocol results into a duplication event.The goal is to obtain an inequality which, when observed, a duplication event is expected to take place.This will be related to the amount of work performed from the protocol.We perform the analysis from a quasistatic approach and from a more general non-equilibrium approach.First of all, we derive a general condition for the transition probabilities among macrostates, which does not require equilibrium conditions.

A. Transition probabilities between macrostates
Now we take a close analysis on the process happening in the interval [t, t + δ], where 0 < t < τ .We drop the indices n because, we consider transitions between any two states, and, by now, there will not be necessarily a duplication event in consideration.At time t the Gibbs free energy landscape suffers a change imposed by the protocol ψ(t).The initial state, σt , is therefore perturbed and may no longer be necessarily in equilibrium.The system then relaxes to σt+δ , which may not be in equilibrium, too.The boundary conditions ( X tot , Y tot ) are considered constant after the change imposed by the protocol at time t until time t + δ.The probability of jumping from macrostate σt to macrostate σt+δ is given by: Thanks to the detailed balance condition, one can rewrite the backwards transition as: where g(t, t + δ) is a function that depends on the states, σ t , σ t+δ which, written in a suitable form for further developments, reads: .Now, we derive the probability that we chose a given trajectory σ τ −δ n to σ τ n+1 from the ensemble Σ[t, t + δ] of trajectories that go from σt to σt+δ -see figure (2).This probability distribution is referred to as p t+δ → , and is defined as: Conversely, we can define the backwards version of the above probability distribution, namely, p t+δ ← as: The probability p t+δ ← accounts for the probability of a given trajectory in case of time reversal of the protocol action.It is straightforward to check that both p t+δ → and p t+δ ← are well defined probability distributions -i.e., that they sum up to 1.With the above defined distributions, the above computations lead to: The above equation is the average over all paths of the last element of the sum, namely: where the brackets ... denote average over all the microscopic trajectories Σ[t, t + δ] between states σ t → σ t+δ that realize the transition from macrostate σt to macrostate σt+δ .

B. Quasi-static approach
The first exploration corresponds to the situation in which the transitions triggered by the protocol ψ(t) are performed quasistatically, that is: They are so slow that all the trajectories Σ[0, τ ] can be considered a succession of equilibrium states.Applying the general relation given by equation ( 14) we arrive, after cancellations, at: and we then recover, as expected, the equilibrium relation for the backwards and forwards probabilities: where δF (t, t + δ) is the increase of the Helmholtz free energies -see equation ( 7)-through the interval [t, t + δ]: As stated in the description of the protocol ψ(t), we assume that a duplication event has taken place in the time interval [τ − δ, τ ].To study case, we recover the subscripts n , n+1 accounting for the number of aggregates in the system.This will imply that, at time τ − δ we had the system in a macrostate στ−δ n and that at time τ the system transitioned towards a macrostate state στ n+1 .For that to happen spontaneously, we need that: and, according to equation ( 15) one needs that δF (t, t + δ) < 0, which implies: If we take a closer look to the structure of the Gibbs free energy, given in equation ( 4) we can refine the above condition.Indeed, since the boundary conditions X tot , Y tot do not change during the time interval [τ − δ, τ ), the contributions to the change of the free energies will only correspond to the free energies of the aggregates, due to size and frustration given in equation ( 1) and their associated entropic terms, given by the Shannon entropies of the macrostate and the translational and configurational entropies of the states, given in equation ( 3).After cancellations, we arrive to: where the increase on the average free energies δ G a is defined from equation (1) as: being the averages taken over the whole set of states belonging to στ n+1 and στ−δ n , respectively, and the entropic gradient δS(τ − δ, τ ) is defined as: where δS(τ − δ, τ ) the increase of configurational and translational entropies for each state, as given in equation (3): and δH(τ − δ, τ ) the increase on Shannon entropies, namely: Knowing the evolution of free energies, and assuming the quasi-static approach, one can easily compute the amount of work performed by the protocol ψ(t) to trigger a duplication event.Indeed, in the quasi static approach, the amount of work δw(τ −δ, τ ) invested over the system can be identified with the Helmholtz free energy gradients, namely: In consequence, the amount of work performed over the system along the protocol, W ψ is, assuming a continuous approach (δF (t, t + δ) → dF (t)): as expected in the case of equilibrium transformations [49].

C. Non equilibrium approach
We now explore a more general situation, in which the states visited along the trajectory are not necessarily in equilibrium, and thus, extra amount of dissipated heat is expected, deforming the energy/work relations derived in the previous section [37,38,50] -see figure (3).Our approach does not consider explicitly other sources of nonequilibrium behaviour, and is focused on the exploration of the potentials under the assumption that the final and initial states may not be equilibrium ones.In particular, the hypothesis of detailed balance between different states of the system is always assumed to hold at the level of microscopic transitions Specifically, let us consider the case in which at time t the boundary conditions X tot , Y tot , suffer a sudden change imposed by the protocol ψ(t).We observe that the change induced by the protocol to the boundary conditions implies a change on the free energy landscape described by the Gibbs free energy in equation ( 4).The initial state, σt , is therefore perturbed and is not necessarily considered in equilibrium.We consider that, in this irreversible destabilization of the system, an amount of entropy like: is produced, due to the non-equilibrium transformation that is associated to the perturbation of the system after the application of the protocol.This part will not be studied in detail, since it plays no role in the duplication process.The system then moves to σt+δ , which may not be in equilibrium, too.As above, the boundary conditions ( X tot , Y tot ) are considered constant in the interval [t, t + δ] after the change imposed by the protocol at time t.
If we don't assume a priori that the starting distribution p(σ t n |σ t ) is an equilibrium one, we see that, under the application of equation ( 14) we reach a more general relation -see appendix for details: FIG. 3: Irreversible action of the Protocol ψ.(a) at t = we have a macrostate σ0 n in equilibrium and the protocol induces a change in the boundary conditions that destabilizes the system eventually making it to jump a non equilibrium state, producing an amount of entropy βQ ψ (t).Then the system relaxes -maybe not completely-until the next action of the protocol until there is a stable division event and the system relaxes completely.(b) Detail of the transition, with the thermodynamic quantities involved.The jump experienced by the system from its previous state is D(p |σ t ||p |σ t * ), and ∆F is the energy gradient that leads to the new macrostate.The entropy produced through this -possibly partial-relaxation process is D(p→||p←) -see text.
where δG(t, t + δ) Σ[t,t+δ] is the increase of Gibbs free energy averaged over all trajectories Σ[t, t + δ] from σt to σt+δ .Unfortunately, the above inequality only can give necessary but not sufficient conditions for duplication.The derivation of an exact equivalence for a restricted range of situations -yet involving many non-equilibrium cases-is the objective of the next subsection.

Free energy structure
To achieve an exact relation between forwards and backwards probabilities of duplication, we need to develop some equivalences involving information-theoretic measures.These relations are derived from the exploration of the structure of δG(t, t + δ) .It is important to highlight that, since we do not assume we are in equilibrium, one cannot use the identity F (σ t ) = G σt −β −1 H(σ t ) anymore.Again, we focus our efforts in the study of the time interval [τ − δ, τ ], where we assume that a duplication event has taken place.As above, we recover the subscripts n , n+1 accounting for the number of aggregates in the system.We remind that this implies that, at time τ − δ we had the system in a macrostate state στ−δ n and that at time τ the system transitioned towards a macrostate state στ n+1 .
First relation.-Observethat we can decouple the general term δG(τ − δ, τ ) Σ[τ −δ,τ ] as follows: Let p |σ t be the probability distribution of the actual macrostate σt , and p |σ t * be the equilibrium distribution associated to the equilibrium macrostate σt , under the conditions imposed by the protocol at time t.That is, the probability distri-bution that would correspond to σt if it where in equilibrium, σt = σt * .In other words, we have an equilibrium distribution p(σ t |σ t * ) ∼ e −βG(σ t ) , sharing the support 1 set with the actual, possibly non-equilibrium, distribution p |σ t .After rearrangements one finds that -see appendix for details: n * , defined as [51]: .
The Kullback-Leibler divergence is a non-negative measure D(p In other words, as expected, if transitions are performed between equilibrium states, no contributions arise from this term.In analogy to the equilibrium Helmholtz free energy -see equation ( 7)-on can define a non-equilibrium Helmholtz free energy of the non-equilibrium macrostate σt , F(σ t ), as follows [43,52,53]: where the average is over all the states of the macrostate σt .If one assumes that the transitions between states obey the Markov property -see appendix for details-one can define the increase on non-equilibrium Helmholtz free energies, δF(τ − δ, τ ), as: where F(σ τ n+1 ) and F(σ τ −δ n ) are defined following equation (21).Now, thanks to equation (20), one arrives at: (22) being δD(τ − δ, τ ) the increase in the Kullback-Leibler divergence between τ − δ and τ , namely: and δF * (τ − δ, τ ) the increase on Helmholtz free energy of the equilibrium macrostates associated to στ n+1 and 1 Let p be a probability distribution defined over the set X and let X ⊆ X such that X = {x i ∈ X : p(x i ) > 0}.X is the support set of the probability distribution p.In words, the support set is the set of elements whose probability is larger than 0.
στ−δ n , respectively.The sign of δD(τ − δ, τ ) and its absolute value are important to understand the extent of the dissipative role of this term.Using the log-sum inequality [51] one is lead to -see details in the appendix: Again, using the log-sum inequality, one can prove that the above inequality becomes an equality only if the transitions are among equilibrium states -as expected-, i.e., Second relation.-Recognizingthat equation ( 9) implies that: one can rewrite δG(τ − δ, τ ) as: where the average is computed over all trajectories Σ[τ − δ, τ ] from macrostate στ−δ n to στ n+1 .Furthermore, markoviantity in the transition probabilities implies that: Now we develop the ... part of equation (25).From the definition of p τ → (σ τ −δ n , σ τ n+1 ) given in equation ( 12), and averaging directly, one arrives at: .
After some algebra, and using equation (21), one arrives to a relation involving the global forward and backwards probabilities -see appendix for details: where p τ ← is the backwards probability of trajectories, defined in equation (12).Specifically, D(p τ → ||p τ ← ) reads: If one assumes that there is no dissipation in the trajectory itself, and that the transition between states is performed in a quasi-stationary way, then D(p τ → ||p τ ← ) = 0. We highlight that this is true as long as the trajectories are balanced and no currents are present in the system.In general, one has that, due to the non-negativity of the Kullback-Leibler divergence:

Non-equilibrium duplication thresholds and work relations
Equation ( 26) encodes the relation between forward and backwards duplication probabilities.Indeed, exponentiating, one arrives directly at: .
The above relation gives us an exact relation between duplication and fusion probabilities in a general class of non-equilibrium cases.In consequence, equation ( 28) provides a necessary and sufficient condition for the triggering of a duplication event after the application of the protocol ψ(t).The above equation leads to the following duplication threshold: If we notice, as we did in the quasi-static case, that we can impose that δ G = δ G a , where δ G a is the average increase on the free energy in the aggregates due only to size and frustration, as given in equation ( 17): and δS(τ − δ, τ ) refers to the entropic contributions as described in equation (18).Equations ( 29) and (30) explicitly show how the tension between entropic gradients and free energy gains controls the duplication process.provides a nice, hands-on example of the imbalance between entropy and free energy gains that create structure and order that biology needs to overcome in order to endure.
From the order of magnitudes analysis of section II E, we can roughly estimate the numerical values involved in these inequalities in the case of ternary mixtures containing decanoic acid anhydride, ), a single precursor, Y ≡ decanoic acid, (C 9 H 19 − COOH), and water.As we outlined, the amount of aggregates in one milliliter of emulsion is ∼ 10 13 .Therefore, if we consider that just before the duplication the extra free energy was exactly the free energy of the aggregate at the optimal packing, we have that δ G a ∼ G a /n.Since, from equation (11), we know that G a ∼ 10 −13 , we have that: δ G a ∼ 10 −26 J/aggregate .
From that, considering T = 300K, we have that β = (k B T ) −1 ∼ 10 −21 J −1 , where k B is the Boltzmann constant.Therefore, from equation ( 30) one can estimate the minimum (statistical) entropy gradient as: We recall that this is entropy excludes the contribution of the heat generated in non-equilibrium transitions.
We now revise the work relations in this general nonequilibrium case.The non-equilibrium work performed over the system is: From the definition of work invested over the system, one can derive the minimum work invested to trigger a duplication event.Indeed, let us suppose now that we take as the final point the equilibrium macrostate στ∞ n+1 , with probability distribution p(σ τ∞ n+1 |σ τ∞ n+1 ).As we said in the description of the protocol, the action of ψ(t) stops at t = τ and then the systems relaxes towards an equilibrium in a quasi-static way.One can in consequence, calculate the minimal amount of required work invested over the system through the protocol to trigger a duplication event, to be named W ψ : With the above relation, one can compute the amount of dissipated work, W diss ψ , due to non-equilibrium loses: In consequence, from the definition of the nonequilibrium free energy given in equation (22), one can find an exact expression for the dissipated work: We observe that, consistently, W diss ψ ≥ 0, due to inequality (24).We now retake the exploration of the orders of magnitude involved in our problem by using again the example of the ternary mixture presented in section II E. The free energy of a single aggregate will determine, by construction, the minimum (non-dissipated) work needed to invest into the system to trigger the formation of an aggregate.Therefore, thanks to equation (11), we can bound numerically the order of magnitude of this work: From this, and knowing from section II E that the amount of oil vesicles is around 10 13 in a milliliter of microemulsion, we can conclude, under the assumption that a linear increase of volume to accommodate new protocells does not impact dramatically in the energy landscape, that the amount of work we need to invest to duplicate the amount of protocells contained initially in a liter of emulsion is lower bounded as: Finally, we can compute the amount of entropy produced throughout the whole process, S ψ by collecting the entropic terms, and adding the entropy produced by the destabilization of the system after each application of the protocol, βQ(t): Recall, again, that D(p τ → ||p τ ← ) ≥ 0. We remind that here we did not specify the formal shape of the last term, corresponding the heat produced within the nonequilibrium trajectories that destabilize the system right after the application of the protocol.We warn the reader that the potential relations between the dissipated work δD(τ − δ, τ ) and the entropy produced through the relaxation process, D(p τ → ||p τ ← ) are not studied here, but can contain relevant information for the conditions of the duplication process.Similar relations are studied in the context of the analysis of the structure of the second law [54], thermodynamics of computation [55], and work/energy relations in coarse grained approaches [56].

D. The perpetuation of the duplication process
A crucial condition for emergence of syntheticliving entities is the capacity for perpetuating the duplication process.In the framework derived above we very briefly revise the key ingredients for this successive duplications to be maintained.Let us suppose that at time t the system contains n(t) aggregates in equilibrium, i.e., σt n(t) .Following equation ( 29), there will be a duplication if, under the action of the protocol ψ(t), if (∀n(t) ∈ N)(∃τ > 0), the following condition holds If we plug equation ( 30) we obtain a criteria that explicitly relates the free energy increases of the aggregates, due to frustration and size, to the entropic gradients.In that context, the duplication process will go on as long as, (∀n(t) ∈ N)(∃τ > 0), the following condition holds: where δ G a is the average increase of free energies of the aggregates due to size and frustration from n(t) to n(t) + 1 in the time interval [t + τ − δ, t + τ ] and δS(t + τ − δ, t + τ ) the increase of the other entropic components, as defined in equation (18).Inequalities (33) and (34) are the inequalities that the protocol must trigger to ensure the continuation of the duplication process.They may be called inequalities for prevalence.
FIG. 4: Schematic picture of the conditions for the duplication process to be sustained in time.Duplication events are indicated by dashed circles.A system with n aggregatesgray line-in equilibrium receives the action of the protocol changing the energy landscape.Its Helmoltz free energy increases until a point in which the Helmholtz free energy of a macrostate containing n + 1 aggregates -orange line-is lower than the one for the n aggregates, and a duplication event occurs.If we switch on the protocol again, the system increases its Helmholtz free energy until a point in which, eventually, the Helmholtz free energy differences trigger again a duplication event -blue line.If the protocol is able to destabilize the system from n(t) to n(t) + 1 aggregates for any t, the duplication process will continue unboundedly in time.In this figure we described a quasi-static approach, which makes use of equilibrium Helmholtz free energies for the sake of clarity.The non-equilibrium case is thoroughly discussed in the text.
In summary, they tell us that the process can continue if the action of the protocol is able to trigger an imbalance between entropic contributions and free energy gradients favouring the equilibrium state containing n(t) + 1 aggregates as: ∆(aggregate free energy) < ∆(entropic gradients) In figure (4) we described a potential trajectory of a successive duplication process.We therefore derived a specific example of the race between entropy and free energy increases that enables the perpetuation of the duplication process.Other circumstances must be taken into account.For example, the volume of the system should increase in proportion to the aggregate number, in order to keep the concentration of chemical species inside the ranges in which the system remains in the phase where aggregates are formed.A significant change of this concentrations could result into a change on the phase of the system, where the preferred structures could no longer be spherical aggregates.Other circumstances, such as the specific application of the protocol, could also interfere the duplication process.

IV. DISCUSSION
In this paper we explored in depth the thermodynamics of duplication thresholds in a generic emulsion system made of an arbitrary set of lipid and precursor species.This feasible, yet artificial system enables us to overcome the tremendous complexity of the duplication process in actual living entities, such as cells.The thermodynamic landscape has been carefully constructed, accounting for the contributions due to surface tension, volume of the aggregates, entropic contributions and total amount of chemical species within the systems, all summarized in the definition of the Gibbs free energy of the state, equation ( 4).An abstract protocol is proposed, driving the system away from the equilibrium state, resulting, eventually, in a duplication event.We approached the problem from the equilibrium framework, assuming that the process is a succession of equilibrium states, and from a non-equilibrium perspective, where the visited states may not be equilibrium ones.
Fundamental relations involving free energies and duplication probabilities, equation ( 28), duplication thresholds, equations ( 29) and (30), necessary work to be invested over the system by the protocol to trigger a duplication event, equation (31), dissipated work, equation (32) or the conditions for the perpetuation of the duplication cycle, equations ( 33) and ( 34) have been derived.These relations invoke the explicit energy landscape provided by the free energies and set the abstract conditions for a duplication process to be triggered and, eventually maintained.It is worth to emphasize that they show explicitly the structure of the race between entropic forces and free energy gains to generate structure and preserve it.The synthetic approach, therefore, enabled us to convey a very detailed picture of the thermodynamical tensions involved in the process of creation and perdurability of living entities.
Further explorations should target more systematically specific systems, with quantitatively testable observables.The study of specific systems should also include the conditions of feasibility, in terms of microemulsion phases, of the aggregate duplication, avoiding transitions to nonaggregate phases, possible in emulsion systems.In the same line, a rigorous exploration of the orders of magnitude involved in the abstract relations derived above would add a necessary layer towards the quantification and, eventually, empirical test of the above predictions.Complementarily, the exploration of the constraints imposed by different protocol strategies could shed light to the potential prebiotic scenarios, where possibly circadian cycles play a crucial role in creating free energy sources driving the system towards imbalance, destabilization and duplication.In addition, more complex free energy landscapes allowing bilayer membranes, more realistic when compared to biological structures than the single layer approach used here, could refine the triggering points for duplication events to occur.In a different direction, an in depth study of the dissipation within the trajectories themselves -assumed to observe detailed balance in the above developments-would generalize the approach, making it more realistic and providing predictions on dissipated heat which could be presumable testable.Finally, the interesting relations involving dissipation and information measures could be explored to be the seed of further developments linking information and duplication processes, in line to the results exposed in [31], and, perhaps, clear the conditions for the emergence of inheritable information -thus the appearance of differentiated traits between elements of the system-intrinsically linked to the duplication process and, in the long term, trigger darwinian dynamics.

Appendix A: Derivation details
In this appendix we will use a simplified notation in order to emphasize only the technical details.We will consider the following scenario: The system is at t = 0 in a given macrostate A 0 and the change on the boundary conditions makes it to jump to macrostate A, whose states are denoted by "x", following a probability distribution p(x).The system relaxes until macrostate B, whose states are denoted by y, and follows a probability distribution q(y).Given a macrostate A with a distribution p(x), there is a macrostate with the same support but in equilibrium, A * , whose probability distribution will be denoted by p * (x), and follows a Boltzmann-like statistics: where Z x is the normalization constant and G(x).We define B * and q * (y) in a totally analogous way, defining the Gibbs free energy with the new boundary conditions induced by the application of the protocol.We will assume that the backwards and forwards transition probabilities obey equilibrium detailed balance: p(x → y) p(y → x) = e −βδG(x,y) , where δG(x, y) = G(y) − G(x) .e ln p(x) q(y)

A→B
, where A→B denotes that the average is performed through all trajectories from A → B. Accordingly, 1 = e βδG(x,y)−ln p(B→A) p(A→B) −ln p(x) q(y)

A→B
.
The Taylor expansion of the exponential ensures that e x ≥ 1 + x, so, if we know that e x = 1, then 1 + x ≤ 1, so x ≤ 0: Finally, the equality ln p(x) q(y) A→B = H(B) − H(A) follows directly from basic probability reasoning.Therefore, rearranging that and the above equation, one is led to equation (19).
Derivation of equation (24).-Giventwo distributions p q with the same support set, the log-sum inequality states that: x p(x) log p(x) q(x) ≥ x p(x) log x p(x) x q(x) , with equality only in the case in which p = q.In our case, if we assume that after the transition there can be a relaxation period -i.e., we approach q * -, we have that: whose second term can be written as: x,y f (x, y) log p(x)p(A → B)/(p(x)p(A → B))p(x → y) q(y)p(B → A)/(q(y)p(B → A))p(y → x) .
Rearranging and using the definition of f (x, y) and u(y, x) one arrives at:

FIG. 1 :
FIG.1: Schematic characterization of the role of the protocol ψ(t).(a) At time t = 0 the system is in contact to a thermal reservoir at inverse temperature β, and in an equilibrium state containing 1 aggregate.(b,c) The protocol starts by increasing the number of lipids and precursors and providing energy that may trigger chemical reactions.The action of the protocol is depicted by the red arrow.This process may change the energy landscape provided by equation (4) and thereby destabilize the structure of the aggregates, eventually creating more and more frustration in the surface.(d) At time τ the energy gradients favour the duplication and the protocol stops.(e) The system relaxes towards an equilibrium state containing 2 aggregates.

D
(q||q * ) ≤ x y p(y)p(x|y) log y p(y)p(x|y) y p * (y)p(x|y) ≤ x,y p(y)p(x|y) log p(y)p(x|y) p * (y)p(x|y) = x p(x) log p(x) p * (x) = D(p||p * ) .Leading to δD = D(q||q * ) − D(p||p * ) ≤ 0.Derivation of equation(20).-Givenan arbitrary distribution p(x) and an equilibrium one p * (x) = 1 Zx e −βG(x) with the same support, one can writeβG(x) A = − x p(x) log p * (x) − log Z x ,now the A denotes that the average is computed over the states of the macorstate A. Identifying − log Z x as the equilibrium Helmholtz free energy corresponding to p * (x), F x = − log Z x , and developing the cross entropy term − x p(x) log p * (x) as follows:− x p(x) log p * (x) = − x p(x) log p(x) + D(p||p * ) ,one obtains the desired result.Derivation of equation(26).-Letf (x, y) = p(x)p(x → y)/p(A → B) and u(y, x) = p(y)p(y → x)/p(B → A).Now we rewrite the increase on free energy as:β δG A→B =x,y f (x, y) log p(x → y) p(y → x) ,

H
(B) − H(A) − D(f ||u) + log p(B → A) p(A → B),where H(A), H(B) are the Shannon entropies of macrostates A and B, and D(f ||u) the Kullback-Leibler divergence between f and u.