The Role of Templating in the Emergence of RNA from the Prebiotic Chemical Mixture

Biological RNA is a uniform polymer in three senses: it uses nucleotides of a single chirality; it uses only ribose sugars and four nucleobases rather than a mixture of other sugars and bases; and it uses only 3′-5′ bonds rather than a mixture of different bond types. We suppose that prebiotic chemistry would generate a diverse mixture of potential monomers, and that random polymerization would generate non-uniform strands of mixed chirality, monomer composition, and bond type. We ask what factors lead to the emergence of RNA from this mixture. We show that template-directed replication can lead to the emergence of all the uniform properties of RNA by the same mechanism. We study a computational model in which nucleotides react via polymerization, hydrolysis, and template-directed ligation. Uniform strands act as templates for ligation of shorter oligomers of the same type, whereas mixed strands do not act as templates. The three uniform properties emerge naturally when the ligation rate is high. If there is an exact symmetry, as with the chase of chirality, the uniform property arises via a symmetry-breaking phase transition. If there is no exact symmetry, as with monomer selection and backbone regioselectivity, the uniform property emerges gradually as the rate of template-directed ligation is increased.


Introduction
An important feature of life is that it makes frequent use of a well-defined set of molecules, but it does not use many other kinds of molecules that have rather similar chemical properties. This has been called the 'lego principle' [1], and it has been proposed that this feature could be used as a signature of life on other planets. This principle applies to amino acids, where the set of possible amino acids is much larger than the 20 used in biological proteins [2], and to nucleotides, where there is a huge diversity of sugars [3] and nucleobases [4,5] that could potentially form polymers similar to RNA and DNA. Non-living chemistry is governed by thermodynamics and reaction kinetics. Similar molecules will have similar free energies of formation and will undergo similar reactions; hence, they will be produced in similar quantities. Living biochemistry is autocatalytic. A subset of molecules is able to catalyze formation of more of the same set of molecules that it requires for growth and reproduction. Because there is continued flow of energy and matter into a living system, the relative frequencies of different molecules can be maintained far from what would be found in thermodynamic equilibrium.
One example of the way autocatalysis maintains biased sets of molecules is the homochirality observed in nucleic acids and proteins [6]. Theoretical models assume that the formation of molecules of a given enantiomer (D or L) is catalyzed by molecules of the same handedness. In the simplest model of Frank [7], monomers are autocatalytic; however, more complex models have been studied Schöning et al. [17] studied four pairs of complementary oligomers, each made with TNA (α-threofuranosyl nucleic acid), RNA, and DNA backbones, and measured the melting temperatures of the 24 possible hybrid duplexes. They found a general tendency to favour uniform duplexes, although the results are complex, and we will return to them in the discussion section.
Monomer selection also involves a choice between nucleotides that differ in bases but have the same backbone. It is clear that strands made from the standard ACGU ribonucleotide alphabet are effective as templates for strands of the same alphabet. Non-enzymatic, template-directed synthesis of RNA and DNA has been studied [18], particularly with regard to the fidelity of sequence replication. A good monomer alphabet for genetics should allow little mismatch pairing. The possibility of GU pairing in the ACGU alphabet of RNA, and the absence of this possibility in the ACGT alphabet of DNA, is a potential reason for the transfer of information from RNA to DNA during evolutionary history [18]. Furthermore, a number of other base pairs have been studied that can be used as an extension of the coding repertoire of RNA and DNA [19,20]. These bases are 'orthogonal' to the standard bases, i.e., there should be little or no pairing between them and the standard bases. This means that sequences made of either the standard alphabet or the alternative alphabet would be templates for complementary sequences in the same alphabet, which is our proposition 1. However alternative pairs that are compatible with the RNA or DNA helix may not satisfy proposition 2, because the mixed sequences may still be good templates in this case. Studies of template-directed synthesis [21] have shown that the rate of nucleotide addition after a mismatch is much lower than after a match. This illustrates a close parallel between the monomer selection problem and the chirality problem, where the addition of a monomer of the wrong chirality can slow down the growth of an oligomer [12]. Although these effects cause slower average growth of oligomers, they actually increase the degree of uniformity of the longer oligomers. Mismatch inhibition increases the overall fidelity of RNA replication [21], and we expect that chiral inhibition would increase the enantiomeric excess of longer oligomers for the same reason.
Early studies on oligomerization of activated guanosine nucleotides using poly(C) templates [22,23] found that G-G bond formation is regioselective with a majority of 3 -5 bonds, i.e., the template favours formation of the same bond type in the complementary strand (proposition 1 above). It was additionally shown [24] that the A structure of the nucleic acid helix is important in bringing the nucleotides together in the correct conformation for 3 -5 bond formation. Furthermore, there have been several experimental studies that look at the effects of mixing 2 -5 and 3 -5 bonds. When regular 3 -5 strands are used as templates, the template-directed synthesis of the complementary strand is regioselective for 3 -5 bonds [25]. The duplex melting temperature is found to decrease systematically by around 15 • C as 2 -5 bonds are added to the regular 3 -5 structure [26,27]. This also suggests that strands of uniform bond type should be better templates than mixed strands (proposition 2). If 2 -5 bonds and 3 -5 bonds are similar with respect to rates of formation and hydrolysis, then the possibility arises that that strands of uniform bond type can be selected by symmetry breaking, as we discuss in this paper. However, it is already known that 2 -5 bonds may be much less stable to hydrolysis than 3 -5 bonds [28], hence, there is no true symmetry in this problem, and it is possible for one bond type to be more frequent than the other without there being a symmetry-breaking transition. A recent experimental study has shown that iterative degradation and repair of bonds gradually converts 2 -5 bonds to 3 -5 bonds [29] because bond formation in the context of the existing helix is regioselective. This effect is included in the model of backbone regioselectivity given here.
The aim of this paper is to present a simple computational model that is able to treat the problems of chirality, monomer selection, and backbone regioselectivity in the same way. We propose that all three types of order emerge when the rate of non-enzymatic template-directed synthesis is high, due to the fact that uniform strands are templates for their own kind, and that uniform strands are better templates than mixed strands. This mechanism is likely to occur at the level of oligomers, and does not require the synthesis of strands that are long enough to function as specific ribozymes.

Reaction Schemes
The reaction schemes used in this paper are summarized in Figure 1. For the chirality problem, we begin with equal concentrations of monomers of the two enantiomers D and L. These monomers can react to form oligomers of all possible sequences. Uncatalyzed polymerization is controlled by a reaction rate constant k pol . Let i and j represent any two oligomers or single monomers, and let ij represent the longer oligomer formed by joining these two (see example in Figure 1i). Let C i , C j , and C ij be the concentrations of these three oligomers. The rate of formation of ij by uncatalyzed polymerization is k pol C i C j . The reverse hydrolysis reaction occurs at rate k hyd C ij . Hydrolysis of ij can also occur at any other point in the sequence. For simplicity, we suppose that the same rate constant k pol applies for joining any two oligomers and that the same rate constant k hyd applies for hydrolysis at any point in any sequence. level of oligomers, and does not require the synthesis of strands that are long enough to function as specific ribozymes.

Reaction Schemes
The reaction schemes used in this paper are summarized in Figure 1. For the chirality problem, we begin with equal concentrations of monomers of the two enantiomers D and L. These monomers can react to form oligomers of all possible sequences. Uncatalyzed polymerization is controlled by a reaction rate constant kpol. Let i and j represent any two oligomers or single monomers, and let ij represent the longer oligomer formed by joining these two (see example in Figure 1i). Let Ci, Cj, and Cij be the concentrations of these three oligomers. The rate of formation of ij by uncatalyzed polymerization is kpolCiCj. The reverse hydrolysis reaction occurs at rate khydCij. Hydrolysis of ij can also occur at any other point in the sequence. For simplicity, we suppose that the same rate constant kpol applies for joining any two oligomers and that the same rate constant khyd applies for hydrolysis at any point in any sequence. In addition to polymerization and hydrolysis, we suppose that uniform strands can act as templates for template-directed ligation of two shorter oligomers of the same kind. Thus, in Figure  1ii, an LLLLL pentamer can be a template for the ligation of LL and LLL to make another LLLLL. The same also applies for uniform D sequences. In order for the reaction to be template-directed, we suppose that both short oligomers i and j must be fully uniform (only D or only L this case) and that the template must contain a uniform sequence at least as long as the concatenated ij sequence. Thus, the sequence DLLLLLLDDD is also a template for joining LL and LLL. For simplicity, we suppose that all strands that possess a uniform sequence of the minimum required length are equally good templates. Template-directed ligation is controlled by rate constant klig. The rate of ligation of i and j by templating is is the total concentration of all strands that are templates for formation of ij, using the rules above. The templating reaction is treated as a single-step reaction. We do not keep track of duplex states and we assume that ij separates immediately from the template once formed. It would be possible to add separate steps of duplex formation and separation to the model, but initially we are aiming for the simplest possible model. A final feature of the chirality model is that it must be possible to convert monomers from one enantiomer to the other in some way. The model is initialized with equal concentrations of L and D. In addition to polymerization and hydrolysis, we suppose that uniform strands can act as templates for template-directed ligation of two shorter oligomers of the same kind. Thus, in Figure 1ii, an LLLLL pentamer can be a template for the ligation of LL and LLL to make another LLLLL. The same also applies for uniform D sequences. In order for the reaction to be template-directed, we suppose that both short oligomers i and j must be fully uniform (only D or only L this case) and that the template must contain a uniform sequence at least as long as the concatenated ij sequence. Thus, the sequence DLLLLLLDDD is also a template for joining LL and LLL. For simplicity, we suppose that all strands that possess a uniform sequence of the minimum required length are equally good templates. Template-directed ligation is controlled by rate constant k lig . The rate of ligation of i and j by templating is k lig C i C j C temp ij , where C temp ij is the total concentration of all strands that are templates for formation of ij, using the rules above. The templating reaction is treated as a single-step reaction. We do not keep track of duplex states and we assume that ij separates immediately from the template once formed. It would be possible to add separate steps of duplex formation and separation to the model, but initially we are aiming for the simplest possible model.
A final feature of the chirality model is that it must be possible to convert monomers from one enantiomer to the other in some way. The model is initialized with equal concentrations of L and D. If no interconversion is possible, then no chiral symmetry breaking can occur. Direct racemization of sugars and nucleotides is extremely slow, but breakdown of a monomer to achiral precursors can also occur, as well as synthesis of monomers of both enantiomers from the precursors. We do not include precursors in this model. We assume that interconversion of L to D and D to L can occur by an effective single-step reaction at rates k int C L and k int C D , respectively. Only single monomers can be interconverted; monomers contained in oligomers cannot be changed until they are hydrolyzed back to single monomers. Including interconversion in this simple way means that the total concentration of nucleotides (including those in oligomers) is fixed, but the concentrations of D and L can each increase or decrease.
For the monomer selection problem, we consider two monomers R and X, where R represents a ribonucleotide, and X represents an alternative nucleotide. The simplest case for the monomer selection problem is identical to the chirality problem, except that the labels D and L are replaced by R and X (see Figure 1iii,iv). We are assuming that formation of mixed RX sequences is possible by uncatalyzed polymerization, but that only uniform R or X sequences can be joined by templating. In the chirality case, there is complete symmetry initially between D and L; hence, the reaction rate constants should be equal for oligomers of D and L. This is not necessarily true for R and X because they are chemically different. Therefore, in the results section, we will consider cases in which one or other type of monomer is a better template than the other. For simplicity, we allow the interconversion between R and X to occur in a single effective step, although breakdown and re-synthesis of could be included in a more complex model. The interconversion rate constants in the two directions need not be equal in this case.
The backbone regioselectivity problem requires a few small modifications to the model. There is only one kind of monomer, which we denote N for nucleotide, and two kinds of bonds. We have coloured the bonds red and black in Figure 1v, representing 2 -5 and 3 -5 bonds, respectively. Joining two oligomers can occur via either type of bond formation, leading to two longer oligomers with different bond sequences. In the simplest case, we suppose the rate constants k pol and k hyd are the same for the two bond types, but this can be relaxed. The templating reaction occurs only if the two short oligomers are of uniform bond type, and only formation of the bond of the same type is catalyzed by the template (Figure 1vi). To be a template, the templating sequence must contain a uniform sequence of bonds at least as long as the sequence of bonds that is formed by the ligation reaction. Thus, in Figure 1vi, the pentamer formed has four black bonds, so the template must have at least four black bonds. There is no equivalent of the interconversion reaction in the regioselectivity problem because there is only one type of monomer. When a dimer with one kind of bond is hydrolyzed, the two monomers can be rejoined with either bond type, so the number of bonds of each type is not fixed.

Computational Methods
In this paper, we study the three theoretical models described above using two kinds of computer simulations: reaction kinetics and Monte Carlo. In the reaction kinetics method, we keep track of the concentration of each monomer and oligomer species as a function of time, and solve the deterministic reaction kinetics equations by iterating forwards in small time steps of length δt. The change in concentration of oligomer ij due to formation from i and j and hydrolysis back to i and j is: There is an equal and opposite change in concentration of i and j from these reactions: We consider each possible pair i and j and sum up the total concentration changes of all molecules from all possible reactions. The concentration of the template in Equation (1) is different for each i and j. The template concentration does not change in the reaction for which it is a template, but templates are made by equivalent reactions involving other oligomers. Additionally, there is a change of the two monomer concentrations due to the interchange reaction described above: A slight modification is required for Equation (1) for the case of backbone regioselectivity. Whereas there is only one way to link two oligomers for the chirality and monomer selection problems, there are two ways to do this for the backbone regioselectivity problem. We therefore change the first term of Equation (1) so that each of the two reactions occurs at rate 1 2 k pol C i C j . The total rate of linking i and j is still k pol C i C j , which means that the distribution of lengths of oligomers is still the same as the other two models when there is no ligation term (See Appendix A). On the other hand, only one of the two kinds of bonds is catalyzed by the template-directed reaction. Therefore, the rate of this term remains a k lig C i C j C temp ij , as with the other two models. In the reaction kinetics method, we deal with all possible reactions deterministically; therefore, it is necessary to specify a maximum possible length, l max , of oligomers that can be formed, in order to keep the number of possible sequences finite. If i and j have a total length greater than l max , this polymeration reaction is not permitted to occur. For the results presented here, we set l max = 6. Thus the reaction system consists of 2 6 hexamers, plus all the oligomers shorter than 6. We specified the total monomer concentration as C tot = 10 monomers per unit volume (arbitrary units). The rate constants for polymerization and hydrolysis were fixed at k pol = 1 per unit time per unit concentration and k hyd = 1 per unit time. The behaviour of the model was studied for different values of k lig.
In the Monte Carlo method, we begin with a finite total number of nucleotides, N tot , and follow individual joining and hydrolysis reactions, keeping track of the sequences of all the oligomers that are formed. In this case, the computation is finite because of the finite number of molecules; therefore, it is not necessary to specify a maximum strand length. The oligomers are assumed to be reacting in a well-mixed solution of volume V. Reactions occur randomly with probabilities such that the expected rates are the same as the reaction kinetics method. We set N tot = 100,000 and V = 10,000; hence C tot = 10, as in the reaction kinetics method.
The Monte Carlo program also proceeds in small time steps of length δt. In each time step, each current strand is given a possibility of hydrolysis. For a strand of length n nucleotides, there are n − 1 bonds. Hydrolysis occurs with probability p hyd = (n − 1)k hyd δt (4) and one of the n − 1 bonds is chosen at random. Let N strands be the number of strands in the simulation (counting single monomers as a strand). N strands varies during the simulation and must be calculated at each time step, whereas the total number of nucleotides, N tot , is fixed. In each time step, each current strand is given a possibility to participate as the 'left' oligomer of a polymerization reaction (i). One of the N strands − 1 other strands is chosen to be the 'right' oligomer (j). The probability of the polymerization reaction occurring is If the reaction occurs, the new strand ij is formed by linking i and j.
The i + j reaction can also occur via a template-directed reaction. There are N strands − 2 other strands that could be the template. If all strands were templates, the probability of this reaction occurring would be: For each strand i, a second strand j is chosen at random, and a third strand k is chosen to be a potential template. If k is a template for the i + j reaction (as defined by the rules in Section 2.1), Life 2017, 7, 41 7 of 18 then the new strand ij is formed by the ligation reaction with probability p lig . If k is not a template, this reaction does not occur. At the beginning of the simulation, when all nucleotides are single monomers, N strands /V can be larger than 1. Also, we need to consider cases where k lig is much larger than 1 in order for the symmetry breaking to occur. Therefore, we begin with a very small δt (5 × 10 −5 ) to ensure that p lig is a probability that is less than 1. As the simulation proceeds, the number of strands decreases, and it is possible to make δt larger, to increase the efficiency of the program. We adjust δt so that p lig remains close to 5%, or so that p pol is close to 5% in cases where p pol > p lig .
The interchange reaction is straightforward in the Monte Carlo simulation: each single monomer has a probability k int δt of switching from D to L or vice versa.

Chirality
The parameters in the results shown here are C tot = 10, k pol = 1, k hyd = 1, k int = 1, and l max = 6, as described above. Figure 2 shows the faction of nucleotides φ D , φ L , and φ M at equilibrium. The fraction of nucleotides contained in uniform D oligomers is where n i is the number of nucleotides in sequence i, C i is the concentration of sequence i, and the sum is over all sequences that are uniform-D oligomers (dimers and longer). Similarly, we can define the fraction of nucleotides in uniform-L oligomers, φ L , and in mixed oligomers, φ M . In the latter case, the sum is over all sequences that contain at least one D and one L.
Life 2017, 7, 41 7 of 18 than 1 in order for the symmetry breaking to occur. Therefore, we begin with a very small δt (5 × 10 −5 ) to ensure that plig is a probability that is less than 1. As the simulation proceeds, the number of strands decreases, and it is possible to make δt larger, to increase the efficiency of the program. We adjust δt so that plig remains close to 5%, or so that ppol is close to 5% in cases where ppol > plig.
The interchange reaction is straightforward in the Monte Carlo simulation: each single monomer has a probability kintδt of switching from D to L or vice versa.

Chirality
The parameters in the results shown here are Ctot = 10, kpol = 1, khyd = 1, kint = 1, and lmax = 6, as described above. Figure 2 shows the faction of nucleotides D φ , L φ , and M φ at equilibrium. The fraction of nucleotides contained in uniform D oligomers is where ni is the number of nucleotides in sequence i, Ci is the concentration of sequence i, and the sum is over all sequences that are uniform-D oligomers (dimers and longer). Similarly, we can define the fraction of nucleotides in uniform-L oligomers, L φ , and in mixed oligomers, M φ . In the latter case, the sum is over all sequences that contain at least one D and one L.
When klig = 0, a majority of nucleotides end up in mixed oligomers. The fraction of nucleotides in uniform D and L oligomers are both low, and equal to one another ( Figure 2). There is a symmetry-breaking phase transition close to klig = 4.6 in this example. Above this point, the fraction of monomers in one kind of uniform oligomers is much higher than in the other kind (in this case D φ > L φ ), and the fraction in mixed oligomers also decreases rapidly. Symmetry breaking occurs because the symmetric mixture is unstable when klig is large. The simulations demonstrate this by beginning with a small bias towards D monomers-CD = 0.505Ctot and CL = 0.495Ctot. When klig is low, the bias disappears, and the equilibrium solution is perfectly symmetric. When klig is high, the bias increases, and the equilibrium has a large excess of D over L. There is an equivalent equal-and-opposite solution with an excess of L over D, which is reached if we When k lig = 0, a majority of nucleotides end up in mixed oligomers. The fraction of nucleotides in uniform D and L oligomers are both low, and equal to one another ( Figure 2). There is a symmetry-breaking phase transition close to k lig = 4.6 in this example. Above this point, the fraction of monomers in one kind of uniform oligomers is much higher than in the other kind (in this case φ D > φ L ), and the fraction in mixed oligomers also decreases rapidly. Symmetry breaking occurs because the symmetric mixture is unstable when k lig is large. The simulations demonstrate this by beginning with a small bias towards D monomers-C D = 0.505C tot and C L = 0.495C tot . When k lig is low, the bias disappears, and the equilibrium solution is perfectly symmetric. When k lig is high, the bias increases, and the equilibrium has a large excess of D over L.
There is an equivalent equal-and-opposite solution with an excess of L over D, which is reached if we began with a slight bias towards L. In any real case, there is a finite volume with a finite number of molecules, and small fluctuations are bound to create a slight asymmetry one way or the other. This asymmetry is magnified to a large excess of either L or D because the reaction system is inherently unstable. The prediction is that the system will become biased towards L or D with equal probability if it begins with equal concentrations of the two. It is not necessary to begin with any source of initial asymmetry. On the other hand, if the reaction system begins with a slight bias one way for other reasons (as is apparently observed for organic molecules in meteorites [30]), the autocatalytic reaction amplifies the asymmetry in the direction that is previously specified (theoretical examples of this are given in [10]).
As a further check that our simulations have reached an equilibrium, we also started simulations where C D = C tot and C L = 0. These simulations were run for a long time with no interchange reaction permitted, so that a distribution of uniform D strands was created. The interchange reaction was then turned on, and the simulation was continued to equilibrium. For low k lig , the same symmetric solution was reached as before, and for high k lig , the same chirally biased solution was reached as before. This shows that the biased solution can be reached either by amplifying a very small initial bias, or by allowing a system with a 100% chiral bias to relax to the equilibrium state.
We also used the Monte Carlo simulation to study the same case. The points in Figure 2 are from Monte Carlo, and the lines are from the reaction kinetics. These two methods give identical results, which is an important check on validity of both methods. In the Monte Carlo case, we calculate time averages of quantities once equilibrium has been reached. We can begin with exactly equal numbers of D and L monomers (N tot /2 = 50,000 of each type). It is not necessary to begin with a small bias in one direction, because the interchange reactions allow fluctuations of these numbers to arise. If k lig is above the phase transition, the fluctuations are amplified and the system shifts to a state with either high D or high L with equal probability. For the purposes of comparison with the reaction kinetics method in Figure 2, we plotted the higher concentration as D and the lower one as L in each case, but the labeling of D and L is arbitrary because there is a true symmetry in this problem and the symmetry breaking can go either way.
The reactions for hydrolysis, polymerization, and template-directed ligation are unimolecular, bimolecular, and trimolecular, respectively. Thus, the relative rates of these reactions depend on the concentration. The importance of templating is increased when the concentration is high. Figure 3 shows the nuclotide fractions as a function of total monomer concentration, C tot , with k lig = 10, k pol = 1, k hyd = 1, and k int = 1. At low concentration, D and L are equal, and chiral symmetry breaking occurs as C tot is increased. If this is repeated for different k lig rates, the concentration at which the symmetry breaking occurs is roughly inversely proportional to k lig . Figures 2 and 3 were done with maximum strand length l max = 6, in order to allow comparison of results between the two simulation methods. Using the Monte Carlo program, we repeated the simulation with l max = 10, and with no maximum length restriction (l max = ∞). The results are similar to the case with l max = 6, although the phase transition occurs at a higher value of k lig . This is shown in Figure 4 in terms of enantiomeric excess, ee. The ee is defined as the difference in concentrations of the two enantiomers relative to the sum: ee = (D − L)/(D + L). Figure 4 shows the ee for nucleotides incorporated into oligomers (length 2 or above). There are two equal and opposite solutions when k lig is high. In our model, there is direct interchange between single monomers of D and L. Therefore, the concentrations of the two single monomers remain equal. For this reason, the ee of single monomers is always zero, even above the phase transition, whereas the ee of the nucleotides in oligomers becomes non-zero above the phase transition. The main point of Figure 4 is that the bifurcation point shifts to Life 2017, 7, 41 9 of 18 slightly higher k lig as the l max is shifted from 6 to 10, to ∞, but there is no qualitative change. Imposing the maximum length restriction simplifies the simulations slightly, but states of biased chirality arise whether or not there is a maximum length. The equilibrium distribution of lengths of oligomers for these cases is discussed in the Appendix of this paper.  For the results shown above, we considered D and L nucleotides without specifying the base sequence. In order to deal more explicitly with complementary sequence pairing, we also considered a case in which there are eight kinds of nucleotides: A, U, G, and C bases, each of D and L form. Polymerization and hydrolysis reactions occurred at equal rates independently of the base and the chirality. For template-directed reactions to occur, both the oligomers i and j had to be of uniform chirality, and the same chirality as each other, but they could have any base sequence. A third  For the results shown above, we considered D and L nucleotides without specifying the base sequence. In order to deal more explicitly with complementary sequence pairing, we also considered a case in which there are eight kinds of nucleotides: A, U, G, and C bases, each of D and L form. Polymerization and hydrolysis reactions occurred at equal rates independently of the base and the chirality. For template-directed reactions to occur, both the oligomers i and j had to be of uniform chirality, and the same chirality as each other, but they could have any base sequence. A third . Enantiomeric excess (ee) of nucleotides contained in oligomers. The cases of l max = 6, 10, and ∞ are very similar, but the chiral symmetry breaking occurs at slightly higher k lig when l max is larger. The ee is zero below the phase transition, and can either be positive or negative once the symmetry is broken.
For the results shown above, we considered D and L nucleotides without specifying the base sequence. In order to deal more explicitly with complementary sequence pairing, we also considered a case in which there are eight kinds of nucleotides: A, U, G, and C bases, each of D and L form. Polymerization and hydrolysis reactions occurred at equal rates independently of the base and the chirality. For template-directed reactions to occur, both the oligomers i and j had to be of uniform chirality, and the same chirality as each other, but they could have any base sequence. A third sequence could be a template if it contained a uniform sequence of the same chirality as i and j, with a base sequence that was complementary to the ij sequence that is formed by ligation. Only AU and GC pairs were permitted when checking for complementarity. The results of this case, shown in Figure 5, are very similar to Figure 2, except that the phase transition occurs at k lig close to 40 rather than 4.6 in Figure 2. The main effect of introducing the detailed requirement of complementary pairing is to reduce the concentration of strands that can be catalysts for any two oligomers being ligated. As a result, higher values of k lig are required in order for the symmetry-breaking effects of the template-directed reaction to become apparent. However, the essential symmetry-breaking effect is the same as in the simpler model. a base sequence that was complementary to the ij sequence that is formed by ligation. Only AU and GC pairs were permitted when checking for complementarity. The results of this case, shown in Figure 5, are very similar to Figure 2, except that the phase transition occurs at klig close to 40 rather than 4.6 in Figure 2. The main effect of introducing the detailed requirement of complementary pairing is to reduce the concentration of strands that can be catalysts for any two oligomers being ligated. As a result, higher values of klig are required in order for the symmetry-breaking effects of the template-directed reaction to become apparent. However, the essential symmetry-breaking effect is the same as in the simpler model.

Monomer Selection
In this problem, R represents a ribonucleotide and X represents an alternative nucleotide. If the rate constants for the two nucleotides are the same, the problem is identical to the chirality problem. The symmetry between R and X is broken as klig is increased. In Figure 6, we also considered the case where the ligation rates for the two types of polymer are slightly different: This means that R has an advantage over X because the ligation rate of uniform X sequences is less than that for uniform R sequences. It can be seen that the phase transition is 'rounded out' when there is no exact symmetry between R and X. When klig is far above or far below the phase transition, the concentrations of R and X are close to that for the symmetric case. Figure 6 also shows a second solution at high kligR where the symmetry is broken in the opposite direction, i.e., X dominates even though it has a lower ligation rate than R. This solution only exists above a minimum ligation rate (close to 6 in the figure), whereas the solution where R dominates exists for all values of kligR. The two solutions are obtained by beginning with monomers of all R or all X and allowing a steady state to be reached before the interchange reaction is turned on. The interchange reaction is then turned on and the simulation is then continued to equilibrium. For ligation rates where there is only one solution, the same solution is reached starting from both extremes. When there are two solutions, these are reached from the two different starting points. It should be remembered that, even in the simplest case in Figure 2, there are two solutions when klig is

Monomer Selection
In this problem, R represents a ribonucleotide and X represents an alternative nucleotide. If the rate constants for the two nucleotides are the same, the problem is identical to the chirality problem. The symmetry between R and X is broken as k lig is increased. In Figure 6, we also considered the case where the ligation rates for the two types of polymer are slightly different: k ligX = 0.9k ligR . This means that R has an advantage over X because the ligation rate of uniform X sequences is less than that for uniform R sequences. It can be seen that the phase transition is 'rounded out' when there is no exact symmetry between R and X. When k lig is far above or far below the phase transition, the concentrations of R and X are close to that for the symmetric case. Figure 6 also shows a second solution at high k ligR where the symmetry is broken in the opposite direction, i.e., X dominates even though it has a lower ligation rate than R. This solution only exists above a minimum ligation rate (close to 6 in the figure), whereas the solution where R dominates exists for all values of k ligR . The two solutions are obtained by beginning with monomers of all R or all X and allowing a steady state to be reached before the interchange reaction is turned on. The interchange reaction is then turned on and the simulation is then continued to equilibrium. For ligation rates where there is only one solution, the same solution is reached starting from both extremes. When there are two solutions, these are reached from the two different starting points. It should be remembered that, even in the simplest case in Figure 2, there are two solutions when k lig is above the transition point. As these are equal and opposite, only one set of curves appears in Figure 2, but the two opposite solutions are visible in the ee graph (Figure 4). In Figure 6, the two solutions are not equal and opposite; therefore, two sets of curves are visible. A second way in which R and X can differ is by their frequency in the monomer mixture prior to polymerization. If the rate constants for interchange from X to R and from R to X are kintXR and kintRX, respectively, then the frequencies of monomers at equilibrium under the interchange reaction, in absence of any polymerization will be unequal: Figure 7 shows the case where there is a two-fold advantage to X in terms of equilibrium monomer frequencies ( ), but a two-fold advantage to R in terms of ligation rates In this case, the solution where X dominates exists over the whole range of ligation rates, and a second solution where R dominates exists only for high ligation rates. Thus, the frequency of the monomers in the reaction mixture is a major factor that determines which type of monomer dominates when templating occurs.
The number of monomer types in the mixture need not be limited to only two. In Figure 8, we consider the case where there are two alternatives, X1 and X2, in addition to R. We assume that all the interchange rates are equal, so all three types have equal average concentration in the monomer mixture (  A second way in which R and X can differ is by their frequency in the monomer mixture prior to polymerization. If the rate constants for interchange from X to R and from R to X are k intXR and k intRX , respectively, then the frequencies of monomers at equilibrium under the interchange reaction, in absence of any polymerization will be unequal: Figure 7 shows the case where there is a two-fold advantage to X in terms of equilibrium monomer frequencies (C R = 1 2 C X = 1 3 C tot ), but a two-fold advantage to R in terms of ligation rates k ligX = 0.5k ligR . In this case, the solution where X dominates exists over the whole range of ligation rates, and a second solution where R dominates exists only for high ligation rates. Thus, the frequency of the monomers in the reaction mixture is a major factor that determines which type of monomer dominates when templating occurs.
The number of monomer types in the mixture need not be limited to only two. In Figure 8, we consider the case where there are two alternatives, X 1 and X 2 , in addition to R. We assume that all the interchange rates are equal, so all three types have equal average concentration in the monomer mixture (C R = C X 1 = C X 2 = 1 3 C tot ), and we give R a small advantage in the ligation rate (k ligX 1 = k ligX 2 = 0.9k ligR ). In this case, it is the solution where R dominates that is possible over the

Regioselectivity
If we assume a complete symmetry between 3′-5′ bonds and 2′-5′ bonds, then the regioselectivity model behaves similarly to the chirality model. There is a symmetry-breaking phase transition qualitatively similar to Figure 2, in which one or other of the two bond types dominates when klig is high. We do not show this case, because there are many ways in which these two bond types might differ in practice, and there is no reason to suppose perfect symmetry. In Figure 9, we

Regioselectivity
If we assume a complete symmetry between 3′-5′ bonds and 2′-5′ bonds, then the regioselectivity model behaves similarly to the chirality model. There is a symmetry-breaking phase transition qualitatively similar to Figure 2, in which one or other of the two bond types dominates when klig is high. We do not show this case, because there are many ways in which these two bond types might differ in practice, and there is no reason to suppose perfect symmetry. In Figure 9, we

Regioselectivity
If we assume a complete symmetry between 3 -5 bonds and 2 -5 bonds, then the regioselectivity model behaves similarly to the chirality model. There is a symmetry-breaking phase transition qualitatively similar to high. We do not show this case, because there are many ways in which these two bond types might differ in practice, and there is no reason to suppose perfect symmetry. In Figure 9, we suppose the major difference in the two bond types is that the uniform 3 -5 oligomers have a higher ligation rate than the uniform 2 -5 oligomers (k lig2 = 0.9k lig3 ). It is seen that there is a solution where 3 -5 strands dominate that is possible over the full range of ligation rates, and a solution where 2 -5 strands dominate that is possible only at high ligation rates. The other rate parameters, k pol and k hyd , are assumed to be equal in this example, but are likely to be different in an experimental system. Which of the bond types dominates will be a function of the rates of all these parameters for both bond types.
Life 2017, 7, 41 13 of 18 suppose the major difference in the two bond types is that the uniform 3′-5′ oligomers have a higher ligation rate than the uniform 2′-5′ oligomers ( ). It is seen that there is a solution where 3′-5′ strands dominate that is possible over the full range of ligation rates, and a solution where 2′-5′ strands dominate that is possible only at high ligation rates. The other rate parameters, kpol and khyd, are assumed to be equal in this example, but are likely to be different in an experimental system. Which of the bond types dominates will be a function of the rates of all these parameters for both bond types.

Discussion
In the introduction, we reviewed the experimental evidence that oligomers with uniform chirality, monomer composition, or bond type are effective templates for the growth of complementary strands with the same chirality, monomer composition, or bond type, and that oligomers that are mixed in any of these properties are less effective templates than uniform oligomers. The computational models studied in this paper incorporate these features, and show that, when template-directed ligation is fast compared to random polymerization without a template, we expect that uniform biopolymers will emerge, and that they are homochiral, use a restricted monomer set, and have a highly regioselective backbone. We thus argue that these essential properties of biological nucleic acids have emerged as a result of the importance of template-directed reactions in the early stages of evolution. Although models for the emergence of homochirality have been widely studied previously, the link between chirality, monomer selection, and regioselectivity has not been emphasized in the past. The present theory therefore contributes by showing in a simple way why these three problems are very similar. These results lead us to expect that RNA with the three uniform properties seen in biology could emerge from a prebiotic chemical mixture before the origin of life, in the sense that the proposed mechanism requires only the existence of short oligomers that can undergo template-directed replication, but it does not require the existence of specific sequences that would act as catalysts (ribozymes).
The mechanism we discuss here depends on the ability of nucleic acids to be templates; hence, it applies to RNA and polymers with similar structures, but not to other kinds of biomolecules that are

Discussion
In the introduction, we reviewed the experimental evidence that oligomers with uniform chirality, monomer composition, or bond type are effective templates for the growth of complementary strands with the same chirality, monomer composition, or bond type, and that oligomers that are mixed in any of these properties are less effective templates than uniform oligomers. The computational models studied in this paper incorporate these features, and show that, when template-directed ligation is fast compared to random polymerization without a template, we expect that uniform biopolymers will emerge, and that they are homochiral, use a restricted monomer set, and have a highly regioselective backbone. We thus argue that these essential properties of biological nucleic acids have emerged as a result of the importance of template-directed reactions in the early stages of evolution. Although models for the emergence of homochirality have been widely studied previously, the link between chirality, monomer selection, and regioselectivity has not been emphasized in the past. The present theory therefore contributes by showing in a simple way why these three problems are very similar. These results lead us to expect that RNA with the three uniform properties seen in biology could emerge from a prebiotic chemical mixture before the origin of life, in the sense that the proposed mechanism requires only the existence of short oligomers that can undergo template-directed replication, but it does not require the existence of specific sequences that would act as catalysts (ribozymes).
The mechanism we discuss here depends on the ability of nucleic acids to be templates; hence, it applies to RNA and polymers with similar structures, but not to other kinds of biomolecules that are also chiral in today's organisms but cannot be templates. The ability of nucleic acids to be templates is, of course, a prime argument in favour of some kind of RNA World scenario for the origin of life. In an origins scenario where RNA replication arises early, and where ribozymes are the initial biological catalysts, it is possible to explain the transfer of chirality from RNA to other biomolecules by the fact that chiral RNAs catalyze the synthesis of the other molecules. In particular, if protein synthesis depended on chiral tRNAs and rRNAs, then amino acids of the appropriate chirality could be charged onto tRNAs, and protein sequences of uniform chirality could be synthesized. This would be true, even if the mixture of single amino acids were completely racemic, because amino acids of the wrong enantiomer would not be recognized by the ribozymes. On the other hand, if the amino acids were already chiral for some other reason at the time of the invention of protein synthesis, the early ribosome would simply make use of the existing supply of chiral amino acids.
Related to this is the possibility that the transfer of chirality was in the other direction: from amino acids to sugars and nucleotides. Several studies have shown [31,32] that chiral amino acids can catalyze the synthesis of chiral glyceraldehye and other sugars. There is also evidence for a moderate enantiomeric excess in amino acids found in meteorites [30], which is presumed to originate by an abiotic mechanism outside the Earth, such as chirally biased degradation under the influence of circularly polarized light. We have previously considered the way that a small extraterrestial chiral bias can contribute to scenarios involving the emergence of homochirality [10]. Understanding the direction of chiral transfer will require a more detailed picture of the full network of reactions involving biomolecules of all kinds at the time of the origin of life. The template-based mechanism presented here is not in conflict with other studies showing transfer of chirality from one molecule to another, whichever the direction of transfer, and it is quite possible that more than one mechanism jointly contributed to the creation of a biology in which several kinds of chiral biopolymer are inter-dependent. Nevertheless, we maintain that templating must have been an essential part of this process. Life requires the copying of sequence information that occurs via the template-directed synthesis of complementary strands of nucleic acids. A mixture of small molecules (whether chirally biased or not) that did not possess a mechanism of sequence replication would not yet constitute life, in our view. Templating is a straightforward mechanism that 'purifies' biopolymers at the level of chirality, chemical composition, and regioselectivity, as we have shown. The point of this paper is to show that templating alone can provide a mechanism for the emergence of all these properties, even in absence of other sources of asymmetry. If other sources, or other mechanisms, exist, then templating magnifies these effects to a much greater extent.
The mechanism for the emergence of RNA discussed in the present paper is an example of what we have termed chemical evolution [33]. The main distinctions between chemical evolution and biological evolution, as defined in [33], are (1) that, for chemical evolution, the major part of the sequence diversity on which natural selection acts is provided by random polymer synthesis, rather than by mutations occurring during replication of previously existing sequences; and (2) that selection is acting on physicochemical properties possessed by all short oligomers (like ligation and hydrolysis rates) rather than on encoded function of long sequences (such as ribozymes) that are only possessed by a small number of sequences in a large sequence space. If RNA can emerge directly from a prebiotic mixture by chemical evolution, as envisaged here and previously [15,16], then it is not necessary to consider more complex pathways requiring the evolution of biological function in some pre-RNA polymer and the subsequent transfer of sequences and function to RNA, as has sometimes been proposed [34].
The monomer selection problem is probably the most complex and least understood of the three problems considered in this paper. It is clear that the real world is much more complicated than the simple model we have studied. For example, hybrid duplexes of RNA, DNA, and TNA were studied experimentally [17], and it was shown that the details of the nucleotide sequences and backbone structures significantly affect the melting temperatures (T m ) of helices, and presumably also affect the rates of polymerization, hydrolysis, and ligation. Table 2 of [17] illustrates how complex this problem is. There are 14 cases where the hybrid duplex has a lower T m than the uniform duplexes of both polymers, 8 cases where the hybrid has a T m in between the two uniform duplexes, and 2 cases where the hybrid has a T m higher than both the uniform duplexes. The latter two cases are surprising, but the effect only works one way round: if the sequences are switched, the hybrid has a lower T m than both uniform duplexes. Although the thermodynamic details are rather complex, there is a general tendency to favour uniform duplexes, which is the essential point that we assume in our model. The case where both uniform duplexes are good templates and the hybrids are weak corresponds to the case studied here. There is an approximate symmetry between the two uniform polymers that is broken when the template-directed reaction is fast. If the hybrid is intermediate between the two uniform duplexes, then there will be a straightforward selection for the better of the two uniform systems. The monomer selection problem does not require symmetry breaking in that case.
We hope to continue this work to consider important details not included in the present paper, including differences in thermodynamic properties and rate constants between different base sequences with the same backbone, and adding separate steps for helix formation and melting, rather than combining them into a single effective ligation step. The simple model presented here is sufficient to demonstrate the similarity between the problems of chirality, monomer selection, and regioselectivy, and to establish the relevance of symmetry breaking transitions in all three cases.
To illustrate the shape of the strand length distribution, it is useful to plot the fraction of nucleotides in strands of length n-this is f n = nC n /C tot . Figure A1a shows the theoretical curve calculated from Equations (A2) to (A4), together with simulation results. The fact that these are equal confirms that our simulation results are accurate in the case we can calculate analytically. Figure A1b shows the length distribution for k lig = 20, which is the largest rate considered in Figure 4. The addition of templating shifts the length distribution towards longer lengths. However, the length distribution always decays exponentially for long lengths, even when k lig is high and when l max = ∞. The shapes of the length distributions in similar models to ours that include templating have also been considered by other authors [36,37]. confirms that our simulation results are accurate in the case we can calculate analytically. Figure A1b shows the length distribution for klig = 20, which is the largest rate considered in Figure 4. The addition of templating shifts the length distribution towards longer lengths. However, the length distribution always decays exponentially for long lengths, even when klig is high and when lmax = ∞. The shapes of the length distributions in similar models to ours that include templating have also been considered by other authors [36,37].