2.1. Definition of Coding Relation
Here I use the term “coding” in a broad sense, not just as a correspondence between triplets of nucleotides in mRNA and amino acids in the protein, but as the ability of living systems to store recipes for performing various functions, from cellular metabolism to complex animal behavior [12]. For example, epigenetic stability and neural memory represent the short-term storage of encoded functional information, whereas heredity represents the long-term storage of information, which extends over many generations [14]. Stability of encoded functional information is supported by the organism and is implemented as a conditional autocatalysis of CEs. The definition of autocatalysis as a reaction whose rate is positively affected by one of the products is not sufficient to characterize CEs. CEs are irreplaceable, which means that they cannot be recovered if all copies are destroyed. Thus, CEs cannot be captured easily from the environment and require strong autocatalysis, whose rate is zero in the absence of products. CEs are products of the autocatalysis and should not be confused with resources (e.g., precursors), which are reagents. Resources are simpler and more abundant molecules than CEs, whereas CEs are either rare molecules or do not exist in the inanimate world. Genes are examples of polymeric CEs, which are duplicated via a template-based copying process. In contrast, primitive CEs are relatively simple molecules that catalyze their own synthesis and therefore produce copies without template matching. Both mechanisms of copying are strict autocatalytic reactions because an existing CE is required for generating copies. Association of copying with autocatalysis may seem confusing because humans can copy almost anything. However, in living organisms, very few kinds of molecules can be copied even with external help; and at the time of life’s origin, this external help was minimal.
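The difference between ordinary autocatalysis and the strong autocatalysis required of CEs can be illustrated with a toy rate model. The sketch below is only an illustration under assumed kinetics: the rate law, the function name `simulate_ce`, and all parameter values are hypothetical and are not taken from the cited literature. A spontaneous synthesis term `k_spont` stands for molecules that can be recovered from the environment; setting it to zero gives a rate that vanishes in the absence of products.

```python
def simulate_ce(x0, k_auto, k_spont=0.0, decay=0.1, capacity=1.0,
                dt=0.01, steps=20000):
    """Euler integration of an assumed toy rate law for abundance x.

    dx/dt = (k_spont + k_auto * x) * (1 - x / capacity) - decay * x

    k_spont = 0 models strong (strict) autocatalysis: no product, no synthesis.
    k_spont > 0 models a molecule that also forms without existing copies.
    """
    x = x0
    for _ in range(steps):
        synthesis = (k_spont + k_auto * x) * (1.0 - x / capacity)
        x += dt * (synthesis - decay * x)
    return x


# Strict autocatalysis: once all copies are destroyed, none reappear.
strict_lost = simulate_ce(x0=0.0, k_auto=0.5)
# Strict autocatalysis seeded with a few copies: the population establishes.
strict_seeded = simulate_ce(x0=0.01, k_auto=0.5)
# Ordinary autocatalysis: spontaneous synthesis restores a lost population.
ordinary_lost = simulate_ce(x0=0.0, k_auto=0.5, k_spont=0.01)
```

In this model the seeded strict system approaches the steady state `capacity * (1 - decay / k_auto)`, whereas the unseeded strict system stays at exactly zero, which is the sense in which such a molecule is irreplaceable.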
CEs are produced via either an autocatalytic reaction or an autocatalytic cycle. In the latter case, the reaction rate is positively affected by products generated after several downstream reactions rather than by the immediate products. Molecules at all phases of the autocatalytic cycle are copies of the same CE because they produce each other following the reactions of the cycle. Both types of autocatalysis can be formally represented by a Petri net with specific properties [15]. It appears that each self-reproducing system with these properties has a unique minimum core, which is a cycle where each component can produce any other component from that core [15]. Thus, each kind of CE can be viewed as a minimum self-reproducing system.
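The core property stated above, that each component can produce any other component of the core, can be checked mechanically. The following sketch deliberately simplifies the Petri net formalism of [15] to a plain directed graph in which an edge X → Y means "X gives rise to Y after one reaction"; the function names and the example networks are hypothetical.

```python
def produced_from(start, edges):
    """Return all components reachable from `start` in one or more steps."""
    seen = set()
    stack = list(edges.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, ()))
    return seen


def is_cyclic_core(components, edges):
    """True if every component can produce every component, itself included."""
    comps = set(components)
    return all(comps <= produced_from(c, edges) for c in comps)


# Three phases of one autocatalytic cycle: X -> Y -> Z -> X.
cycle = {"X": ["Y"], "Y": ["Z"], "Z": ["X"]}
# A linear pathway X -> Y -> Z has no way back to X.
chain = {"X": ["Y"], "Y": ["Z"]}
# A single autocatalytic reaction: X produces X.
single = {"X": ["X"]}
```

Under this simplification, the cycle and the single autocatalytic reaction both pass the check, while the linear pathway does not, because its end product cannot regenerate the starting component.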
Theoretically, all self-reproducing systems can support their existence. However, the viability (i.e., persistence) of such systems depends on additional external and internal factors, including abundance and accessibility of resources, degradation and diffusion rates, and time delays. These factors are inter-dependent. For example, higher rates of degradation or diffusion require abundant resources, which can support an increased rate of catalytic synthesis. Temporal changes in resource abundance may also affect the viability of the system.
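The dependence of viability on resources and losses can be made explicit with a minimal worked equation. This is a sketch under assumed first-order kinetics, not a model taken from the cited work: x is the abundance of a CE, R the resource level, k the autocatalytic rate constant, d the degradation rate, and m the rate of loss by diffusion (all symbols are introduced here for illustration, and time delays are ignored).

```latex
% Assumed toy model: strict autocatalysis with first-order losses.
\begin{align}
  \frac{dx}{dt} &= kR\,x - (d + m)\,x = (kR - d - m)\,x, \\
  x(t) &= x_0\, e^{(kR - d - m)\,t}, \\
  \text{viable} &\iff kR > d + m \iff R > \frac{d + m}{k}.
\end{align}
```

Under these assumptions, raising either d or m raises the minimum resource level R needed for persistence, and a temporary drop of R below (d + m)/k causes the CE population to decline for as long as the shortage lasts.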
I define coding as a triadic relation between an organism A, its CE (which may be present in multiple copies), and function F (Figure 1), where (1) CE is a viable and irreplaceable self-reproducing system within organism A; (2) CE induces or modifies some function F (i.e., behavior/development) of the organism A; (3) survival and/or reproduction of A is supported by function F. Both CE and F are parts of A, as shown in the figure; thus, the coding relation unites two levels of the structural-functional hierarchy. We can say that CE encodes F for organism A. Evolutionarily advanced organisms can utilize coding relations at multiple time scales. For example, an epigenetic state of a specific DNA region represents a short-term CE, whereas genes are long-term CEs. The coding relation has a circular structure with a positive feedback because CE encodes the function of the organism, which in turn supports self-reproduction of the organism, and the organism supports self-reproduction of the CE (Figure 1). Thus, it can be associated with the concept of semantic closure, which in brief means a self-supporting interpretation of the code [16].
The relation between CE and F is more than just a cause-effect relation because it does not exist without organism A. It can be viewed as a semiotic relation which emerged within evolving organisms to support (i.e., “memorize”) their specific function. This relation is contingent or arbitrarily defined because the same function can be encoded in many different ways, but only one kind of CE (usually the one that appears first) is preserved and then carried through branching phylogenetic lineages. Later, the same CE can be reused for different novel functions. Studies of biological evolution have long been limited to one or two components of the code relation. Paleontology and taxonomy are focused on the organism component; genetics studies the relation between CEs (genes) and organism morphology; and evolutionary biology links functional characteristics (e.g., embryonic development and behavior) with the organism’s morphology. However, all components of the code relation are equally important, and each may take a leading role in a specific episode of evolution [14]. For example, mutation of the DNA sequence is not the only driving force of evolution. Changes may start from a modified interpretation of the same DNA sequence in a changed environment or from alternative functions of a body part (e.g., Baldwin effect) [18]. However, all changes have to be encoded in order to persist through generations.
If some function is needed only at a specific stage of organism development or in a specific environment, then an organism can interpret the code for this function on demand when these conditions are met, and shut down interpretation at other times. Interpretation on demand can be illustrated by gene regulation at multiple steps including transcription, mRNA stability, translation, protein modification and localization. Interestingly, the ability to regulate the interpretation of a CE is encoded by other CEs. For example, transcription of hormone-responsive genes is regulated by hormone receptors, which are encoded by other genes.
The proposed definition of coding in living organisms challenges the idea of von Neumann that the genome is a description of an organism [19]. A code is descriptive if its meaning can be recovered (i.e., computed) from the meaning of its parts (e.g., characters, words) following known syntax rules. However, the code of living organisms is not fully descriptive because it cannot be partitioned indefinitely and the meaning of parts at the lowest level is not described. This is related to the idea of Kampis that life is not computable [20]. CEs do not fully describe or compute organisms or their parts but encode methods for their production. Similarly, a recipe book helps to bake bread without describing the final product. A recipe is based on non-descriptive indexes to components (e.g., take a cup of flour and a cup of water...). Likewise, CEs in living organisms may provide non-descriptive indexes by selective binding to specific molecules, or by producing other cellular components capable of selective binding. The advantage of a non-descriptive code is that it does not need to carry all the information, and therefore it can be much smaller than the full description of an organism. This may seem to be a merely quantitative difference, but it appears essential for the origin of life. If the code of the first living organisms were descriptive, it would be as complex as the organism itself and would never emerge spontaneously. However, non-descriptive coding can start spontaneously from a single CE.
2.2. Emergence of Primordial Code-Driven Functional Systems
The first code-driven functional systems were in many respects similar to contemporary organisms: they used resources, reproduced, and encoded chemical functions that had a beneficial effect on their persistence and reproduction. However, the number of encoded functions was extremely small, and most likely life started from a single function and a single kind of CE. Emergence of the coding relation can be viewed as the start of the long process of life’s origin.
Although the chemistry of the first CEs is unknown, it is reasonable to hypothesize that they were similar to contemporary coenzymes. They could not be freely available in the environment as resources because CEs are irreplaceable by definition. Thus, they had a more complex chemistry than saturated hydrocarbons, which are the most abundant organic molecules and can be synthesized abiogenically [21]. Initial CEs had to be active catalysts because they needed to support (1) autocatalysis, i.e., assembly of their copies from precursors, and (2) an encoded function within a larger host system. Passive CEs could not appear first because there was no interpretation system yet. Because primordial CEs should be treated as evolutionary predecessors of contemporary nucleic acids, we expect that some of them were similar to nucleic acids. Among simple organic molecules in the cell, coenzymes (e.g., NADH, ATP, CoA) are most similar to nucleic acids; thus, following Wächtershäuser [10] and Danchin [23], we can hypothesize that the first CEs were coenzyme-like molecules (CLMs), which were common ancestors of both nucleic acids and contemporary coenzymes. By a CLM, I mean a small non-polymeric organic molecule with catalytic activity.
Let us now reconstruct possible components of the first code relation. The role of organisms can be played by oil microspheres in a water environment. By oil I mean liquid hydrocarbons, which can be present on early terrestrial planets [24] and can be synthesized abiogenically in hydrothermal vents [21]. Hydrocarbons aggregate and form microspheres in water, making discrete resources that can easily support long-persisting catalytic networks. Oil microspheres can be broken mechanically into smaller parts or merged with other microspheres. In this way, catalytic networks on the surface of oil microspheres can propagate and capture new resources. A final and no less important reason for selecting oil microspheres as candidates for proto-organisms is that hydrocarbons provide a link to the future origin of bilayer lipid membranes, as discussed below. In comparison, the “lipid world” model of the origin of life [25] assumed that the first proto-organisms were lipid micelles or bilayer vesicles. However, this assumption seems unrealistic because lipid molecules cannot be synthesized abiotically in sufficient quantities. Even simple amphiphilic molecules (e.g., fatty acids) were rare compared to hydrocarbons before the origin of life. Thus, they were diluted by hydrocarbons and unable to form micelles or bilayer vesicles.
The function of the first CEs anchored to the surface of microspheres could have been catalytic oxidation of the terminal carbon atom in hydrocarbon molecules. Because oxidized terminals became hydrophilic and negatively charged, this process altered the surface properties of microspheres, which in turn could positively affect the rate of autocatalysis of CEs. This positive feedback mediated by the properties of a larger system is the essential feature of the coding relation. Potential mechanisms of the positive feedback may include providing binding sites for new CEs and/or selective absorption of energy-rich chemicals or precursors of CEs. Simple autocatalysis is abundant in non-living nature, but it does not represent a coding relation because it has no arbitrariness or choice. Thus, besides autocatalysis, CEs have to encode some function of a larger system (an organism) which would increase its success in survival and reproduction. A similar mechanism of positive feedback through the alteration of the environment was previously proposed by Shenhav and Lancet [26]. Autocatalytic synthesis of CEs together with rare accidental collision and breakage of microspheres can lead to the propagation of the encoded properties through the entire population of microspheres. It is unlikely that any of the presently known coenzymes was the CE in the first living systems. We also do not know the method of their anchoring to the oil microsphere. Possible mechanisms of binding may include positively charged groups (e.g., amino groups) attracted to negatively charged fatty acids on the surface of a microsphere (fatty acids may be products of CLM activity as described below). Alternatively, CLMs can be anchored via covalent bonds.
Oxidation of hydrocarbon terminals can be viewed as the first evolutionary step towards the emergence of bilayer lipid membranes. To make a lipid-like molecule, hydrocarbons need to be oxidized and then linked via a glycerol-like molecule. If hydrocarbons were already oxidized due to the first code relation, then they became ready for the next step of making lipids, which helped to stabilize the surface of microspheres. Although glycerol can be produced abiotically, it is relatively unstable; thus, it is not clear if it can be accumulated in sufficient quantities to support lipid synthesis. Alternatively, glycerol can be produced on the surface of proto-cells by oxidation of hydrocarbons. This scenario is supported to some extent by the existence of hydrocarbon-eating bacteria [27].
It was suggested that simple autocatalytic networks can get established on a mineral surface [10] or in porous walls of hydrothermal vents [3]. According to Wächtershäuser, fatty acids and lipids are synthesized as by-products of surface metabolism, which is supported by carbon fixation [10]. Although these scenarios are theoretically possible, they have several problems. First, selective accumulation of negatively charged organic molecules on a positively charged mineral surface is not similar to natural selection and does not drive the evolution of metabolic networks, as claimed by Wächtershäuser [10]. Selective accumulation belongs to the non-living world and is similar to crystallization, whereas life starts with a coding relation. Thus, evolution requires an autocatalytic CE that can modify the properties of the mineral surface so that it becomes more favorable for the autocatalysis. However, Wächtershäuser did not describe CEs that can propagate on the mineral surface. Second, it is easier to produce fatty acids by oxidation of hydrocarbons than by carbon fixation, which requires a multi-component autocatalytic cycle similar to the reverse citric acid cycle [5]. It is unlikely that catalysts of all steps of such a cycle would appear in close proximity to each other to initiate the autocatalysis. Third, CEs have more freedom to move on the liquid surface of oil microspheres than on the solid surface of minerals. This mobility increases the rate of autocatalytic synthesis and makes the network more viable. Fourth, metabolic networks on a mineral surface or in pores cannot easily invade distal niches separated by a gap or wall. On the one hand, the reassembly of the carbon fixation cycle in a new location from multiple diffusing components is unlikely because the diffusion rates should be low (otherwise, the autocatalytic network is not viable). On the other hand, dispersal of the whole network on the surface of mineral particles is limited because particles are heavy and sink. In contrast, dispersal of autocatalytic networks on oil microspheres is not restricted by these factors.
One of the premises of the porous wall model is that inorganic crystals in the walls can catalyze some reactions within the network. However, oil microspheres can also carry metal ions or inorganic particles on their surface (e.g., via negatively charged carboxyl groups in fatty acids); thus, both hypotheses are equal in this respect. But inorganic particles are not likely to play the role of CEs, as was proposed in the “genetic takeover” hypothesis [28]. Although functional information can be transferred to a different physical carrier, such transfer is possible only at a higher level of biological organization. According to the RNA world scenario, information transfer from nucleic acids to proteins started long after the origin of life, when organisms became sufficiently complex. Thus, the hypothesis of information transfer from crystals to organic molecules at early stages of the evolution of life (i.e., genetic takeover) is not convincing.
2.3. Combinatorial Coding
A hereditary system based on the transfer of multiple types of simple (i.e., non-polymeric) CEs between parent and offspring will be called “combinatorial coding” because CEs are not connected and hence are transferred to offspring organisms in different combinations. Despite the random transfer of CEs to offspring organisms, combinatorial coding can be stable because (1) CEs are present in multiple copies, and therefore each offspring organism has a high probability of getting the full set of CEs, and (2) natural selection preferentially preserves organisms with a full set of CEs. The efficiency of the latter mechanism was shown in a “stochastic corrector model” [29].
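The first argument, that multiple copies make a full set of CEs likely in each offspring, can be quantified with a short calculation. The sketch below rests on assumptions that are not in the source: a parent microsphere splits into two daughters, it carries the same number of copies of every CE type, and each copy goes to either daughter independently with probability 1/2. The function names are hypothetical.

```python
import random


def p_full_set(copies_per_type, n_types):
    """Probability that one daughter receives at least one copy of every type.

    A given type is entirely missing with probability 0.5 ** copies_per_type,
    and types are assumed to segregate independently.
    """
    return (1.0 - 0.5 ** copies_per_type) ** n_types


def simulate_full_set(copies_per_type, n_types, trials=20000, seed=0):
    """Monte Carlo estimate of the same probability, as a sanity check."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # The daughter has a type if any of its copies lands in her half.
        has_all = all(
            any(rng.random() < 0.5 for _ in range(copies_per_type))
            for _ in range(n_types)
        )
        hits += has_all
    return hits / trials


few_copies = p_full_set(copies_per_type=2, n_types=5)
many_copies = p_full_set(copies_per_type=10, n_types=5)
estimate = simulate_full_set(copies_per_type=2, n_types=5)
```

With these assumptions, two copies of each of five CE types give a daughter the full set less than a quarter of the time, whereas ten copies of each raise the probability above 0.99. Selection against incomplete offspring, the second argument, is not modeled here.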
New types of CEs can be added either by acquisition of entirely new code relations or by modification of existing CEs and corresponding code relations. It is conceivable that oil microspheres with a single type of CE can accidentally acquire another type of CE, which encodes an additional function of the system. However, this should be viewed as a rare event because very few molecules can combine autocatalysis with additional functions within a larger system. Additional CEs have to encode novel functions to persist within primordial systems. For example, they may enhance the ability to capture energy or facilitate attachment to some substrate with beneficial consequences. The functional network of these microspheres may grow by adding new CEs and corresponding new functions. Systems with combinatorial coding have an increased evolutionary potential compared to systems with a single CE because different combinations of CEs may easily generate novel effects.
The next step towards increased evolutionary potential was the transformation of old CEs into novel ones. Obviously, not every modification makes a new CE because the new molecule must remain autocatalytic and must encode a new function. However, there may be a common core in the chemical reactions that support both the autocatalysis and the novel function. For example, it may happen that accidental methylation of the parental CE at a certain position makes a new catalyst capable of methylation. Then, this modified molecule becomes a new CE because it is capable of autocatalysis via methylation of the parental CE, and it also encodes methylation of other molecules, which is a new function. A derived CE may initially depend on the presence of the parental CE, and this dependency limits the recombination of CEs. However, the new CE may eventually become independent from the parental one if it acquires an alternative way of autocatalysis. Transformation can substantially increase the variability of CEs and the number of supported functions.
Combinatorial coding can eventually lead to the emergence of synthetic polymers. For example, if a new CE, C, can catalyze the polymerization of another CE, A, then together they encode long polymers AAAAA..., which can cover the surface of the microsphere and substantially modify its physical properties. If C can catalyze polymerization of multiple monomers (e.g., A and B), then repetitive (ABABABAB) or random (ABBAABABAAABB) sequences can be produced. These kinds of polymers may show more advanced properties, including three-dimensional folding and secondary structures. Primordial cells with polymerized CEs can substantially increase their functional repertoire; however, they still have a limited hereditary potential [30] because of their inability to copy polymers with an arbitrarily defined sequence. Thus, all encoded polymers at this evolutionary step were either simple repeats or random sequences. This limitation was removed with the emergence of template-based replication, which is discussed below.
2.4. Origin of Universal Coding
Initial steps of primordial evolution were extremely slow and inefficient because self-production of each CE had to be developed from scratch. There was no universal rule for producing new CEs. Some improvement was achieved by transformation of old CEs into novel ones via modification of functional groups or via polymerization. In this case, at least some steps of self-production of transformed CEs could be adopted from parental CEs. However, there was still no streamlined procedure for making new CEs.
Template-based (or digital) replication is a special case of autocatalysis of CEs, where each CE is a linear sequence made of a few kinds of monomers, and copying is done sequentially via pre-defined actions applied to each monomer [29]. In theory, it may be possible to generalize the notion of template-based replication to non-linear polymers (e.g., two-dimensional sheets or branching trees), but for simplicity I will consider only linear polymers. Digital replication makes the coding system universal because the replication algorithm works for any sequence. Thus, there is no need to invent recipes for copying new or modified coding molecules. However, the notion of universal coding should be interpreted with caution because truly universal properties exist only in mathematics. In the real world, there are always some limitations even within a “universal” coding system. For example, molecular machinery which is sufficient for replicating short DNA fragments (200–1000 bp) may not work for long sequences (e.g., >1 Mb). Mammalian chromosomes have extremely long DNA sequences (e.g., 200 Mb) which cannot be preserved and replicated in vitro; however, mammalian cells can perform this task. Replication and elongation of telomeres also require additional mechanisms that are not equivalent to simple template-based copying.
The starting point for the origin of template-based replication is the existence of polymeric CEs with either a random or a repetitive sequence. Polymers may initially stick to each other to perform some other functions (e.g., to increase stability and facilitate polymerization). Then, the shorter strand of the paired sequence can be elongated by adding monomers that weakly match the overhanging longer strand. This step can be described as non-identical replication [31]. Natural selection then supported an increase in the specificity of replication because it helped to produce better copies of existing polymers. Template-based replication probably started with copying short sequences with regular repeats, but eventually progressed to complex repeats and entirely aperiodic sequences. The invention of digital replication was the turning point in the origin of life, which supported unlimited hereditary potential [30] and caused a rapid increase in the abundance and complexity of CEs. The first replicating polymers were probably similar to nucleic acids; however, the sugar-phosphate backbone does not support anchoring of nucleic acids to the surface of an oil microsphere. In contrast, peptide nucleic acids (PNAs) with a pseudopeptide backbone are able to adsorb at the lipid-water interface [32]. Nelson et al. consider PNAs as possible evolutionary predecessors of nucleic acids [33].
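The idea that one sequential copying rule works for any sequence, and that copying can improve gradually from weak matching to high specificity, can be sketched in a few lines. Everything in this example is a hypothetical illustration: the two-letter alphabet reuses the monomers A and B from the polymer examples above, the pairing rule (A pairs with B) is assumed, and `fidelity` is an assumed per-monomer probability of adding the matching monomer.

```python
import random

# Assumed pairing rule for a hypothetical two-monomer alphabet.
COMPLEMENT = {"A": "B", "B": "A"}


def copy_strand(template, fidelity, rng):
    """Build the complementary strand one monomer at a time.

    With probability `fidelity` the matching monomer is added; otherwise a
    non-matching monomer is added (weak, error-prone matching).
    """
    new_strand = []
    for monomer in template:
        match = COMPLEMENT[monomer]
        if rng.random() < fidelity:
            new_strand.append(match)
        else:
            mismatches = [m for m in COMPLEMENT if m != match]
            new_strand.append(rng.choice(mismatches))
    return "".join(new_strand)


def replicate(sequence, fidelity, rng):
    """Two rounds of complementary copying regenerate the original strand."""
    return copy_strand(copy_strand(sequence, fidelity, rng), fidelity, rng)


rng = random.Random(1)
aperiodic = "ABBAABABAAABB"
exact_copy = replicate(aperiodic, fidelity=1.0, rng=rng)
noisy_copy = replicate(aperiodic, fidelity=0.8, rng=rng)
```

The same `replicate` function handles repeats and aperiodic sequences alike, which is the sense in which template-based copying is universal. Lower values of `fidelity` correspond to the non-identical replication described above, in which copies only partly resemble the template.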
Major steps in the transition from the “coenzyme world” to the “RNA world” are summarized in Figure 2. Life started from the emergence of single CEs on the surface of oil microspheres; then additional CEs appeared either by capturing new autocatalytic molecules or by transformation of already existing CEs. This step of combinatorial coding was followed by polymerization of CEs and the emergence of template-based replication. Complexity of the structure and function of the cell increased in parallel with the evolution of CEs. An important step of this evolution was the transition from metabolism on the outer surface of the oil microsphere to internal metabolism. Oil microspheres with an amphiphilic outer layer (micelles) can spontaneously form liposome-like structures with internal water cavities. These cavities are beneficial because they increase the ratio of the outer surface used for metabolism to the amount of hydrocarbons, which is a limiting resource. Thus, any encoded changes that helped to stabilize the liposome structure were favored by selection. This evolution may have resulted in the emergence of the internal amphiphilic layer of lipid-like molecules together with ion transfer mechanisms to control osmosis. Functional molecules, including CEs, started migrating from the outer surface to the inner surface. Because the inner space was enclosed, cells started accumulating free-floating resources inside, which eventually resulted in the formation of a cytoplasm. The transition of metabolism from the cell surface to the cytoplasm marks an important step towards contemporary cell architecture.