A Chemical Engineering Perspective on the Origins of Life

Atoms and molecules assemble into materials, with the material structure determining the properties and ultimate function. Human-made materials and systems have achieved great complexity, such as the integrated circuit and the modern airplane. However, they still do not rival the adaptivity and robustness of biological systems. Understanding the reaction and assembly of molecules on the early Earth is a scientific grand challenge and can also elucidate the design principles underlying biological materials and systems. This research requires understanding of chemical reactions, thermodynamics, fluid mechanics, heat and mass transfer, optimization, and control. Thus, the discipline of chemical engineering can play a central role in advancing the field. In this paper, an overview of research in the origins field is given, with particular emphasis on the origin of biopolymers and the role of chemical engineering phenomena. A case study is presented to highlight the importance of the environment and its coupling to the chemistry.


Introduction
How life evolved from chemicals on the early Earth is an open scientific question and an area of active research. If this process (the transition from non-life to life) can be understood, then the underlying design principles could also be applied to engineer materials and systems with a robustness and adaptability approaching that of biological systems. It is difficult to determine exactly what the conditions on the early Earth were, and they were certainly dynamic and spatially heterogeneous. However, the scientific community has established constraints, such as the period of time between the formation of the Earth (4.5 billion years ago) and the first fossil evidence of single-celled life (3.5 billion years ago) [1]. Further constraints guide scientists toward "plausibly prebiotic" conditions during this time, such as the composition of the atmosphere [2] and the ocean [3], and the appearance of dry land [4,5].
In modern biology, nucleic acid polymers store information, and this information is read by the ribosome to produce proteins; proteins then catalyze chemical reactions. Because the ribosome is highly conserved across all life forms, its emergence has even been identified as the "universal ancestor" of all life on Earth [6]. In addition to RNA, proteins are an integral component of the ribosome, which poses a "chicken and egg" problem: which came first, proteins or nucleic acid polymers [7]? In modern biology, both types of polymer are synthesized by complex enzymatic machinery, but such protein catalysts would not have been available on the early Earth. However, the RNA World hypothesis offers a potential solution to the "chicken and egg" problem.
The idea of catalytic RNA (ribozymes) [8][9][10] and its subsequent discovery in 1982 [11] spawned the RNA World hypothesis, in which RNA served both functions (information storage and catalysis) in early life, with DNA and proteins appearing later in the evolutionary process [12,13]. Alternative pre-RNA World scenarios have also been proposed, in which an even earlier genetic system eventually evolved into RNA [14][15][16]. However, robust, high-yield prebiotic routes to the non-enzymatic polymerization of RNA (or pre-RNA) continue to elude the origins of life community. The origin of life was most likely a complex system: rather than a single component that evolved to become complex, it probably involved the cooperation and co-evolution of multiple components [17], including small molecules [18,19], proteins [20,21], and lipids [22,23].
Amino acids, nucleotides, sugars, and lipids are the building blocks of life. The first three react and polymerize to form proteins, the nucleic acid polymers DNA and RNA, and polysaccharides, respectively. These biopolymers are then encapsulated in self-assembled lipid membranes to form cells. Although polysaccharides are central to living systems, they have received little study in a prebiotic context [24]. In this manuscript, sugars are discussed only as the ribose (and deoxyribose) of RNA and DNA, but much research remains to be done on the prebiotic origins of polysaccharides. The focus of this paper is on the formation of peptides (short proteins) and nucleic acid polymers (DNA and RNA). The chemical structures of the monomers, polymers, and their assemblies are given in Figures 1 and 2.
A possible source of protein monomers was elucidated by Miller and Urey in the 1950s [25], who established a path from simple prebiotic molecules to amino acids [26]. In their seminal experiment, published in 1953, a mixture of water, methane, ammonia, and hydrogen was heated and then subjected to an electric spark. Amino acids accounted for 2% of the carbon, and sugars were also formed. Nucleobases were not identified in these reactions, but adenine was later found to form in concentrated solutions of hydrogen cyanide [27], and all four nucleobases of RNA have since been produced in various prebiotic environments, including in neat formamide with UV irradiation [28] and in aqueous aerosols [29]. Over the years, the scientific consensus on the composition of the early Earth's atmosphere has changed, suggesting that Miller's conditions may have been too reducing. However, amino acids and nucleobases have since been produced under a wide range of "prebiotically plausible" atmospheric conditions [30]. The formation of monomers does not imply the formation of their polymers, which poses its own set of challenges in a prebiotic context. All three types of biopolymers are formed via condensation polymerization, producing one water molecule per bond formed. Water is the solvent of life, and geologists predict that a significant ocean volume existed on the prebiotic Earth [3]. However, because water is a product of the reaction, an aqueous medium shifts the equilibrium toward the reactants, favoring hydrolysis. In origins of life research, this is referred to as the "water problem" [31]. Dilution is another challenge for prebiotic polymerization, due to the large volume of the ocean and the relatively small amount of monomer that would have been present. An additional hurdle to non-enzymatic polymerization is that bond formation (the amide bond in proteins, the phosphodiester bond in nucleic acids) is energetically unfavorable, making high yields difficult to obtain at equilibrium. Living systems are open systems, driven far from equilibrium by inputs of mass and energy, and biopolymers are held in non-equilibrium states by the energetic input of ATP (adenosine triphosphate). Similarly, the non-enzymatic prebiotic synthesis of biopolymers must be understood in this open, non-equilibrium context [32,33].
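To make the water problem and dilution concrete, consider the simplest condensation step, 2 M ⇌ D + W (two monomers form a dimer and one water). The sketch below is an illustration of ours, not a model from the cited work: it solves the equilibrium mass balance for the fraction of monomer bound in dimers, treats water as a concentration rather than an activity, and uses a hypothetical dimensionless equilibrium constant `K`. Only the trends are meaningful: conversion rises as monomer becomes more concentrated and as water is removed.

```python
import math

WATER_PURE = 55.5  # mol/L, approximate concentration of pure liquid water


def dimer_conversion(K, c0, water=WATER_PURE):
    """Equilibrium fraction of monomer bound in dimers for 2 M <=> D + W.

    K     -- hypothetical dimensionless constant, K = [D][W] / [M]^2
    c0    -- total monomer concentration in mol/L (c0 = [M] + 2[D])
    water -- effective water concentration in mol/L (lower it to mimic drying)
    """
    a = 2.0 * K / water                # mass balance: a*[M]**2 + [M] - c0 = 0
    s = math.sqrt(1.0 + 4.0 * a * c0)
    # (s - 1)/(s + 1), rewritten as 4*a*c0/(1 + s)**2 to avoid cancellation when a*c0 is tiny
    return 4.0 * a * c0 / (1.0 + s) ** 2


if __name__ == "__main__":
    K = 1.0  # hypothetical value, chosen only to show trends
    for c0 in (1e-3, 1e-1):
        for water in (WATER_PURE, 1.0):
            print(f"c0={c0:g} M, water={water:g} M -> conversion {dimer_conversion(K, c0, water):.2e}")
```

Because the water term appears on the product side, the 55.5 M of bulk water pushes the equilibrium back even when `K` is of order one, while dilution lowers conversion further.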
To overcome the challenges of prebiotic biopolymer formation, some researchers have proposed alternative chemistries, yielding "protobiopolymers" that over time would have evolved into the biopolymers of modern life. For example, Miller proposed that peptide nucleic acid (PNA), whose backbone uses amide rather than phosphodiester linkages, could have been a precursor to RNA and DNA [34], which eliminates the need to form the high-energy phosphodiester bond. Orgel suggested that the ester bond may have preceded the amide bond in peptides and proteins [15]. Although the ester bond is less stable than the amide bond, this could actually have been beneficial in the early stages of life, preventing the irreversible formation of the cyclic dimer, a thermodynamic "dead end" [35].
Chemical reactions are central to the origins of life question, but chemistry alone cannot provide the answer. Whether or not the original polymers were "alternative," the chemistry cannot be considered independently of the environment. To fully address the challenges of non-enzymatic polymerization, hydrolysis, and dilution, the chemistry and the environment must be considered together as a system. Non-aqueous solvents may have concentrated monomers while also excluding water. Mineral surfaces such as iron sulfide or clay may have served as catalysts and also as concentrating agents. In fact, Cairns-Smith proposed an inorganic start to life [36], based on information-storing minerals such as clay or mica. Dynamic environmental conditions would necessarily have been present on the prebiotic Earth [37]; periodicity driven by day/night, tidal, or seasonal cycles may have driven polymerization under non-equilibrium conditions, before the advent of the energy-storing molecule ATP.
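A toy model shows how periodic drying can hold a polymerizing system away from its wet-phase equilibrium. The sketch below is our illustration, not a fitted model of any experiment cited here: it assumes pseudo-first-order kinetics for a lumped "bond fraction" `p`, with hypothetical rate constants, condensation only during the dry phase, and mostly hydrolysis during the wet phase. Each phase is solved exactly, so no numerical integration error enters.

```python
import math


def relax(p, k_f, k_h, duration):
    """Exact solution of dp/dt = k_f*(1 - p) - k_h*p over one phase.

    p is a hypothetical lumped bond fraction (0 = all monomer, 1 = all bonds formed).
    """
    k_total = k_f + k_h
    p_eq = k_f / k_total  # where this phase would settle if it lasted forever
    return p_eq + (p - p_eq) * math.exp(-k_total * duration)


def wet_dry_cycles(n_cycles, dry, wet, p0=0.0):
    """Alternate dry and wet phases, each given as (k_f, k_h, duration).

    Returns the bond fraction at the end of every wet phase.
    """
    p, history = p0, []
    for _ in range(n_cycles):
        p = relax(p, *dry)  # drying: water removed, only condensation in this toy
        p = relax(p, *wet)  # rewetting: hydrolysis dominates
        history.append(p)
    return history


if __name__ == "__main__":
    DRY = (0.1, 0.0, 12.0)     # hypothetical (k_f, k_h) in 1/h, 12 h phase
    WET = (0.001, 0.05, 12.0)  # hypothetical (k_f, k_h) in 1/h, 12 h phase
    wet_equilibrium = WET[0] / (WET[0] + WET[1])
    history = wet_dry_cycles(50, DRY, WET)
    print(f"always wet: {wet_equilibrium:.3f}; cycled, end of wet phase: {history[-1]:.3f}")
```

With these arbitrary parameters the end-of-wet-phase value settles into a periodic steady state well above the always-wet equilibrium: each drying step, powered by evaporation, pushes the system forward faster than rewetting can undo it.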
The concept of a prebiotic soup, suggested by Darwin as a "warm little pond" [38], reflects the chemical complexity of the early Earth, but it does not explicitly highlight the spatial heterogeneity that would also have been present across the Earth. Phase separation is central to many origins of life studies. Even the early Miller-Urey experiments exhibited multiple phases, yielding a solid "tholin" product that has never been fully characterized. Phase separation has also been proposed as an early precursor to cellular compartmentalization [39]. The transport of fluid, mass, and heat in bulk and along surfaces drives many of the proposed prebiotic environments: water must be evaporated in wet/dry cycling environments [40], and thermal gradients drive replication via convective flow in porous media [41]. Such phenomena lie squarely in the field of chemical engineering.
Moreover, the concepts of optimization and feedback are central to the subsequent stages of the origins of life, and more broadly to chemical evolution. Given an environment and chemistry that can support biopolymer (or protobiopolymer) formation, a combinatorial explosion of polymer sequences could be formed. Modern biology employs 20 amino acids and a four-letter nucleic acid alphabet, although these monomer sets may have been somewhat different during the early stages of evolution [42]. In fact, the current genetic code for amino acid incorporation into proteins appears to be near-optimal for error minimization [43]. Evolution is the ultimate "genetic algorithm," and the inspiration for directed evolution. During an evolutionary process in the prebiotic soup (actually a heterogeneous, multiphase "stew"), protein selection may have been driven by stability rather than catalytic function; since folded proteins are both more stable and more functional, proteins with catalytic function may have been amplified in the population over time [44].
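The scale of this combinatorial explosion follows from simple counting: a linear polymer of length L over an alphabet of size A has A^L possible sequences. The sketch below uses the two alphabets named above and, as illustrative lengths, the roughly 20 amino acids needed for folding and 7 nucleotides needed for a double helix mentioned later in this paper. It also compares the peptide count with one mole of molecules, under the best-case assumption that every molecule has a different sequence.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def sequence_space(alphabet_size, length):
    """Number of distinct linear sequences of a given length over an alphabet."""
    return alphabet_size ** length


if __name__ == "__main__":
    peptides = sequence_space(20, 20)  # 20-mers over the 20 amino acids
    nucleic = sequence_space(4, 7)     # 7-mers over a four-letter alphabet
    print(f"20-mer peptides: {peptides:.3e}; 7-mer nucleic acids: {nucleic}")
    # Fraction of 20-mer sequence space covered by one mole of molecules,
    # if every molecule had a different sequence (a best case)
    print(f"coverage by one mole: {AVOGADRO / peptides:.2%}")
```

Even a full mole of distinct random 20-mer peptides samples well under 1% of the possible sequences, which is why selection, rather than exhaustive sampling, must account for the sequences that came to dominate.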
Feedback is another concept central to many models of replication and amplification of nucleic acid polymers, or their earlier predecessors. Regulation through negative feedback naturally occurs as species accumulate and bind to each other, inhibiting the formation of additional product. Modern biology uses feedback loops extensively to regulate conditions, creating robust function in an uncertain environment. As primitive life evolved, such regulatory loops would have conferred further advantage at the system (e.g., cellular) level. In addition to negative feedback, Eigen proposed a model of autocatalysis, in which sequences with the highest replication rates dominate the population through positive feedback [45]. However, the selection is only for replication, not for any other beneficial function. While this model represents "survival of the fittest," with a single winner in the end, Eigen further proposed the hypercycle, in which multiple sequences coexist in a cross-catalytic network to achieve higher-level function at the system level. Experimentally, cross-catalytic networks have been demonstrated in RNA [46,47] and peptide [48] systems. Although Eigen does not specify the chemical nature of his replicator, his model is most consistent with RNA and the RNA World hypothesis, since RNA can both store information and catalyze reactions (as a ribozyme). Subsequent modeling of autocatalysis has made an explicit connection to the RNA World [17,49]. However, the origin of life was undoubtedly a more complicated system, not consistent with a reductionist perspective and a single "silver bullet" answer.
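Eigen's selection argument and the hypercycle can be contrasted with a short simulation of the replicator equation dx_i/dt = x_i (g_i - phi), where phi is the mean growth rate, removed as outflow so that the total population stays constant, as in a flow reactor. This is a minimal sketch under stated assumptions: it omits mutation (so it is the error-free selection limit, not Eigen's full quasispecies model), and the rate constants are hypothetical. For independent replicators g_i = k_i; in an elementary hypercycle g_i = k_i x_(i-1), so each member's growth depends on its predecessor.

```python
def simulate(growth_fn, x0, dt=0.01, t_end=100.0):
    """Explicit-Euler integration of dx_i/dt = x_i * (g_i(x) - phi).

    phi = sum_j x_j * g_j(x) is the mean growth rate; removing it as an outflow
    keeps the total population fixed, as in a flow reactor.
    """
    x = list(x0)
    for _ in range(int(round(t_end / dt))):
        g = growth_fn(x)
        phi = sum(xi * gi for xi, gi in zip(x, g))
        x = [xi + dt * xi * (gi - phi) for xi, gi in zip(x, g)]
        total = sum(x)
        x = [xi / total for xi in x]  # correct floating-point drift in the total
    return x


def independent_replicators(k):
    """Selection without cooperation: each species grows at its own constant rate k_i."""
    return lambda x: list(k)


def elementary_hypercycle(k):
    """Species i is replicated with the help of species i-1; x[-1] closes the cycle."""
    return lambda x: [k[i] * x[i - 1] for i in range(len(x))]


if __name__ == "__main__":
    # Hypothetical rate constants, arbitrary time units
    winner = simulate(independent_replicators([1.0, 1.5, 2.0]), [1 / 3, 1 / 3, 1 / 3], t_end=50.0)
    print("independent replicators:", [round(v, 4) for v in winner])
    cycle = simulate(elementary_hypercycle([1.0, 1.0, 1.0]), [0.6, 0.3, 0.1], t_end=200.0)
    print("three-member hypercycle:", [round(v, 4) for v in cycle])
```

In the first case the fastest replicator approaches the whole population (a single winner); in the second, all three members coexist near equal shares. A three-member cycle is used because elementary hypercycles with more members converge more slowly or oscillate.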
An evolutionary transition occurs when a collection of simpler components begins to act as a system, so that selection moves up to the system level. Examples outlined by Maynard Smith and Szathmáry [50] include the transition from single-celled to multicellular life and the evolution of human language. Similarly, cooperation among diverse biopolymer types was required for the emergence of the first cell. While proteins and nucleic acids might originally have evolved separately, as RNA and Protein Worlds, the point at which they began to cooperate in the ribosome has been described as the origin of life on Earth [6].
Over the years, the scientific community has attempted to define "life" [51], and a commonly used working definition is "a self-sustaining chemical system capable of Darwinian evolution" [52]. However, with only one known example of life in the universe, ambiguity remains [53]. Despite the debate about the exact point of transition from chemistry to biology, the evolution of chemicals on the prebiotic Earth is an active area of scientific research and the topic of this paper.
A number of comprehensive reviews of origins of life chemistry have been written over the years, e.g., [7,17,20,54-57]. The purpose of this paper is to introduce the field to chemical engineers, with an overview of key concepts and findings, while particularly highlighting the chemical engineering phenomena: not only chemical kinetics and thermodynamics, but also transport, phase separation, optimization, and control. While the origins of life field is broad and includes substantial research from, e.g., geologists and physicists, this contribution focuses on the chemical origins of biopolymers on the early Earth. Work by the authors and collaborators is highlighted to some extent, as it represents a more chemical engineering approach, but the intent is to cover the key papers and discoveries in the field. The background is divided into three sections, monomers, polymers, and assemblies, as illustrated in Figure 3.

Monomers
The first step toward uncovering the chemical origins of life is to understand the source of the monomers that polymerize into proteins and nucleic acid polymers. The exact conditions on the prebiotic Earth are not known, but geologists have identified likely chemical species and environments, based on constraints from mineral evidence and geochemical principles. Hydrothermal vents deep in the ocean are one possible environment [58,59], as are air-water interfaces on the ocean surface [60] and the surfaces of aerosol particles [29]. Exposed dry land emerged before 4.3 billion years ago [4,5], prior to the origin of life, providing dry environments that could more efficiently promote condensation reactions. The ocean-land interface may also have provided a conducive environment, in which tidal pools cycle between hydrated and dehydrated states, alternately providing mixing, concentration, and dehydration. UV irradiation is also expected to have played a significant role in prebiotic chemistry. Because the ozone layer had not yet developed on the prebiotic Earth, significant UV irradiation may have contributed to photocatalysis [28,61] as well as photodegradation [3]. Mineral surfaces may also have catalyzed early reactions [62] before the emergence of protein catalysts. In addition to thermal energy, reactions could have been driven forward (and backward) by external energy sources such as electrical discharges [25] and cosmic rays [63]. Lipids can also be generated in similar environments [64,65], providing a pathway toward early cellular compartmentalization. The starting material on the prebiotic Earth is typically taken to be simple, stable molecules found throughout the universe, including water, molecular hydrogen, molecular nitrogen, carbon dioxide, carbon monoxide, ammonia, and methane. These molecules may then react to form slightly more complicated molecules, such as hydrogen cyanide (HCN), formaldehyde (HCHO), and glycine (NH₂CH₂COOH). In the original Miller-Urey experiments of the 1950s, amino acids (including glycine) were generated from water, molecular hydrogen, methane, and ammonia [25]. See Figure 4. However, methane and ammonia are unstable under high UV irradiation, so this particular environment may not be plausible. More recent research focuses on less reducing environments, such as a two-phase system with a gas phase of water, molecular nitrogen, and carbon dioxide over liquid water [60]. The nature of the chemical starting material also feeds back to the thermodynamics and kinetics of all aqueous-phase reactions by altering the pH of the ocean (e.g., through the amount of dissolved carbon dioxide); the pH of the prebiotic ocean is thought to have been slightly acidic [3].
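The effect of dissolved carbon dioxide on pH can be estimated with a standard carbonate-equilibrium calculation. This sketch assumes unbuffered water in equilibrium with a given CO2 partial pressure (Henry's law), keeps only the first dissociation of carbonic acid and water autoionization, and uses approximate 25 °C constants. A real ocean contains dissolved cations and carbonate minerals that buffer pH, so the sketch overstates acidity; it shows only the direction and rough scale of the effect. The partial pressures in the usage loop are illustrative (4e-4 atm is roughly present-day), not estimates of the prebiotic atmosphere.

```python
import math

# Approximate constants at 25 C (assumed values; they vary with temperature and salinity)
KH_CO2 = 0.034      # Henry's law solubility of CO2, mol/(L*atm)
KA1 = 10 ** -6.35   # first acid dissociation constant of dissolved CO2 (carbonic acid)
KW = 1.0e-14        # ion product of water


def ph_of_co2_equilibrated_water(p_co2_atm):
    """pH of unbuffered water in equilibrium with a given CO2 partial pressure.

    Charge balance [H+] = [HCO3-] + [OH-], with [HCO3-] = KA1*[CO2(aq)]/[H+]
    and [OH-] = KW/[H+], gives [H+]**2 = KA1*[CO2(aq)] + KW.
    Carbonate (CO3^2-) is neglected, which is reasonable in acidic solution.
    """
    co2_aq = KH_CO2 * p_co2_atm
    h = math.sqrt(KA1 * co2_aq + KW)
    return -math.log10(h)


if __name__ == "__main__":
    for p in (0.0, 4e-4, 0.1, 1.0):  # illustrative partial pressures in atm
        print(f"pCO2 = {p:g} atm -> pH {ph_of_co2_equilibrated_water(p):.2f}")
```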
How did these simple chemicals react to form the small-molecule building blocks of life: amino acids, nucleobases, sugars, and lipids? While the exact pathways may never be known, many pathways can be excluded based on prebiotic constraints, leaving a smaller number of plausibly prebiotic candidates. The scientific question is thus "How could life have emerged?" To answer this question, the chemicals and the environment must be considered together as a system, which may also include a periodically varying environment that drives the chemical reactions. The kinetics of these reactions must be understood, as well as their long-term thermodynamic behavior.
Molecules that are unstable may be problematic for prebiotic chemistry, although the situation is more complicated than it first appears. The Miller-Urey experiments were closed systems, whereas living systems are open, driven away from equilibrium by mass and energy fluxes. Thus, the transition to living systems on the early Earth should also be considered in an open system. Even unstable molecules can accumulate if the net rate of production is positive [58]. The apparent instability of amino acids under UV exposure could have been mitigated by UV-absorbing molecules in the prebiotic ocean [3]. However, the instability of ribose has led researchers to consider alternative, more stable sugars in proto-nucleic acid polymers [67]. Stability arguments have also been used to suggest which nucleobase pair came first: because of the instability of cytosine, the AU pair in RNA may have preceded CG [68].
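The claim that an unstable molecule can still accumulate is the steady-state balance of a well-mixed flow reactor. The minimal sketch below assumes a pool with a constant (hypothetical) production rate, first-order degradation with rate constant `k_deg`, and first-order outflow `k_out`, so that dc/dt = production - (k_deg + k_out) c. The concentration approaches production / (k_deg + k_out), which is positive for any finite degradation rate; instability lowers the accumulated level but does not prevent accumulation.

```python
def steady_state(production, k_deg, k_out):
    """Steady-state concentration of a species made at a constant rate and lost
    by first-order degradation (k_deg) and first-order outflow (k_out)."""
    return production / (k_deg + k_out)


def simulate_pool(production, k_deg, k_out, c0=0.0, dt=0.01, t_end=200.0):
    """Explicit-Euler integration of dc/dt = production - (k_deg + k_out) * c."""
    c = c0
    for _ in range(int(round(t_end / dt))):
        c += dt * (production - (k_deg + k_out) * c)
    return c


if __name__ == "__main__":
    # Hypothetical numbers: production in mol/(L*h), rate constants in 1/h
    for k_deg in (0.01, 0.1, 1.0):
        print(f"k_deg={k_deg}: steady state {steady_state(1e-3, k_deg, 0.05):.2e} M, "
              f"simulated {simulate_pool(1e-3, k_deg, 0.05):.2e} M")
```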
The specificity of reactions is another important consideration. In general, a large number of products were produced in the Miller-Urey experiments, and in most subsequent experiments, from mixtures of a few simple chemicals. The formose reaction produces ribose from formaldehyde, but it also produces many other products [69]. Catalysts are needed for the formose reaction, and the nature of the catalyst influences the distribution of products [70]. Both mineral [62] and amino acid [70] catalysts have been used. However, none yields high selectivity toward ribose, and this remains a challenge in understanding the origin of nucleic acid polymers.
Yield is also an important factor in any prebiotic reaction pathway. If a molecule is made in only trace amounts, it may not provide a reliable source for subsequent polymerization. Miller and Urey analyzed their samples with paper chromatography [25], which limited the number of species that could be detected. Modern analytical techniques, however, enable many more species to be identified, even in trace amounts. Recent analysis of previously unanalyzed Miller-Urey samples identified a dozen amino acids, as well as numerous dipeptides [66]. This lack of specificity may in fact be important for generating the diversity needed for subsequent evolution, provided that the species are all present in significant quantities. Other species are presumably also present, such as alternative prebiotic versions of amino acids or nucleobases. In a complex mixture, it is impractical to identify all species; rather, one searches the sample for compounds of particular interest. Alpha-hydroxy acids have been suggested as an early version of amino acids [15] and were found in later Miller-Urey experiments [71]. Urazole is a heterocyclic compound that reacts readily with ribose and can be formed under plausibly prebiotic conditions [72]. Although modern analytical chemistry is a powerful tool for prebiotic chemistry, many challenges remain in fully characterizing any complex mixture representing the "prebiotic soup."
Compared with amino acids (protein monomers), the synthesis of nucleotides is much more complicated. Nucleotides consist of a nucleobase, a sugar, and a phosphate group, and are polymerized to form RNA and DNA. The glycosidic bond connecting the nucleobase to the sugar is difficult to form directly, but it can be formed for some nucleobases using dry-down reactions that remove the water product [73]. To overcome this challenge, Miller and co-workers [72], and later Hud and co-workers [74], proposed alternative nucleobases that are more reactive. Powner and co-workers presented an alternative route to pyrimidine nucleosides, using reactants that are chemically distinct from nucleobases or sugars [75]. Powner and co-workers also demonstrated a one-pot system that generates both nucleosides and lipids [76]. The source of the phosphorus in nucleotides has been an open question in the field, since most phosphorus would have been stored in unreactive minerals such as apatite. However, the mineral schreibersite could have been delivered by meteorites and has been shown to phosphorylate the hydroxyl groups of glycerol [77].
It is also possible that the monomers of life were not generated on Earth, but rather were extraterrestrial in origin [78]. Meteorite samples contain significant amounts of amino acids, and interstellar ice can generate lipids [61]. The late heavy bombardment occurred around 4 billion years ago and delivered a significant total mass to the Earth [79]. In fact, monomers were generated both on Earth and in space; the better question is which was the more significant source.

Polymers
The polymerization of amino acids and nucleotides into proteins and nucleic acid polymers (DNA, RNA) is considered an early and necessary step in the origins of life. These biopolymers store genetic information (our design) and catalyze chemical reactions (how we operate). It is difficult to envision a living system without biopolymers, although alternative crystal-based living systems have also been proposed [36]. Achieving biopolymers of significant length is a critical step in chemical evolution: without sufficient length, nucleic acids cannot form the double helix needed for replication (6-7 nucleotides), and peptides cannot fold into catalytically functional units (≈20 amino acids). Recent work shows that assemblies of multiple shorter peptides [80] or RNA oligomers [81] can also catalyze chemical reactions, but even these lengths are challenging to achieve under prebiotic conditions, owing to unfavorable kinetics and thermodynamics [82,83]. Amide and phosphodiester bonds are typically formed through condensation reactions (i.e., reactions that release water), which are thermodynamically unfavorable in an aqueous solution such as the ocean. Catalysts are needed to increase the forward rate of polymerization, and even then the monomers often require additional chemical modification, or "activation," to achieve significant length and yield [84]. For these reasons, alternative monomers and their polymers have also been proposed as pathways toward modern biopolymers.
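Why length is the bottleneck can be illustrated with the classical result for ideal linear step-growth polymerization (Flory's most-probable distribution), which we assume here for illustration. If a fraction p of possible bonds has formed and all chain ends are equally reactive, the number fraction of chains with exactly n monomers is (1 - p) p^(n - 1), the fraction with at least n monomers is p^(n - 1), and the number-average length is 1/(1 - p) (the Carothers equation). These idealizations ignore cyclization, such as the cyclic dimer "dead end" noted earlier, so they are optimistic.

```python
def flory_number_fraction(p, n):
    """Number fraction of chains containing exactly n monomers (Flory most-probable distribution)."""
    return (1.0 - p) * p ** (n - 1)


def fraction_at_least(p, n_min):
    """Number fraction of chains with at least n_min monomers: the geometric tail p**(n_min - 1)."""
    return p ** (n_min - 1)


def number_average_length(p):
    """Carothers equation: number-average degree of polymerization 1/(1 - p)."""
    return 1.0 / (1.0 - p)


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99):  # illustrative extents of reaction
        print(f"p={p}: mean length {number_average_length(p):.0f}, "
              f">=7-mers {fraction_at_least(p, 7):.2e}, >=20-mers {fraction_at_least(p, 20):.2e}")
```

Even at an extent of reaction of 0.5, only about two in a million chains reach 20 monomers, which is why the yield-limiting thermodynamics described above translate directly into a length problem.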
As with monomer generation, the coupling between the chemicals and the environment is critical to understanding the prebiotic polymerization of biopolymers. Deep-sea hydrothermal vents have been studied extensively as a site for peptide and nucleic acid polymerization. At their high temperatures and pressures, non-enzymatic oligomerization of amino acids [85] and nucleotides [86] can proceed. However, yields are low, and much of the amino acid becomes trapped in the cyclic dimer, diketopiperazine (DKP), a thermodynamic dead end [87]. Stability is an additional challenge at these high temperatures. The peptide bond is resistant to hydrolysis at moderate temperatures and neutral pH, but in high-temperature aqueous solution it undergoes rapid hydrolysis. In addition, the amino acid monomers themselves are unstable and undergo decarboxylation. An open system with inlet and outlet mass flow and monomer generation (i.e., a hydrothermal vent) might accumulate peptide faster than it degrades, although this concept may not be consistent with the heating timescales expected in hydrothermal vents [88]. The air-water interface is another potential environment for polymerization. This interface could facilitate polymerization by aligning charged and zwitterionic amino acids and by excluding some water [90]. However, to date this approach has only been demonstrated for amino acid ester monomers, which do not generate water during their polymerization. Another environment that minimizes water contact is the hydration/dehydration cycle, which could occur at the interface between water and land. Periodic environments have been used to form peptides from amino acids, although without chemical modification ("activation") of the amino acids the yields are very low [40]. Chemical activation of amino acids has been achieved using, for example, carbonyl imidazole [91], COS [92], and NO [89], as well as alumina [93]. A schematic from Ref. [89] is shown in Figure 5. The prebiotic availability of carbonyl imidazole is questionable because of its high reactivity, while COS might have been available from volcanic reactions [92]. However, because these activating agents are consumed in the reaction (unlike a catalyst), finding a prebiotic source in sufficient quantities is problematic. Activation has also been a key component of nucleic acid polymerization: organic bases have been used as activating agents [94], and cyclic nucleotides have been used as activated monomers [95]. Activated RNA monomers were also used in conjunction with a cold environment (i.e., ice) [96]. Although water is still present, hydrolysis of the phosphodiester bond is slowed at low temperatures, and RNA oligomers up to 17-mers were obtained at 90% yield. As with amino acid polymerization, a prebiotic and robust source of activated nucleotides has not yet been established. Indeed, in contrast to amino acids, even a prebiotic and robust source of unactivated nucleotides remains elusive.
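Both observations above, rapid hydrolysis in hot vent fluids and slower hydrolysis in ice, reflect the temperature dependence of rate constants. The sketch below assumes simple Arrhenius behavior, k = A exp(-Ea/(R T)), with a hypothetical activation energy of 100 kJ/mol; real hydrolysis rates also depend on the bond type, pH, catalysis, and phase behavior, none of which the sketch captures. It reports each rate relative to the rate at 25 °C.

```python
import math

R = 8.314  # gas constant, J/(mol*K)


def rate_ratio(ea, t_ref, t):
    """k(t)/k(t_ref) for a rate constant following k = A*exp(-ea/(R*T)).

    ea in J/mol, temperatures in kelvin.
    """
    return math.exp(-ea / R * (1.0 / t - 1.0 / t_ref))


if __name__ == "__main__":
    EA = 100e3  # hypothetical activation energy for hydrolysis, J/mol
    for label, t in (("ice, -20 C", 253.15), ("room temperature", 298.15), ("hot water, 100 C", 373.15)):
        print(f"{label}: hydrolysis {rate_ratio(EA, 298.15, t):.2g}x the 25 C rate")
```

With this assumed activation energy, cooling from 25 °C to -20 °C slows hydrolysis by roughly three orders of magnitude, and heating to 100 °C speeds it up by a similar factor.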
In addition to chemical activation of the monomers, catalysts are also frequently used to facilitate prebiotic polymerization. In modern biology, enzymes (i.e., protein catalysts) efficiently and specifically link amino acids and nucleotides into polymers. Before catalytically functional proteins emerged, mineral surfaces may have served this role [97]. By aligning amino acids on the surface [98], the DKP trap can be overcome to form much longer peptides (e.g., 55-mers) [91]. Mineral surfaces have similarly been used to elongate nucleic acids. In both cases, the monomers were chemically activated, suggesting that the mineral surface alone is not sufficient to drive the polymerization. Another possible disadvantage of mineral surfaces is that the polymers may be difficult to release from the surface, especially as the polymer grows longer [99]. Salt addition has also been used to promote peptide formation under cycling conditions, with and without mineral surfaces [100]. However, the yields and peptide lengths are still low.
Lipids are another potential catalyst and environment for amino acid and nucleotide polymerization. The polar headgroups of lipids could align the monomers, while water would be excluded from the hydrophobic environment. Cationic lipids have been used to facilitate peptide polymerization, although fatty acids are a more likely prebiotic candidate [101]. RNA polymerization of up to 50-mers was reported in 2008 using fatty acids and unactivated monomers, although the yield and robustness of this reaction remain to be addressed [102]. Non-aqueous small-molecule solvents have also been proposed recently as another route to exclude water from the vicinity of the reaction. Prebiotically plausible small molecules such as urea and formamide provide the components for an ionic liquid or deep eutectic solvent. Preliminary work suggests that such environments can promote phosphorylation in the presence of phosphorus-containing minerals [103].
Because of the challenges associated with the robust prebiotic polymerization of amino acids and nucleotides, researchers have also investigated the polymerization of alternative "protobiopolymers." For example, the ester bond has been suggested as a prebiotic version of the peptide bond [15]. The monomers could be α-hydroxy acids, which are found in Miller-Urey type experiments [71] and are chemically similar to amino acids (with a hydroxyl group substituted for the amine). Previous studies have shown that the ribosome can form ester bonds between activated hydroxy acids in addition to amide bonds between amino acids, which has led to the long-standing hypothesis that polyesters could have preceded polypeptides in the early stages of life [104]. Although the ester bond is less stable than the amide bond, its formation is thermodynamically favorable (unlike that of the amide bond), and it does not form the cyclic dimer trap associated with amino acids. Polyester oligomers have been formed from unactivated monomers in a drying reaction [105] and, more recently, in a cyclic hydration/dehydration environment [35].
Prebiotic alternatives to RNA and DNA have been pursued extensively because of the many challenges associated with the prebiotic generation of either. Peptide nucleic acid (PNA) replaces the phosphodiester bond with an amide bond and still forms a double helix [34]. Although the amide bond is more prebiotically accessible than the phosphodiester bond, PNA inherits all the challenges associated with peptide bond formation; to date, only formation with chemical activation has been reported [106]. The glyoxylate linkage is another proposed nucleic acid linkage, which, unlike the peptide and phosphodiester bonds, is thermodynamically favored [107]. Nucleotide dimers were successfully linked in a drying experiment, albeit at low yield.
The preceding discussion of polymerization focuses on early Earth environments, but delivery of material from outer space is an alternative hypothesis. While the extraterrestrial supply of monomers to the early Earth has scientific support, the supply of polymers by meteorites is less viable. Short oligomers of amino acids and hydroxy acids have been found in meteorites and thus could have been delivered to the early Earth [108]. However, there is no evidence that long (i.e., functional) biopolymers would have been delivered in significant quantities.

Assemblies
Polymeric assemblies are essential for performing the fundamental tasks of life: storing and transferring information via the nucleic acid double helix, and catalyzing chemical reactions with folded proteins. In both cases, non-covalent interactions (e.g., hydrogen bonding between side chains) create more complicated topology and structure, enabling these tasks. However, if the polymers are not long enough, these assemblies are not stable. Nucleotide monomers of present-day RNA do not base pair in water under ambient conditions. The phenylalanine dipeptide does form fibrils in water because of the hydrophobic phenyl rings on its side chains [109], but most peptides require a longer length for assembly and catalysis. Understanding the assembly of biopolymers under early Earth conditions is essential for uncovering the route of chemical evolution and the subsequent transition to living systems.

Selection
The specific characteristics of the monomer side chains determine the assembled structure of both proteins and nucleic acid polymers. There are twenty natural amino acids and four nucleic acid monomer types; within each class, the monomer type is distinguished by its side chain. Although random polymer sequences might have formed through chemical reactions on the early Earth, not all sequences would assemble. It is this assembly that could have driven selection for functional sequences. New and Pohorille describe a selection process for proteins in which the proteins that fold are both more stable to hydrolysis (by excluding water) and more catalytic [44]. Over time, a higher concentration of catalytic proteins would emerge from a random sequence pool, due to this apparent coincidence in which stability and catalysis are correlated through folding.
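The enrichment argument can be made concrete with a toy mass-action model. This is our own simplification for illustration, not the model of Ref. [44]. Random sequences are produced at a constant rate `s`, a fraction `p_fold` of them fold, and folded and unfolded chains are hydrolyzed with first-order rate constants `k_fold < k_unfold`. All names and parameter values are hypothetical.

```python
import math

def folded_fraction(t, s=1.0, p_fold=0.01, k_fold=0.01, k_unfold=1.0):
    """Fraction of folded chains at time t > 0 for the toy model
        dN_f/dt = s*p_fold       - k_fold*N_f
        dN_u/dt = s*(1 - p_fold) - k_unfold*N_u
    starting from an empty pool (N_f = N_u = 0 at t = 0)."""
    n_fold = s * p_fold / k_fold * (1.0 - math.exp(-k_fold * t))
    n_unfold = s * (1.0 - p_fold) / k_unfold * (1.0 - math.exp(-k_unfold * t))
    return n_fold / (n_fold + n_unfold)

def steady_state_fraction(p_fold=0.01, k_fold=0.01, k_unfold=1.0):
    """Long-time limit: each class accumulates as (production rate) / (hydrolysis rate)."""
    folded = p_fold / k_fold
    unfolded = (1.0 - p_fold) / k_unfold
    return folded / (folded + unfolded)

for t in (0.1, 10.0, 100.0, 1000.0):
    print(f"t = {t:7.1f}  folded fraction = {folded_fraction(t):.3f}")
print(f"steady state: {steady_state_fraction():.3f}")
```

With these illustrative numbers, folders start at about 1% of the pool and approach roughly half of it. Slower hydrolysis alone enriches the folded, and hence potentially catalytic, subpopulation.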
Just as hydrolysis provides an early selection pressure for peptides, assembly of nucleic acid polymers through base pairing might also provide protection against hydrolysis. However, the emergence of a genetic code requires the copying of sequences, so that information can be propagated from generation to generation. In modern biology, this is achieved through templated replication: the assembly of DNA into a double-stranded helix, with copying performed by complex enzyme machinery. Achieving templated replication in a prebiotic, non-enzymatic context presents many challenges, as articulated recently by Szostak [110]. An early RNA or DNA replicator could not copy itself with high fidelity [111], producing instead a group of similar sequences with random mutations that are collectively called a quasispecies [45].
Experimental demonstration of non-enzymatic, template-directed replication remains difficult, with successful results demonstrated for fairly short, often highly specific sequences (e.g., palindromic [112,113] or self-complementary [114,115]). Challenges include the relative rates of duplex re-assembly versus copying, the separation of mother and daughter strands after copying has occurred, and the fidelity of the copying process [110]. To copy a nucleic acid duplex, the duplex must be separated into single strands, each of which serves as a template for copying. However, even moderately short nucleic acid duplexes are quite stable owing to Watson-Crick base pairing and stacking interactions [116,117], so the result of copying is a template-copy duplex that is difficult to separate, a problem known as strand inhibition [118,119]. Many researchers have approached these problems by chemically modifying the modern nucleic acid structure. In one approach, template or fragment strands are chemically modified at specific sites so that, once ligation occurs, the resulting template-copy duplex is destabilized, enhancing turnover numbers [120-122]. However, it is debatable whether the proposed alternative backbones or base replacements are prebiotically plausible. Several studies have instead focused on the role that physical and environmental factors, rather than chemical alternatives, could have played in non-enzymatic replication. In one approach to strand inhibition, template strands are immobilized on a solid support while undergoing cycles in which complementary fragments are fed, ligated, and washed, and product strands are re-immobilized; this process resulted in the desired "exponential" replication (doubling of the template population with every round) [123]. In a similar system, binding of additional downstream "micro-helper" oligomers to the template strand allowed efficient copying of all four RNA nucleobases [124]. Although these systems are not self-sustaining, these studies emphasize the potential importance of solid-phase chemistry and periodic replacement of monomers/fragments in driving replication processes.
Mast et al. proposed that replication could have been driven by thermal gradients in a tall, thin pore that is cold on one side and hot on the other [41], as shown in Figure 6. In this scenario, nucleic acid oligomers of different sizes undergo two transport phenomena: thermal convection, caused by the expansion of water as it heats up, and thermophoresis of the oligonucleotides from the hot wall to the cold wall. These combined effects result in an accumulation of longer oligomers that depends exponentially on length, which would be impossible under equilibrium reaction conditions. A more recent study further demonstrates polymerization in such a thermal gradient [125].
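The length dependence can be illustrated with the steady-state thermophoretic (Soret) distribution alone, c_cold / c_hot = exp(S_T ΔT), where S_T is the Soret coefficient. This sketch assumes, purely for illustration, that S_T grows linearly with oligomer length (`soret_per_nt` is a hypothetical parameter). It also omits the convection that, in the pore of Ref. [41], converts this across-pore ratio into a much larger vertical accumulation.

```python
import math

def cold_to_hot_ratio(length_nt, soret_per_nt=0.01, delta_T=10.0):
    """Steady-state thermophoretic concentration ratio c_cold / c_hot = exp(S_T * delta_T).

    Hypothetical simplification: the Soret coefficient S_T [1/K] is taken as
    soret_per_nt * length_nt. Convection in the pore is not modeled here.
    """
    soret = soret_per_nt * length_nt
    return math.exp(soret * delta_T)

for n in (10, 20, 50):
    print(n, round(cold_to_hot_ratio(n), 1))
```

Under this assumption, the ratio grows exponentially with length. That is the qualitative feature that lets a thermal gradient favor long oligomers.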
Heterogeneity of monomers is another key issue in the assembly of both peptides and nucleic acid polymers. In a complex prebiotic soup, many chemical variants could be present [57]. For example, non-enzymatic polymerization of nucleotides forms a mix of 2'-5' and 3'-5' linkages. Although this disorder has been viewed as a major challenge for understanding the origin of the modern 3'-5' linkage, recent research shows that polymers with mixed linkages can also base pair and provide functional structures [126]. Moreover, the resulting decrease in duplex stability might have been helpful in overcoming the strand inhibition problem. However, a loss of stability might also lead to higher mutation rates. Too much mutation results in the complete loss of information over multiple rounds of replication, and this critical mutation rate is known as the "error threshold" [127].
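The error threshold can be stated compactly with Eigen's classic condition. A master sequence of length L, copied with per-nucleotide fidelity q and enjoying a selective superiority σ over its mutants, persists only if q^L σ > 1. Equivalently, L < ln σ / (-ln q), or approximately ln σ / (1 - q). This sketch neglects back mutations; the parameter values are hypothetical.

```python
import math

def max_sequence_length(fidelity, superiority):
    """Largest integer L satisfying Eigen's condition fidelity**L * superiority > 1.

    fidelity    -- per-nucleotide copying accuracy q (0 < q < 1)
    superiority -- selective advantage sigma of the master sequence (> 1)
    """
    threshold = math.log(superiority) / -math.log(fidelity)
    return math.ceil(threshold) - 1  # strictly below the threshold

# Hypothetical values: 1% or 0.1% error per nucleotide, tenfold selective advantage.
print(max_sequence_length(0.99, 10.0))
print(max_sequence_length(0.999, 10.0))
```

Read the other way around, a given sequence length fixes a critical error rate. Lowering the per-site error tenfold raises the maximum maintainable length roughly tenfold.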
Another major source of heterogeneity is the chirality of the monomers. A major feature of modern biopolymers is the predominance of one chirality over the other, as seen in the L-amino acids and D-sugars of life today. Homochirality is required for the assembly of modern biopolymers. The transition to homochirality likely began with mirror symmetry breaking (the generation of a slight excess of one enantiomer over the other in a set of molecules), followed by enantiomeric amplification [128,129]. The initial enantiomeric excess may have come from outer space, as enantiomeric excesses of L-amino acids have been found in the Murchison meteorite [130]. Photolysis by circularly polarized light in the interstellar medium has been proposed as a means of generating this initial chiral asymmetry [131]. Others have proposed asymmetric adsorption of organic molecules onto chiral mineral or crystal surfaces, such as those of calcite [132], as a geochemical route to homochirality. Enantiomeric enrichment from an initial imbalance can occur through physical [133,134] or chemical [135] amplification processes. Predominantly L-handed amino-oxazolines (precursors to RNA) are formed from the reaction of amino-oxazole with glyceraldehyde at a 1% excess of the L-enantiomer in the presence of chiral amino acids [136]. After the emergence of L-RNA, L-amino acids could then have determined the handedness of the D-sugars we see in life today [137].
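Chemical amplification of a small initial imbalance is often illustrated with the classic Frank model of spontaneous asymmetric synthesis. In that model, each enantiomer catalyzes its own formation and the two enantiomers mutually inhibit each other. It is shown here as a generic textbook mechanism, not as the specific process of Refs. [133-136]. The rate constants and the 1% starting excess are hypothetical.

```python
def frank_model(L0=0.505, D0=0.495, k=1.0, mu=1.0, t_end=10.0, dt=1e-3):
    """Explicit-Euler integration of Frank's model:
        dL/dt = k*L - mu*L*D
        dD/dt = k*D - mu*L*D
    Autocatalysis (k, with the achiral substrate held constant and folded into k)
    plus mutual antagonism (mu). Returns the enantiomeric excess (L - D) / (L + D)."""
    L, D = L0, D0
    for _ in range(round(t_end / dt)):
        dL = k * L - mu * L * D
        dD = k * D - mu * L * D
        L, D = L + dt * dL, D + dt * dD
    return (L - D) / (L + D)

print(f"initial ee = {0.505 - 0.495:.2f}, final ee = {frank_model():.3f}")
```

The difference L - D grows exponentially, while the mutual-inhibition term suppresses the minor enantiomer, so the excess approaches 1. A perfectly racemic start (L0 = D0) stays racemic, which is why an initial symmetry-breaking step is needed.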
Hud and coworkers recently proposed an alternative scenario for nucleic acid polymerization and assembly [138]. Since the modern nucleotides do not base pair through hydrogen bonding in water, it is unclear how base pairing in nucleic acid polymers would have emerged. Almost 25 years ago, the groups of Lehn [139] and Whitesides [140] reported that molecules similar to modern nucleobases, including barbituric acid, triaminopyrimidine, cyanuric acid, and melamine, assemble in organic solvents through specific hydrogen bonding. Investigating these systems in water, the Hud laboratory demonstrated the formation of linear assemblies containing thousands of complementary monomers tagged with these nucleobase analogs [141]. This ordering through assembly could then catalyze the polymerization of protonucleic acid polymers by aligning their functional groups. Additionally, these molecules are attractive protonucleic acid nucleobase candidates: triaminopyrimidine forms glycosidic linkages with ribose to produce nucleosides in water, and barbituric acid, cyanuric acid, and melamine have been found in model prebiotic reactions [142,143].

Evolution
Evolution is a process of optimization in which a system improves its ability to function within its current environment. Selection is one necessary component of evolution; as described in the previous section, selection of proteins and nucleic acids may have occurred through both attrition (hydrolysis) and amplification (replication). Mutation is another necessary component. While mutations to the genetic material of individual members are more or less random, the environment acts on this random population, removing some members and allowing the remaining ones to utilize environmental resources and propagate. The early evolution of RNA, proteins, and their predecessors can be framed in terms of optimization on fitness landscapes, which relate monomer sequence to Darwinian fitness [144]. The modern form of RNA has been argued to be optimal, with its moderate base-pairing strength being "essential for the evolution of a rich diversity of nucleic acid-related biological functions" [56].
The process of evolution confers robustness at the population level in a dynamic and uncertain environment. Transitions in the evolutionary process occur when selection moves to a higher level of organization [145]. Unlike engineering optimization, in which the goal is to find the global maximum, evolutionary optimization operates on a time-varying fitness landscape defined by the environment. Just as biological organisms are optimized under evolution, molecules on the early Earth would also have evolved under selective pressure.
As articulated by New and Pohorille [44], selection for folded catalytic proteins should occur in an aqueous environment, due to the pressure of hydrolysis. However, at this point the catalysis would simply be an accident, serving no particular purpose. A key transition in early chemical evolution is the formation of catalytic networks, such that selection acts at the higher level of the network rather than on each individual polymer. Eigen and Schuster presented the concept of a hypercycle [45], a cross-catalytic chemical network, and formalized this concept mathematically using mass-action kinetics. More chemically detailed models have since been proposed [146,147]. However, several practical questions remain. (1) How would the hypercycle form? Must all components emerge simultaneously, and what is the probability of this event? (2) What would be the functions in the first cross-catalytic network? (3) How can an individual polymer benefit from the catalytic "fruits of its labor" prior to the emergence of cell walls? In a well-mixed system, all polymers should benefit equally.
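The mass-action formulation of the hypercycle can be sketched with Eigen and Schuster's elementary hypercycle under constant total concentration. In this model, member i is catalyzed by member i-1 and the cycle closes on itself. The three-member cycle, equal rate constants, and initial composition below are hypothetical choices for illustration.

```python
def elementary_hypercycle(x0, k, t_end=200.0, dt=0.01):
    """Explicit-Euler integration of the elementary hypercycle with sum(x) = 1:
        dx_i/dt = x_i * (k_i * x_{i-1} - phi),   phi = sum_j k_j * x_j * x_{j-1}
    The dilution flux phi keeps the total concentration constant."""
    x = list(x0)
    n = len(x)
    for _ in range(round(t_end / dt)):
        # x[i - 1] with i = 0 is x[-1], which closes the catalytic cycle
        growth = [k[i] * x[i] * x[i - 1] for i in range(n)]
        phi = sum(growth)
        x = [x[i] + dt * (growth[i] - x[i] * phi) for i in range(n)]
    return x

# Hypothetical three-member cycle with equal rate constants, started far from balance.
final = elementary_hypercycle([0.6, 0.3, 0.1], [1.0, 1.0, 1.0])
print([round(v, 3) for v in final])
```

Because each member depends on its predecessor, no member out-competes the others: the three coexist near 1/3 each. This cooperative coexistence is what distinguishes a hypercycle from independent replicators competing for resources.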
Walker et al. recently presented a scenario and simulation addressing these three questions, based on mass-action kinetics and Fickian diffusion [148]. In this proposal, the polymers replicate by copying their own sequence (similar to DNA replication), and the catalytic function of a polymer is to produce more monomer. The first functional sequence produces more monomer of Type A, while a later emerging functional sequence produces more monomer of Type B. Thus, the hypercycle can be built up step by step. The first functional sequence does benefit from its own function (it out-competes other sequences), but the greater benefit is realized, for both sequences, when both monomer types are catalyzed (cooperation between sequences). Although no cell compartmentalization is included in the model, the reactions occur on a surface with limited diffusion. A region of parameter space is identified in which surface diffusion limits the mobility of the monomer, so that a functional sequence benefits most from its own product, with all sequences benefitting to a lesser extent. Although non-functional sequences are often described as "parasites" in the origins field [149], in this scenario these sequences may later evolve new functions, enabling more complex reaction networks to evolve step by step.
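Why limited diffusion lets a producer keep most of its product can be seen from a one-dimensional reaction-diffusion balance. This is a simplified stand-in, not the model of Ref. [148]. Product released at a point diffuses with coefficient D and is lost (consumed or degraded) with a hypothetical first-order rate constant k. The steady profile is then c(x) ∝ exp(-|x|/λ), with λ = sqrt(D/k). All values below are dimensionless and hypothetical.

```python
import math

def local_capture_fraction(radius, diffusivity, loss_rate):
    """Fraction of product remaining within `radius` of a point source at steady state
    in 1D, where the profile is c(x) ~ exp(-|x| / lam) with lam = sqrt(diffusivity / loss_rate).
    Integrating that profile gives 1 - exp(-radius / lam)."""
    lam = math.sqrt(diffusivity / loss_rate)
    return 1.0 - math.exp(-radius / lam)

# Hypothetical values: slow surface diffusion versus a nearly well-mixed system.
for D in (0.01, 1.0, 100.0):
    print(D, round(local_capture_fraction(1.0, D, 1.0), 3))
```

When λ is small compared with the spacing between polymers, nearly all of the product stays with its producer. When λ is large, the system approaches the well-mixed case in which all polymers benefit equally.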
Although surfaces may have served an early role in the localization of species, compartmentalization likely emerged early in chemical evolution. Compartmentalization plays several fundamental roles: concentrating organic molecules relative to the large volume of the ocean, providing a semi-permeable barrier with the environment, generating physical and chemical gradients that drive biological processes, and eventually coupling genotype with phenotype once compartmental survival and reproduction became functionally tied to the genetic material contained within. Oparin was the first to propose that systems of colloidal particles, which he called coacervates, were the first self-replicating entities [150]. Since then, researchers have studied how spatial organization could have arisen through the formation of both membranes [151] and microenvironments not encapsulated by membranes.
Aqueous two-phase systems have been proposed as early non-membrane compartments: a mixture of polyethylene glycol and dextran in water segregates into microdroplets rich in one polymer over the other. Local RNA concentration is increased by up to three orders of magnitude in this system, accompanied by a significant increase in the rate of RNA cleavage by a hammerhead ribozyme [39]. Coupled mononucleotide-cationic amino acid species have also been found to form suspended microdroplets in water that sequester various organics [152]. Another microenvironment that was likely prebiotically abundant is the eutectic phase of water and ice, which forms when water containing salts or metal ions is cooled below its depressed freezing point but above its eutectic point (at which the whole system freezes). This system has been found to promote the elongation of an RNA primer on a template via single-nucleotide addition [153].
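The concentrating effect of a two-phase system follows from a simple mass balance. If a solute partitions into the droplet phase with partition coefficient K = c_droplet / c_bulk, and droplets occupy a volume fraction φ, then the droplet concentration relative to the overall average is K / (φK + 1 - φ). The values of K and φ below are hypothetical and are not taken from Ref. [39].

```python
def droplet_enrichment(partition_coeff, droplet_volume_fraction):
    """Droplet-phase concentration relative to the overall (well-mixed) concentration.

    Mass balance with K = c_droplet / c_bulk and droplet volume fraction phi gives
    c_avg = c_bulk * (phi*K + 1 - phi), hence c_droplet / c_avg = K / (phi*K + 1 - phi)."""
    K, phi = partition_coeff, droplet_volume_fraction
    return K / (phi * K + 1.0 - phi)

# Hypothetical values: strong partitioning into a small versus a large droplet phase.
print(droplet_enrichment(1000.0, 0.001))
print(droplet_enrichment(1000.0, 0.1))
```

The enrichment is bounded by both K and 1/φ. Large concentration factors, like the orders of magnitude reported for RNA, therefore require strong partitioning into a droplet phase that occupies only a small fraction of the volume.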
Research into the formation of membranes has generally focused on short-chain fatty acids, which can be generated under prebiotic conditions (see Section 2.1), rather than the phospholipids that compose cellular membranes. Encapsulation of RNA in fatty acid vesicles has been found to generate an osmotic pressure, due to the counterions associated with the RNA, that drives the uptake of additional fatty acid material at the expense of other vesicles without RNA [154]. Thus, coupling the presence of nucleic acid within a vesicle to vesicle growth confers a system-level advantage on the membrane-nucleic acid system, allowing it to compete with other vesicles for material. Similarly, efflux of fatty acids from a vesicle can be reduced by increasing its phospholipid content; thus, the selective advantage of retaining membrane material could have led to the appearance of the less permeable phospholipid membranes found in life today [155].

Case Study
The coupling between the chemistry and the environment is critical to understanding the origin of functional biopolymers. Here we consider three different environmental scenarios associated with a cycling environment. The chemical system studied is the polymerization of malic acid, the α-hydroxy acid previously studied by one of the authors [35]. This polyester was shown to form under alternating hot/dry and cold/wet conditions, yielding a potential proto-protein. However, the environment considered was highly idealized, with sudden switches between two different conditions. Under the hot condition of 85 °C, the air had a very low relative humidity, since it was heated from air at ambient temperature. Prior to the addition of water, the system was cooled to room temperature and capped, and then heated back to 60 °C, a temperature at which hydrolysis is slow. A monomer conversion of 60% could then be achieved [35], as shown in Figure 7. The purpose of this case study is to investigate polymerization under more realistic scenarios, in which the temperature varies gradually and the water mass transfer is coupled to the temperature.

Modeling
The simulation developed in Ref. [35] includes polymerization kinetics and mass transfer of water. Polymerization proceeds by reversible condensation,
$$P_i + P_j \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} P_{i+j} + \mathrm{H_2O} \tag{1}$$
water evaporates at the rate
$$R_{v,w} = K_{y,w}\,\frac{P^*_w\,x_w}{P_t} = K_P\,x_w \tag{4}$$
and the rate constant of the reverse (hydrolysis) reaction is
$$k_{-1} = \left(1.9\times10^{8}\right)\exp\!\left(-\frac{8.6\times10^{3}}{T}\right) \tag{6}$$
where $P_n$ is a polymer containing $n$ monomers, $k_1$ [L/mol-h] is the forward rate constant for polymerization, and $k_{-1}$ [L/mol-h] is that of the reverse reaction. Because the liquid volume $V$ changes during drying and rehydration, Equations (2) and (3) are formulated in terms of total moles, where $z_i$ is the total number of polymers of length $i$ and $W$ is the number of moles of water in the liquid phase. The time is $t$ [h], and the rate of water evaporation is $R_{v,w}$ [mol/h]. Water is also generated by the condensation reaction. The driving force for evaporation is calculated using Raoult's law, with $x_w$ the mole fraction of water in the liquid and $K_{y,w}$ the mass transfer coefficient. This coefficient was combined with the saturation pressure $P^*_w$ and the total pressure $P_t$ to yield $K_P$. The air was assumed to be sufficiently dry that its effect on mass transfer could be neglected, and at 85 °C, $K_P$ was estimated from data as 0.0022 mol/h. The prefactors and activation energies in Equations (5) and (6) were also estimated from data to determine the dependence of the reaction kinetics on temperature $T$ [K].
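The mass-transfer part of the model can be sketched on its own. The code integrates dW/dt = -R_{v,w} with R_{v,w} = K_P x_w during a hot period, using K_P = 0.0022 mol/h (the value quoted for 85 °C). It also evaluates the fitted hydrolysis constant k_{-1}(T). The forward rate constant k_1 (Equation 5) is not included, so water generated by condensation is neglected. The water and monomer loadings are hypothetical, and the function names are ours.

```python
import math

def k_reverse(T_kelvin):
    """Hydrolysis rate constant k_{-1} [L/mol-h] from the fitted Arrhenius form, Eq. (6)."""
    return 1.9e8 * math.exp(-8.6e3 / T_kelvin)

def evaporation_rate(water_mol, solute_mol, K_P):
    """Eq. (4): R_vw = K_P * x_w [mol/h], assuming perfectly dry air."""
    total = water_mol + solute_mol
    x_w = water_mol / total if total > 0 else 0.0
    return K_P * x_w

def dry(water_mol, solute_mol, K_P=0.0022, hours=5.0, dt=1e-4):
    """Explicit-Euler integration of dW/dt = -R_vw during a hot, dry period.
    Water produced by the condensation reaction is neglected (k_1 is not modeled)."""
    history = [water_mol]
    for _ in range(round(hours / dt)):
        water_mol = max(water_mol - dt * evaporation_rate(water_mol, solute_mol, K_P), 0.0)
        history.append(water_mol)
    return history

# Hypothetical loading: 5.55 mmol of water (about 0.1 mL) and 2.5 umol of nonvolatile monomer.
trajectory = dry(5.55e-3, 2.5e-6)
print(f"water left after 5 h: {trajectory[-1]:.2e} mol")
print(f"k_-1 at 60 C: {k_reverse(333.15):.2e}, at 85 C: {k_reverse(358.15):.2e} L/mol-h")
```

Evaporation proceeds at nearly K_P while water dominates the liquid. It then slows sharply once x_w falls, because Raoult's law ties the driving force to the remaining water fraction.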
The model structure used in Ref. [35] was adapted from Ref. [156], which also included the effect of water content in the air:
$$R_{v,w} = K_{y,w}\left(y^*_w - y_w\right), \qquad y^*_w = \frac{x_w\,P^*_w}{P_t}$$
where $y_w$ and $y^*_w$ are the mole fraction of water in the air and its value at saturation (equilibrium with the liquid), respectively. In the present paper this effect is included in the model, assuming that the temperature dependence of $K_{y,w}$ can be neglected over 60-85 °C, and modeling the saturation pressure of water with Antoine's equation,
$$\log_{10} P^*_w = A - \frac{B}{C + T}$$
with A = 8.07131, B = 1730.63, and C = 233.426 [157]. Here T is in °C and $P^*_w$ in mmHg. Using Antoine's equation at 85 °C and $K_P$ = 0.0022 mol/h [35], a value of $K_{y,w}$ = 0.0038 mol/h is obtained.
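The back-calculation of K_{y,w} and the humid-air driving force can be checked numerically. This sketch assumes a total pressure of 1 atm (760 mmHg), which the text does not state, and uses the Antoine constants quoted above. The function names are illustrative.

```python
A, B, C = 8.07131, 1730.63, 233.426  # Antoine constants for water; T in degC, P in mmHg
P_TOTAL = 760.0  # assumption: total pressure of 1 atm, in mmHg

def p_sat(T_c):
    """Saturation pressure of water [mmHg] from Antoine's equation."""
    return 10.0 ** (A - B / (C + T_c))

def k_yw_from_kp(K_P, T_c, P_t=P_TOTAL):
    """Invert K_P = K_yw * P*_w / P_t for the mass transfer coefficient K_yw [mol/h]."""
    return K_P * P_t / p_sat(T_c)

def evaporation_rate_humid(K_yw, x_w, T_c, y_w, P_t=P_TOTAL):
    """R_vw = K_yw * (y*_w - y_w) with y*_w = x_w * P*_w / P_t (Raoult's law).
    Positive values mean evaporation; negative values mean water condenses into the liquid."""
    return K_yw * (x_w * p_sat(T_c) / P_t - y_w)

K_yw = k_yw_from_kp(0.0022, 85.0)
y_night = p_sat(60.0) / P_TOTAL  # air saturated at 60 degC, as in Case 2
print(f"P*_w(85 C) = {p_sat(85.0):.0f} mmHg, K_yw = {K_yw:.4f} mol/h")
print(f"rate at 85 C, x_w = 0.5: {evaporation_rate_humid(K_yw, 0.5, 85.0, y_night):+.2e} mol/h")
print(f"rate at 60 C, x_w = 0.5: {evaporation_rate_humid(K_yw, 0.5, 60.0, y_night):+.2e} mol/h")
```

Under the 1 atm assumption, this gives about 0.0039 mol/h, close to the quoted 0.0038 mol/h. It also shows that humid air can drive water back into a concentrated liquid when the temperature drops.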

Environmental Scenarios
Three distinct dynamic environments are considered here:
1. The case of sudden temperature switches, as implemented in Ref. [35]. At the high temperature of 85 °C, the air is assumed to be completely dry, and at the low temperature of 60 °C the system is capped, allowing no transfer of water. The hot period lasts 18 h and the cold period 6 h. Eight cycles are performed, corresponding to eight days. (Note: the day length on the early Earth was actually closer to 12 h, but that effect should not change these results substantially.)
2. The temperature is varied sinusoidally with a 24 h period over 8 days, between 60 and 85 °C, the same levels as in Case 1. The system is open to mass transfer at all times, and the water content (mole fraction) in the air is constant, at the saturation level for 60 °C. This water level is motivated by an environmental scenario in which the air becomes saturated at night and dries out during the day as the temperature rises.
3. The temperature is cycled as in Case 2, but the system is now closed to the mass transfer of water. The system does contain gas in the headspace, and transfer between the liquid and gas phases can occur. However, the total water content is fixed, and the total pressure therefore increases as the temperature rises. This case is reminiscent of reaction within a pore in a rock, another possible origins-of-life scenario. In this case, the amount of polymerization depends strongly on the amount of gas in the headspace.
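The temperature forcing of the three scenarios can be written down directly. This sketch assumes that Case 1 starts with the hot period and that the sinusoid starts at its mean while warming; neither phase is specified in the text. It also computes each profile's mean and standard deviation over one cycle, the two quantities discussed in the Results when comparing Case 2 with Case 1.

```python
import math

T_HOT, T_COLD, PERIOD_H = 85.0, 60.0, 24.0

def case1_temperature(t_h):
    """Case 1: abrupt switches, 18 h hot then 6 h cold (assumed to start hot)."""
    return T_HOT if (t_h % PERIOD_H) < 18.0 else T_COLD

def sinusoidal_temperature(t_h):
    """Cases 2 and 3: sinusoid between 60 and 85 degC with a 24 h period.
    The phase (starting at the mean and warming) is an assumption."""
    mean, amplitude = (T_HOT + T_COLD) / 2.0, (T_HOT - T_COLD) / 2.0
    return mean + amplitude * math.sin(2.0 * math.pi * t_h / PERIOD_H)

def mean_and_std(profile, samples_per_hour=100):
    """Time-average and standard deviation of a profile over one 24 h cycle."""
    times = [i / samples_per_hour for i in range(int(PERIOD_H * samples_per_hour))]
    values = [profile(t) for t in times]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean, std

print("Case 1:    mean %.2f C, std %.2f C" % mean_and_std(case1_temperature))
print("Cases 2/3: mean %.2f C, std %.2f C" % mean_and_std(sinusoidal_temperature))
```

The switched profile spends three quarters of each day at 85 °C, so its mean (78.75 °C) is higher than that of the sinusoid (72.5 °C). This is one reason a direct comparison of conversions between the cases is not like-for-like.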

Results
The system behavior associated with the three scenarios is compared in Figure 7. In all scenarios, the liquid phase is initialized with 5.55 µmol of water and 2.5 × 10⁻³ µmol of malic acid monomer. Case 1 is identical to the most successful case studied in Ref. [35]. At the beginning of each cold phase, 5.55 µmol of water is added to rehydrate the system, corresponding to rain, dew, or an incoming tide. After eight days, 60% of the monomer has been incorporated into oligomers. During the cold segments the water content is high and free monomer is released, as shown in Figure 7b,c. Nevertheless, the monomer concentration continues to ratchet downward during the first few cycles and then settles into a cyclic steady state. Because the system is capped during the cold period, the water content in the gas phase (Figure 7d) is not modeled for this case.
In Case 2, no water is directly added during the process, but water is still generated by the reaction and can transfer between the gas and liquid phases. The monomer conversion is about half that of Case 1, as seen in Figure 7b. However, it is still possible to form a significant number of ester bonds, despite the presence of water in the liquid phase at higher temperatures. With a different sinusoidal temperature profile, one could match the mean temperature and deviation of Case 1 and possibly achieve enhanced conversion. (However, the model was not validated for a wider temperature range, so this case is not presented.) As shown in Figure 7c, the amount of water in the liquid phase is small relative to Case 1, indicating that the water present initially and the water generated by the reaction can be removed efficiently through mass transfer. Two versions of Case 3 are shown in Figure 7. In the first, the amount of gas (in moles) is equal to the amount of water initially present. In the second, the amount of gas is one hundred times that amount. As shown in Figure 7, virtually no ester bond formation is achieved when the headspace is small, due to the high water content in the liquid at elevated temperature. However, a conversion greater than that of Case 1 is achieved with the larger headspace. As the gas-phase water content is lowered in Case 2, its behavior approaches that of Case 3 with the larger headspace, since their temperature profiles are the same.
These results demonstrate that the reaction is very sensitive to the details of the environment and that, according to the model, significant ester polymerization can be achieved under these more realistic scenarios. However, many simplifying assumptions made in the modeling should be examined more closely, such as ideal gas and ideal solution behavior. The reaction kinetics in the model do not depend on viscosity or chain length, and the mass transfer is limited by the liquid side only. Even from these simplified models, it is clear that phase equilibria and mass transfer are critical effects to include in origins-of-life kinetics modeling, and that such models can be used to design new experiments and systems.

Future Outlook
The lessons learned from origins-of-life research can be applied to modern engineering problems as well. Selection has recently been applied to an exhaustive tripeptide library to identify sequences that assemble into a hydrogel, with applications in medicine and consumer products [158]. Because the relationship between peptide sequence and assembled structure is not well understood, selection can be a valuable approach for materials discovery and can help to elucidate design principles of peptide assembly.
More broadly, Lehn pioneered the concept of dynamic combinatorial chemistry [139]. By moving beyond the chemistry of covalent bonding, supramolecular chemistry with reversible interactions can be used to create a more life-like library of diverse molecular assemblies from simpler, continuously exchanging components. An assembly that binds to a particular target can then be selectively removed from the library while the remaining library components continuously re-equilibrate. As a result, the best binder is continually produced by the library, at least until one of its constituents is completely depleted. The original motivation for this idea was catalysis [139], but template-triggered amplification also provides a new pathway for drug design and synthesis [159].
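The re-equilibration argument can be illustrated with a minimal mass-action library. This is a toy construction of ours, not a system from Refs. [139] or [159]. A shared building block A exchanges between partners B and C, and a hypothetical target irreversibly captures AB. All rate constants and concentrations are hypothetical.

```python
def dynamic_library(k_target, t_end=100.0, dt=0.01, kf=1.0, kr=0.5):
    """Explicit-Euler mass-action model of a minimal dynamic combinatorial library:
        A + B <=> AB,  A + C <=> AC   (same forward kf and reverse kr for both)
        AB -> captured                (irreversible binding to the target, rate k_target)
    Returns (total AB made, free plus captured; free AC) at t_end."""
    A, B, C, AB, AC, captured = 1.0, 1.0, 1.0, 0.0, 0.0, 0.0
    for _ in range(round(t_end / dt)):
        r_ab = kf * A * B - kr * AB   # net formation of AB
        r_ac = kf * A * C - kr * AC   # net formation of AC
        r_cap = k_target * AB         # removal of AB by the target
        A += dt * (-r_ab - r_ac)
        B += dt * (-r_ab)
        C += dt * (-r_ac)
        AB += dt * (r_ab - r_cap)
        AC += dt * r_ac
        captured += dt * r_cap
    return AB + captured, AC

print("no target:   AB = %.2f, AC = %.2f" % dynamic_library(0.0))
print("with target: AB = %.2f, AC = %.2f" % dynamic_library(1.0))
```

Without the target, AB and AC share A equally at equilibrium. With the target, the library keeps re-equilibrating toward AB until A is nearly exhausted, and AC is drained in the process.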
Lehn specifically articulated the connection between dynamic combinatorial chemistry and evolution [160], as a chemical embodiment of selection on a diverse pool of molecules that responds to variations in external factors such as temperature or pH. Otto proposed self-replicating systems, both in simulations and based on modified peptides and disulfide exchange. The term "systems chemistry" has been coined by chemists to describe and embrace the complexity associated with dynamic combinatorial chemistry [161,162]. Engineering approaches can potentially aid in managing such complexity [163].
Ultimately, by understanding the chemical origins of life on Earth, we can better understand the design principles associated with biological materials and biological systems. The chemical origins of life do not appear to be understandable from a reductionist perspective; rather, a complex reaction network would have been present. Tools from chemical engineering may be ideally suited to tackling these grand challenge research questions.

Figure 1. The components of proteins: amino acid monomers polymerize and then assemble. The assembled structures enable catalytic function.

Figure 2. The components of ribonucleic acid (RNA). The nucleotide monomer is composed of a nucleobase (A, G, C, U), the sugar ribose, and a phosphate group. The nucleotides polymerize, thereby storing information through the sequence of bases. Polymer strands then base pair with their complementary bases via hydrogen bonding. In DNA, uracil is replaced with thymine. Beyond hydrogen bonding, DNA further assembles into a double helix.

Figure 3. Progression from monomers to polymers to assemblies.

Stanley Miller's cyanamide experiment … went unreported for over 50 years, but was recently explored to study cyanamide-mediated biomolecule polymerization under early Earth conditions. In their Communication on page 8132 ff., F. M. Fernández, J. L. Bada et al. show that the dimerization of cyanamide in the presence of amino acids and intermediates in the Strecker synthesis of amino acids yields significant levels of dipeptides, which provides evidence that cyanamide enhances polymerization under simulated prebiotic environments. (Background image provided by Ron Miller.)

Figure 6. Accumulation of long RNA oligomers in a pore, driven by a thermal gradient. From Ref. [41].

Figure 7. Monomer consumption versus time for the three scenarios. (a) Temperature versus time; (b) free monomer versus time; (c) water in the liquid phase versus time; (d) water in the gas phase versus time. For Case 3, results with small (5.55 µmol) and large (555 µmol) headspace are shown.