Non-Standard Genetic Codes Define New Concepts for Protein Engineering

The essential feature of the genetic code is the strict one-to-one correspondence between codons and amino acids. The canonical code consists of three stop codons and 61 sense codons that encode 20% of the amino acid repertoire observed in nature. It was originally designated as immutable and universal due to its conservation in most organisms, but sequencing of genes from the human mitochondrial genome revealed deviations in codon assignments. Since then, alternative codes have been reported in both nuclear and mitochondrial genomes, and genetic code engineering has become an important research field. Here, we review the most recent concepts arising from the study of natural non-standard genetic codes, with special emphasis on codon reassignment strategies that are relevant to engineering the genetic code in the laboratory. Recent tools for synthetic biology and current attempts to engineer new codes for incorporation of non-standard amino acids are also reviewed in this article.


Introduction
The genetic code maps 64 codons onto a set of 20 amino acids plus the translational stop signal [1]. These codon-to-amino acid assignments are established by 20 aminoacyl-tRNA synthetases (AARSs) that recognize, activate and charge the 20 proteinogenic amino acids onto tRNAs. Aminoacyl-tRNAs are then transferred to the ribosome, where their three-letter anticodons read the three-letter codons of messenger RNAs (mRNA) [2]. Although the genetic code is almost universal, 34 alterations in nuclear and organellar genomes (Table 1), from bacterial to eukaryotic species, have been discovered [3]. The majority of these codon reassignments involve sense-to-nonsense codon changes (or vice versa) and occur in mitochondria. Only one nuclear sense-to-sense alteration is known so far, namely the reassignment of the CUG codon from leucine to serine in several fungal species of the CTG clade [4,5]. Among code variants involving stop codons are the glutamine and cysteine codons of certain ciliates [6] and the tryptophan codon of Mycoplasma [7]. Some of these reassignments involve codons whose identities change multiple times in closely related phylogenetic lineages, suggesting that certain taxonomic groups (e.g., the ciliates) are more prone to codon reassignment than others [8]. Additionally, two non-canonical amino acids are naturally incorporated into the genetic code, namely selenocysteine, which is inserted at specific UGA sites in a wide range of prokaryotes and eukaryotes [9,10], and pyrrolysine, which is inserted at selected UAG sites in the archaeon Methanosarcina barkeri [11,12].
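The codon arithmetic above can be checked by writing the standard code out as a lookup table. The following is a minimal sketch, not a reference implementation: the names `BASES`, `AMINO` and `STANDARD` are our own, amino acids are given in one-letter notation and `*` marks a stop codon.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acids for the 64 codons in UCAG order; "*" marks a stop.
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
STANDARD = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

stops = sorted(c for c, a in STANDARD.items() if a == "*")
sense = [c for c, a in STANDARD.items() if a != "*"]
amino_acids = set(STANDARD.values()) - {"*"}

print(len(STANDARD), len(sense), stops, len(amino_acids))
```

Under these assumptions the table contains 64 codons, of which 61 are sense codons for 20 amino acids and three (UAA, UAG, UGA) are stops.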
These alterations provide insight into the evolution of the genetic code and highlight new concepts that can be used to manipulate protein function for basic and applied research purposes. In recent years, non-canonical amino acids have been incorporated into proteins in vivo using orthogonal aminoacyl-tRNA synthetase/tRNA pairs and nonsense codons. More than 100 unnatural amino acids have been incorporated into proteins of numerous organisms, such as Escherichia coli [13][14][15], Saccharomyces cerevisiae [13], mammalian cells [13,14], Shigella [15], Salmonella [15], Mycobacterium tuberculosis [16], Drosophila melanogaster [13], Caenorhabditis elegans [13,17], Bombyx mori [18] and Arabidopsis thaliana [14]. High-level misincorporation of canonical amino acids has also been reported. UAG stop codons have been reassigned to glutamine (Gln) and tyrosine (Tyr) in a modified E. coli strain lacking both UAGs in essential genes and the release factor-1 (RF1), which recognizes UAGs [19]. Sense codons have been reassigned to semi-conserved amino acids in E. coli through selective pressure incorporation (SPI) methodologies that activate amino acid misincorporation in quiescent cells to minimize the toxic effects of codon ambiguity [20,21]. Moreover, Euplotes crassus tolerates the incorporation of two amino acids (selenocysteine and cysteine) at the UGA codon, and the dual use of this codon can occur within the same gene [22]. These examples highlight high genetic code flexibility, but how natural variation in codon-amino acid assignments emerges and is selected, as well as the consequences of engineering the genetic code, remain unclear.

Table 1. Genetic code alterations in mitochondrial and nuclear genomes. These changes are phylogenetically independent and some of them occur more than once (adapted from [3]).

Nuclear Genetic Code Variation
Most codon reassignments have been linked to alterations in components of the translational machinery, namely tRNAs, aminoacyl-tRNA synthetases and the release factors that recognize stop codons [23].
In bacteria, reassignments appear to be restricted to the UGA stop codon and are associated with the disappearance of RF2, which recognizes the UGA and UAA termination codons, and with mutant tRNAs that misread these codons. UGA has been reassigned to Trp in Mycoplasma spp. [7] and Spiroplasma citri [24]. Recent metagenomics studies and single-cell sequencing approaches revealed that the uncultivated bacteria Candidatus Hodgkinia cicadicola [25] and BD1-5 [26] also decode UGA as Trp, while SR1 bacteria [27] and Gracilibacteria [28] decode it as Gly.
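The effect of these bacterial reassignments on translation can be illustrated by treating each variant code as the standard table plus a small set of overrides. This is a sketch under simplifying assumptions: `make_code`, `translate` and the example mRNA are hypothetical names and data of our own, and translation is reduced to a codon-by-codon lookup that starts at the first base.

```python
from itertools import product

BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
STANDARD = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def make_code(overrides):
    """Return a copy of the standard code with some codons reassigned."""
    return {**STANDARD, **overrides}


def translate(mrna, code):
    """Translate in frame from position 0 until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = code[mrna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


UGA_TRP = make_code({"UGA": "W"})  # e.g., Mycoplasma spp., as described above
UGA_GLY = make_code({"UGA": "G"})  # e.g., SR1 bacteria, as described above

mrna = "AUGUGAAAAUAA"  # hypothetical message with an in-frame UGA
print(translate(mrna, STANDARD), translate(mrna, UGA_TRP), translate(mrna, UGA_GLY))
```

With the standard code the in-frame UGA terminates the peptide after Met, whereas the two variant codes read through it and terminate at UAA.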
Mollicutes with altered codes have two Trp-tRNA species, one with the canonical CCA anticodon to decode the UGG-Trp codon and the other with the UCA anticodon for decoding the UGA stop codon [29]. Since only UAA and UAG codons are used as termination codons, these species maintained RF1 (responsible for the recognition of UAA and UAG) and eliminated RF2 [30]. Their small and AT-rich genomes (e.g., the AT content of Mycoplasma capricolum is ~75%) are likely to introduce important codon usage biases that may force the replacement of UGA by UAA codons. This renders RF2 dispensable, as RF1 alone is able to recognize the remaining UAA and UAG termination codons [31].

[...] with the entire domain 1 of Tetrahymena result in UGA-only specificity in vitro [53], but it retains the ability to recognize all three codons in vivo [54]. Introduction of Tetrahymena TASNIKS and YCF motifs in human eRF1 does not alter recognition of UAA and UGA codons, but dramatically increases readthrough at UAG codons [50]. It has been suggested that Tetrahymena represents an ambiguous intermediate stage of the codon reassignment process, as eRF1 retains the ability to recognize all three stop codons and reassignment is accomplished by competition from its suppressor Gln-tRNAs [54], which efficiently decode UAR codons as Gln [6]. Conversely, Blepharisma and Euplotes reassigned UGA stop codons to Cys, and only UAR codons are recognized by their eRF1 as termination codons [55]. Both have a single substitution from Leu-126 to Ile in the YCF motif (YICDNKF). Introduction of this mutation in S. cerevisiae eRF1 dramatically increased the readthrough at UGA sites [50]. Another consistent substitution found in both genera is Ser-70 to Ala, which has been shown to increase UGA readthrough in vivo while maintaining efficient termination at UAR codons. For the efficient discrimination of guanine in the second codon position, Ser-70 must be able to form a hydrogen bond with Ser-33 (GTS loop), an interaction that is lost upon substitution with alanine [56].
The only known sense-to-sense reassignment in nuclear genomes is found in several Candida species [5], where the CUG codon is reassigned from Leu to Ser, although its decoding in vivo still involves some degree of ambiguity [57][58][59]. This code alteration is mediated by a Ser-tRNA CAG (Figure 2A,B) that is recognized by both SerRS and LeuRS [60,61]. It has the leucylation identity elements A35 and m1G37 and a U-to-G mutation at position 33, which distorts the anticodon U-turn and lowers its leucylation and decoding efficiencies. The discriminator base is G73, which is a major identity element for serylation along with 3 GC pairs in the variable arm [60,61].

Mitochondrial Variations
Mitochondria show a significant diversity of codon identity reassignments, comprising nonsense-to-sense, sense-to-sense, sense-to-nonsense and sense-to-unassigned codon changes [62]. These alterations appear to be facilitated by the reduced size and complexity of mitochondrial genomes, which encode only a small set of essential genes. These genomes also tend to be strongly biased, as they are AT-rich [62]. They encode only a small set of tRNAs (for example, human mtDNAs encode 22 tRNA species [63]) and thus each tRNA can read two to four codons in a four-codon box by expanded wobbling (Figure 1). For example, the presence of an unmodified U at anticodon position 34 (wobble) enables pairing with N-ending codons, allowing all four codons of a codon box to be decoded. Also, several modified nucleosides in the first and second position of the anticodon play critical roles in mitochondrial decoding [64].

Termination codons have been reassigned to different amino acids in mitochondria. The UAA codon is decoded as Tyr in the mitochondria of the nematode R. similis [65]. UAG codons are decoded as tyrosine by an unusual Tyr-tRNA CUA in calcareous sponges [66], but in green algae its meaning has changed to Ala or Leu [67]. The most frequent reassignment involves decoding of the UGA stop as Trp [68]. This change is mediated by a Trp-tRNA with the anticodon UCA, whose wobble position carries a modified uridine. Modifications can be 5-carboxymethylaminomethyluridine (cmnm5U), 5-carboxymethylaminomethyl(2-thio)uridine (cmnm5s2U) or 5-taurinomethyluridine (τm5U), and they expand the decoding capacity to R-ending codons, enabling the decoding of UGG and UGA codons as Trp [69].
Sense codons also change identity in mitochondria, and some are unassigned as they are not present in the mtDNA. Insertion of Met at the Ile AUA codon is frequent in most metazoans. In mammals, this identity change is mediated by a Met-tRNA CAU in which the wobble C is modified to 5-formylcytidine (f5C) [70], which enables decoding of both AUG and AUA codons [71]. Ascidian Met-tRNA has a τm5U modification in the same position [69]. The AAA-Lys codon is translated as asparagine in echinoderms and platyhelminths [72]. In starfish mitochondria, a single Asn-tRNA GΨU with a modification to pseudouridine (Ψ) in the second position of the anticodon decodes the canonical AAY-Asn codons and the AAA-Lys codon. Also, its Lys-tRNA has a CUU anticodon, instead of UUU, which restricts its decoding to AAG only [73].
Mitochondria of the yeast species Saccharomyces, Nakaseomyces and Vanderwaltozyma decode the four Leu-CUN codons as threonine [74]. This alteration is associated with the loss of the Leu-tRNA UAG capable of decoding the CUN codons and the appearance of a mutant Thr-tRNA UAG with an unmodified U at the wobble position, which enables recognition of all four nucleotides at the third codon position [64]. Interestingly, this Thr-tRNA has evolved from a His-tRNA GUG through loss of its typical guanosine at position -1, substitution of the discriminator base C73 by A73 (both critical identity elements for HisRS) [75], and addition of an adenosine at position 35. Consequently, its anticodon loop has 8 nt and it is a substrate for the yeast ThrRS [76]. On the other hand, the yeast Ashbya gossypii decodes the CUU and CUA codons as Ala using an Ala-tRNA UAG [77]. It was proposed that this tRNA evolved from the Thr-tRNA UAG described above through reduction of the anticodon loop (a major identity element for S. cerevisiae ThrRS [78]) and introduction of a G3:U70 base pair, which is a major identity element for AlaRS [75].
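The wobble rules mentioned in this section can be summarized in a short sketch that lists the codons a given anticodon is expected to read. It is an illustration only: the table `WOBBLE_READS` encodes just the cases described in the text (plus the conventional reading of U- and C-ending codons by an unmodified G), and `codons_read` and its arguments are hypothetical names of our own. Positions 34, 35 and 36 are the three anticodon nucleotides, written 5' to 3'.

```python
# Watson-Crick partners for anticodon positions 35 and 36.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Third codon bases read by the wobble nucleotide (anticodon position 34).
WOBBLE_READS = {
    "U": "UCAG",     # unmodified U34 reads N-ending codons
    "cmnm5U": "AG",  # modified uridines restrict reading to R-ending codons
    "τm5U": "AG",
    "f5C": "AG",     # f5C34 of Met-tRNA CAU reads AUA and AUG
    "G": "UC",       # unmodified G34 reads Y-ending codons
}


def codons_read(wobble, pos35, pos36):
    """List the codons paired by an anticodon 5'-wobble-pos35-pos36-3'."""
    first = COMPLEMENT[pos36]   # codon position 1 pairs with anticodon position 36
    second = COMPLEMENT[pos35]  # codon position 2 pairs with anticodon position 35
    return [first + second + third for third in WOBBLE_READS[wobble]]


print(codons_read("cmnm5U", "C", "A"))  # Trp-tRNA UCA with a modified U34
print(codons_read("f5C", "A", "U"))     # Met-tRNA CAU with f5C34
print(codons_read("U", "A", "G"))       # Thr-tRNA UAG with an unmodified U34
```

Under these assumptions the three calls return the UGA/UGG pair, the AUA/AUG pair and the full CUN box, matching the reassignments described above. Modifications outside position 34, such as the pseudouridine at position 35 of the starfish Asn-tRNA, are not modelled.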
Arginine AGA and AGG codons change identity very often and have different meanings, namely Ser [79], Gly [80] or stop [63]. Mitochondria that reassigned AGR codons lack the Arg-tRNA UCU gene, which has been proposed as the initial step for these reassignments [68]. In the absence of the competitor Arg-tRNA UCU, the AGA codon is captured by a Ser-tRNA GCU [81]. In Drosophila, AGG codons are absent and only AGA codons are decoded by the Ser-tRNA GCU, which has an unmodified G at the wobble position [82]. In squid and starfish mitochondria, the wobble position of Ser-tRNA GCU is methylated to m7G34, which expands its capacity to read AGR-Arg codons, inserting serine at these sites [83]. On the other hand, the wobble position of the Ser-tRNA of Ascaris mitochondria is occupied by an unmodified U [84], which allows decoding of AGN codons as Ser [81]. In ascidian mitochondria, AGR codons are decoded as Gly by a Gly-tRNA UCU with a τm5U modification in the wobble position [69]. Although the majority of changes affect both codons of the pair simultaneously, some arthropods and also the nematode R. compacta decode the AGG codon as Lys and AGA as Ser. These species have an unmodified Ser-tRNA GCU for AGA codons and a Lys-tRNA with a CUU anticodon instead of the typical UUU anticodon, which is thought to recognize the AGG codons at low efficiency [85]. Interestingly, the appearance of this atypical Lys-tRNA CUU restricts recognition of AAA-Lys codons, which has been correlated with the reassignment of AAA to Asn by Asn-tRNA GUU, both in this case and in other species that do not use the AGG codon as Lys (e.g., in echinoderms) [73].
Another codon that is reassigned to stop is the UCA-Ser codon of the green alga Scenedesmus obliquus [86]. The AGR and UCA reassignments to stop have in common the absence of the cognate tRNA that would recognize these codons, namely Arg-tRNA UCU [68] and Ser-tRNA UGA, respectively. Since Ser-tRNA UGA is normally responsible for decoding the UCN-Ser codon box, S. obliquus has a Ser-tRNA GGA to decode the UCU and UCC codons, and UCG is an unassigned codon [86]. Termination codons have also been reassigned in mitochondria. The reassignment of the UGA codon to Trp happens in all animal mitochondria [64]. These reassignments require changes in the release factors, but the termination mechanism in mitochondria remains an unsolved question. Four different homologues of bacterial release factors have been found in human mitochondrial systems: mtRF1, mtRF1a, ICT1 and C12orf65 [87]. To date, none of these factors has shown specific UGA release activity. Although molecular dynamics simulations have proposed that mtRF1 may behave like RF1 [88] or that it may rescue stalled ribosomes with empty A-sites [89], its function remains elusive, since no in vitro release activity has been found for any termination codon, including AGR codons [90]. mtRF1a has in vitro and in vivo release activity in response to UAG and UAA stop codons, similarly to bacterial RF1 [91]. ICT1 is an integral member of the mitoribosome with codon-independent peptidyl-tRNA hydrolase activity [87], and is supposed to function as a multipurpose rescue factor for stalled ribosomes [90]. Regarding the use of AGR codons as termination codons in vertebrate mitochondria, one must consider the absence of the Arg-tRNA UCU that decodes AGR codons [68]. Since the ribosome is expected to stall at these sites, it has been proposed that ICT1 recognizes the stalled ribosome and terminates translation at AGR sites [90].

Natural Expansion of the Genetic Code to 22 Amino Acids
Termination codons are also the target for the incorporation of the non-canonical amino acids selenocysteine (Sec), in a wide range of prokaryotes and eukaryotes [92], and pyrrolysine (Pyl) in archaeal Methanosarcina species [93], producing novel classes of proteins.
Incorporation of Sec in response to an in-frame UGA codon is achieved by complex recoding machinery that informs the ribosome not to stop at this position. The mechanism is distinct in prokaryotic and eukaryotic organisms, but there are some similarities. Both have a special Sec tRNA, which is a minor isoacceptor derived from a serine tRNA (Figure 2C). The other key players are SelB and SECIS (selenocysteine insertion sequence). Since Sec has its own tRNA Sec, biosynthesis begins with SerRS acylating tRNA Sec with serine, producing Ser-tRNA Sec. Then, different enzymes convert Ser-tRNA Sec into Sec-tRNA Sec: selenocysteine synthase (SelA) and selenophosphate synthetase (SelD) in bacteria, and O-phosphoseryl-tRNA kinase (PSTK) and Sep-tRNA:Sec-tRNA synthase (SepSecS) in archaea and eukarya [10,94]. Once the Sec-tRNA Sec is available, recoding of UGA as Sec requires the presence of the translation elongation factor SelB. This factor binds to Sec-tRNA Sec and forms the SelB.GTP.Sec-tRNA Sec complex that is delivered to the ribosome. Studies performed by Böck and co-workers revealed that SelB must be complexed with the SECIS element for the correct interaction with the ribosome to occur [92]. Binding of the ternary complex to the SECIS structure induces a conformational change in SelB that enables codon-anticodon interaction between the Sec-tRNA Sec and the UGA codon at the ribosomal A-site. Therefore, the SECIS element has a critical double function. It converts SelB into a "competent state" that gives SelB a strong competitive advantage relative to the release factor for decoding UGA. Simultaneously, it prevents normal UGA termination codons from being decoded as Sec by the SelB.GTP.Sec-tRNA Sec ternary complex. The dual properties of SelB and SECIS ensure that only UGA codons in selenoprotein mRNAs are recoded [9].
Figure 2 (caption, partial). Most tRNAs have 7 bp in the acceptor stem and 5 in the TΨC arm, while eukaryal and archaeal tRNAs Sec exhibit 9 bp in the acceptor stem and 4 in the TΨC arm. Eukaryotic and archaeal tRNA Sec species have 6 or 7 bp D-stems, respectively. Molecular modeling suggested that a 7 bp D-stem in archaeal tRNA Sec would compensate for the short 4 bp T-stem, thus allowing for the normal interaction between the D- and T-loops; (D) tRNA Pyl has a smaller D-loop (4-5 bp). Only one base is found between the acceptor and D-stems, rather than two bases, and the almost universally conserved G-purine sequence in the D-loop and TΨC sequence in the T-loop are lacking. The anticodon stem forms with six, rather than five, base pairs, leaving only a very short (three bases only) variable loop (adapted from [3]).
Several mechanisms for Sec and Pyl insertion in protein sequences are present in different organisms, but context dependency is the universal feature of these occurrences and they can be regarded as preprogrammed modifications of canonical decoding rules.
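The context dependency of Sec insertion can be caricatured in a few lines of code. This is a deliberately simplified sketch and not a model of the actual mechanism: the SECIS element, which is an RNA structure recognized by SelB, is reduced here to a boolean flag, the function name `translate_with_secis` and the example mRNA are hypothetical, and every in-frame UGA in a flagged message is read as Sec (one-letter code `U`).

```python
from itertools import product

BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
STANDARD = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate_with_secis(mrna, secis_present):
    """Translate in frame; UGA is read as Sec ("U") only if the mRNA has a SECIS."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon == "UGA" and secis_present:
            protein.append("U")  # recoded by the SelB ternary complex
            continue
        residue = STANDARD[codon]
        if residue == "*":
            break                # ordinary termination
        protein.append(residue)
    return "".join(protein)


mrna = "AUGUGAGGUUAA"  # hypothetical message with an in-frame UGA
print(translate_with_secis(mrna, secis_present=False))
print(translate_with_secis(mrna, secis_present=True))
```

The same UGA terminates translation in an ordinary message and is recoded in a selenoprotein message, which is the sense in which these events are preprogrammed modifications of the canonical decoding rules.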

Genetic Code Expansion for Co-Translational Protein Engineering
The study of structural and molecular features of non-standard genetic codes, in addition to supporting models for codon reassignment theories (reviewed in [96,97]), also provides useful information for synthetic rewriting of genetic codes.
Incorporation of non-canonical amino acids (ncAAs), in particular the isostructural ncAAs that are recognized by the endogenous host cell machinery, has been possible by replacement of canonical amino acids (cAAs) using a supplementation-based incorporation method (SPI). This approach uses strains that are auxotrophic for one of the 20 cAAs to replace that specific cAA with a ncAA. The method exploits the natural tolerance of the host AARSs to the isostructural ncAAs, which allows the concurrent exchange of many residues in a target protein by sense-codon reassignment [98]. Although the overall replacement of a cAA by a ncAA cannot be tolerated during exponential growth, non-dividing cells are viable and are able to overexpress proteins that contain the ncAA. The diversity of amino acid analogs that can be incorporated using this approach has been increased through AARS overexpression, active-site engineering and editing domain mutations [99]. Numerous examples of applications of this technique are available, including the replacement of methionine with selenomethionine to introduce a heavy atom into proteins for crystallographic phasing experiments [100]. In other cases, methionine or phenylalanine has been replaced by alkyne-containing ncAA analogs to track newly synthesized proteins [101].
As for orthogonal ncAAs (which do not participate in conventional translation), they have been added by site-specific incorporation in response to stop or quadruplet codons (stop codon suppression, SCS) using orthogonal aminoacyl-tRNA synthetase:tRNA pairs (Figure 3) [102]. Orthogonal tRNAs and AARSs are constructed to satisfy a series of conditions that together prevent cross-reactivity between the pair and the endogenous host synthetases, amino acids and tRNAs. Firstly, the tRNA cannot be recognized by the endogenous AARSs of the host, but must function efficiently in translation. Another crucial requirement for the tRNA is that it must deliver the ncAA in response to a unique codon that does not encode any of the 20 cAAs (for example, a stop codon). Secondly, the orthogonal AARS must aminoacylate only the orthogonal tRNA and none of the endogenous tRNAs. This synthetase must also aminoacylate the tRNA with only the desired unnatural amino acid and no endogenous amino acid. Similarly, the ncAA cannot be a substrate for the endogenous synthetases. Finally, the ncAA must be efficiently transported into the cytoplasm when added to the growth medium, or biosynthesized by the host [103]. A number of heterologous AARS/tRNA pairs have been developed to expand the genetic code of E. coli, yeast and mammalian cells. For example, the E. coli GluRS/human initiator tRNA, the E. coli TyrRS/E. coli tRNA Tyr, the E. coli LeuRS/E. coli tRNA Leu, and the M. mazei PylRS/M. mazei tRNA Pyl pairs are all orthogonal in S. cerevisiae [102], demonstrating the potential of this methodology for synthetic biology.
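The orthogonality conditions listed above can be expressed as a checklist over observed charging and activation relationships. The sketch below is purely illustrative: the function `orthogonality_checks`, the data layout and all synthetase, tRNA and amino acid names are hypothetical, and the requirements for a unique codon and for uptake or biosynthesis of the ncAA are not modelled.

```python
def orthogonality_checks(o_aars, o_trna, ncaa, host, charges, activates):
    """Evaluate cross-reactivity conditions for an orthogonal AARS/tRNA pair.

    host:      dict with sets under "aars", "trnas" and "amino_acids"
    charges:   set of (synthetase, tRNA) pairs that are aminoacylated
    activates: set of (synthetase, amino acid) pairs that are accepted
    """
    return {
        "pair is functional":
            (o_aars, o_trna) in charges and (o_aars, ncaa) in activates,
        "host AARSs ignore the orthogonal tRNA":
            all((s, o_trna) not in charges for s in host["aars"]),
        "orthogonal AARS ignores host tRNAs":
            all((o_aars, t) not in charges for t in host["trnas"]),
        "orthogonal AARS rejects host amino acids":
            all((o_aars, a) not in activates for a in host["amino_acids"]),
        "host AARSs reject the ncAA":
            all((s, ncaa) not in activates for s in host["aars"]),
    }


# Hypothetical host with a single endogenous synthetase/tRNA/amino acid.
host = {"aars": {"hostRS"}, "trnas": {"host_tRNA"}, "amino_acids": {"Tyr"}}
charges = {("hostRS", "host_tRNA"), ("oRS", "o_tRNA")}
activates = {("hostRS", "Tyr"), ("oRS", "ncAA")}

clean = orthogonality_checks("oRS", "o_tRNA", "ncAA", host, charges, activates)
# A leaky synthetase that also accepts an endogenous amino acid fails one check.
leaky = orthogonality_checks("oRS", "o_tRNA", "ncAA", host, charges,
                             activates | {("oRS", "Tyr")})
print(all(clean.values()), all(leaky.values()))
```

Returning the individual checks rather than a single boolean makes it clear which requirement a candidate pair violates.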

Reassignment of Stop Codons
Stop codon suppression is the most frequently used method to incorporate ncAAs into proteins in vivo. This approach comprises the use of an orthogonal aminoacyl-tRNA synthetase/tRNA pair, specifically developed to introduce ncAAs at the stop codon, and deletion of the corresponding release factor to increase suppression efficiency. One of the first successful reassignments was performed by Mukai and colleagues, who reassigned the UAG (amber) codon to the ncAA iodotyrosine (3-iodo-L-Tyr) [19]. They started by mutagenizing the UAG stop codon to UAA in seven essential genes of E. coli, which allowed the deletion of the RF1-encoding prfA gene (release factor 1 terminates translation at UAA and UAG). Next, cells were supplied with an amber suppressor archaebacterial TyrRS/tRNA CUA pair that inserted 3-iodo-L-Tyr when it encountered UAG, as demonstrated by the full-length expression of a target protein containing six copies of the UAG codon [19,105,106].
Recently, several groups applied a genome-wide editing approach in which the amber stop codon is replaced not only in essential genes but in all instances [34,107,108]. For example, Lajoie et al. used both multiplex automated genome engineering (MAGE) [109] and conjugative assembly genome engineering (CAGE) [107] to replace all known UAG stop codons in E. coli MG1655 with synonymous UAA codons. This allowed the deletion of RF1 and, therefore, elimination of termination at UAG codons. The resulting organism allowed them to reintroduce amber codons, along with an orthogonal translation machinery (episomal pEVOL), to permit efficient and site-specific incorporation of p-azidophenylalanine (pAzF) and 2-naphthalalanine (NapA) into green fluorescent protein (GFP). This recoded organism exhibited increased resistance to T7 bacteriophage, suggesting that new genetic codes could facilitate increased viral resistance [34].
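At the sequence level, the recoding step amounts to swapping the terminal UAG of every open reading frame for the synonymous UAA. A minimal sketch of that bookkeeping follows; it says nothing about how MAGE or CAGE introduce the changes, the function `recode_amber_stops` and the example ORFs are hypothetical, and sequences are written in the RNA alphabet to match the codon notation used in the text.

```python
def recode_amber_stops(orfs):
    """Replace a terminal UAG stop codon with UAA in each ORF.

    orfs maps a gene name to its coding sequence, start to stop codon inclusive.
    Only the final in-frame codon is changed; out-of-frame UAG triplets inside
    the ORF are left untouched.
    """
    recoded = {}
    for name, seq in orfs.items():
        if len(seq) % 3 == 0 and seq.endswith("UAG"):
            seq = seq[:-3] + "UAA"
        recoded[name] = seq
    return recoded


orfs = {
    "geneA": "AUGAAAUAG",     # ends with an amber stop
    "geneB": "AUGCUAGGGUAA",  # ochre stop, with an out-of-frame UAG inside
}
recoded = recode_amber_stops(orfs)
remaining = [name for name, seq in recoded.items() if seq.endswith("UAG")]
print(recoded, remaining)
```

Once no ORF terminates with UAG, termination no longer depends on recognition of that codon, which is the rationale given above for deleting RF1.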
Although this approach is widely used nowadays, it is mostly applied in prokaryotic organisms because deletion of RF1 is not viable in yeast or mammalian cells [110]. Another limitation of this method concerns the nonsense-mediated mRNA decay (NMD) mechanism, which degrades mRNAs with premature stop codons and thereby significantly decreases protein yield [111].

Reassignment of Sense Codons
Although recent methods for protein engineering rely on the manipulation of the translation apparatus of the host, the simplest method exploits the close structural similarity between ncAA and a natural amino acid. Due to this similarity, the appropriate aminoacyl-tRNA synthetase is not able to distinguish between cAA and ncAA and permits non-specific charging of the ncAA onto tRNA. Consequently, the activated ncAA-tRNA is used in the translation process and the ncAA is incorporated in response to the sense codon encoding the corresponding cAA. The efficiency of this method is improved when competition from the canonical amino acid for the reassigned sense codon is limited. Auxotrophic bacterial hosts starved for the natural amino acid and supplemented with the ncAA are often used. The success of this strategy was first demonstrated by Cohen and Cowie when they took advantage of the relaxed substrate binding pocket of MetRS to completely replace the natural amino acid methionine by its analog selenomethionine in an E. coli methionine auxotroph [112]. Since then, many other sense codons have been reassigned to incorporate ncAAs into proteins via global substitution [99].
Complementary techniques to this approach have also been used, particularly the over-expression of the aminoacyl-tRNA synthetase of interest and attenuation of its hydrolytic editing activity [113]. For example, overexpression of valyl-tRNA synthetase (ValRS) in a valine auxotroph led to incorporation of one of the stereoisomers of 4,4,4-trifluorovaline (2S,3R-Tfv) in response to valine codons, as indicated by mass spectrometry [114]. Also, Yang and Tirrell showed that mutation of the conserved threonine residue to tyrosine (T252Y) in the editing domain of E. coli LeuRS led to the disruption of the editing activity of the LeuRS, which allowed the incorporation of several unsaturated, non-canonical amino acids in response to leucine codons [115].
Another methodology takes advantage of codons that are decoded by wobbling. At the third position of such codons, Us and Cs can be read by G in the anticodon of the corresponding tRNA while As and Gs can be read by a U or pseudouridine. Kwon et al. introduced an orthologous PheRS/tRNA AAA pair from yeast into an E. coli Phe auxotrophic host and put a target gene under a strong inducible promoter. This gene contained the UUC codon at all desired Phe sites, and a UUU wobble codon was inserted at specific sites for 2-naphthylalanine. The yeast PheRS was able to activate 2-naphthylalanine and charged it on the yeast Phe-tRNA AAA , allowing for the production of a recombinant protein with 2-naphthylalanine [116].
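The outcome of this wobble-codon strategy can be pictured as a code in which UUC and UUU, synonymous in the standard table, are split between Phe and the ncAA. The sketch below assumes complete reassignment and ignores any competition between the endogenous and the introduced tRNA; the label `Nal` for 2-naphthylalanine, the function `translate` and the example mRNA are our own illustrative choices.

```python
from itertools import product

BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
STANDARD = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# UUC keeps its Phe meaning; UUU is assigned to the ncAA (idealized).
RECODED = {**STANDARD, "UUU": "Nal"}


def translate(mrna, code):
    """Translate in frame from position 0 and return a list of residues."""
    residues = []
    for i in range(0, len(mrna) - 2, 3):
        residue = code[mrna[i:i + 3]]
        if residue == "*":
            break
        residues.append(residue)
    return residues


mrna = "AUGUUCUUUAAAUAA"  # hypothetical target: Phe at UUC, ncAA site at UUU
print(translate(mrna, STANDARD))
print(translate(mrna, RECODED))
```

Placing UUC at every position meant to remain Phe and UUU only at the chosen sites is what makes the substitution site-specific in this scheme.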
Rare codons provide another route to introduce ncAAs into proteins. For example, the rare AGG arginine codon in E. coli has been reassigned to ncAAs using the pyrrolysyl-tRNA synthetase (PylRS)/tRNA Pyl CCU pair. Since codon usage and tRNA gene content have coevolved to match each other, the endogenous Arg-tRNA CCU content is low, which allows the ncAA-charged orthogonal tRNA CCU to outcompete it for the AGG codon. Zeng et al. showed that when N-alloc-lysine was used as a PylRS substrate, almost quantitative occupancy of N-alloc-lysine at an AGG codon site was achieved in minimal medium [117]. Recently, Mukai and colleagues demonstrated the in vivo reassignment of the AGG sense codon from arginine to L-homoarginine. A variant of the archaeal PylRS was engineered to recognize L-homoarginine, and expression of this variant together with the AGG-reading tRNA Pyl CCU permitted efficient incorporation of the arginine analog into proteins. Subsequently, all AGG codons in essential genes were eliminated and the bacterial ability to translate AGG into arginine was restricted in a temperature-dependent manner [118].
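Choosing a rare codon as a reassignment target starts from codon usage counts. The following minimal sketch, written under stated assumptions, counts codons in a set of coding sequences and picks the least used member of a synonymous family. The helper names `codon_counts` and `rarest_codon` are hypothetical, and the two toy ORFs are invented for illustration; they are not real E. coli usage data.

```python
from collections import Counter

# The six arginine codons of the standard genetic code.
ARG_CODONS = ("CGU", "CGC", "CGA", "CGG", "AGA", "AGG")


def codon_counts(coding_sequences):
    """Count in-frame triplet codons across a collection of mRNA strings."""
    counts = Counter()
    for seq in coding_sequences:
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        counts.update(seq[i:i + 3] for i in range(0, usable, 3))
    return counts


def rarest_codon(counts, synonymous_codons):
    """Return the least used codon of a synonymous family (first one on ties)."""
    return min(synonymous_codons, key=lambda codon: counts[codon])


# Invented toy ORFs, written codon by codon for readability.
toy_orfs = [
    "".join("AUG CGU CGU CGC CGA CGG AGA AGG UAA".split()),
    "".join("AUG CGU CGC CGC CGA CGG AGA CGU UAA".split()),
]

counts = codon_counts(toy_orfs)
target = rarest_codon(counts, ARG_CODONS)
print({codon: counts[codon] for codon in ARG_CODONS}, target)
```

In practice the counts would come from the full set of coding sequences of the host, and low usage is only one criterion, since the abundance of the competing endogenous tRNA also matters.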

Quadruplet Codons
Another opportunity to expand the set of codons available for ncAAs emerged from the discovery of naturally occurring frameshift suppressor tRNAs, namely UAGN suppressors (N being A, G, C, or U) derived from Su7-encoding glutamine, ACCN suppressors derived from sufJ-encoding threonine and CAAA suppressors derived from tRNA Lys and tRNA Gln [119]. In these cases, a mutant tRNA with an extra nucleotide in its anticodon loop (eight nucleotides instead of the standard seven) reads four bases as a single codon, which shifts the reading frame and allows synthesis of a full-length protein. Following this rationale, an orthogonal four-base suppressor tRNA/synthetase pair was generated from Pyrococcus horikoshii tRNA Lys sequences. The mutant suppressor pair permitted the incorporation of L-homoglutamine into proteins in E. coli in response to the quadruplet codon AGGA [119].
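The effect of four-base decoding on the reading frame can be illustrated with a toy translator. This is a minimal sketch, assuming a deterministic model in which a quadruplet is always read as four bases whenever a suppressor for it is present; the partial codon table, the label `hGln` for L-homoglutamine, and the function name `translate` are choices made for the example only.

```python
# Partial triplet table: only the codons needed for the toy mRNA below.
TRIPLETS = {
    "AUG": "Met", "AGG": "Arg", "AGC": "Ser",
    "GCU": "Ala", "AAA": "Lys", "UAA": "Stop",
}


def translate(mrna, quadruplets=None):
    """Translate `mrna`, reading any codon listed in `quadruplets` as four bases.

    Simplification: a matching quadruplet always wins over the triplet reading,
    whereas in a cell the two readings compete.
    """
    quadruplets = quadruplets or {}
    residues = []
    pos = 0
    while pos + 3 <= len(mrna):
        four_bases = mrna[pos:pos + 4]
        if four_bases in quadruplets:
            residues.append(quadruplets[four_bases])
            pos += 4  # the extra base shifts the frame by one
            continue
        residue = TRIPLETS[mrna[pos:pos + 3]]
        if residue == "Stop":
            break
        residues.append(residue)
        pos += 3
    return residues


# AUG AGGA GCU AAA UAA: the gene is designed around the quadruplet AGGA.
toy_mrna = "AUGAGGAGCUAAAUAA"

with_suppressor = translate(toy_mrna, {"AGGA": "hGln"})
triplet_only = translate(toy_mrna)
print(with_suppressor, triplet_only)
```

With the suppressor, the toy message is read through to its designed stop codon. Read purely as triplets, AGG is decoded as arginine, the frame is not shifted, and an out-of-frame stop codon truncates the product, which is why competition for the first three bases lowers the yield of full-length protein.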
Quadruplets frequently target a rare codon in order to limit competition from the native tRNA for the first three bases, because such competition decreases the yield of the target protein containing the ncAA. Since the endogenous tRNA is readily accepted by the native ribosome, several groups developed "orthogonal" ribosomes [120,121] that only recognize altered ribosome-binding sites (RBSs). These altered RBSs ensure that only mRNAs carrying them are translated by the orthogonal ribosomes. Orthogonal ribosomes with reduced premature termination (ribo-X) showed increased amber suppression on the desired mRNA, while native ribosomes sustained the standard level of amber suppression. Ribo-X was then evolved to increase the efficiency of translation of quadruplet codons (ribo-Q). Recently, a protein containing both an azide and an alkyne was produced efficiently using this approach, which allowed the formation of an internal cross-link [122]. The expectation is that ribo-Q might enable more ambitious alterations to proteins in the near future.

Conclusions and Perspectives
Genetic code alterations may be much more frequent than previously expected, as indicated by the diverse range of alterations found to date (Table 1) [3,123]. Low codon usage, codon unassignment, genome GC pressure, genome minimization, small proteome size and tRNA disappearance are important players in the evolution of the genetic code [96,124,125,126]. The Codon Capture theory posits that under biased genome AT or GC pressure, certain codons vanish from the polypeptide coding sequences (ORFeome). Once these codons are unassigned, the corresponding tRNAs lose their function and can be eliminated by natural selection [125]. Since GC content fluctuates over time, the erased codons can later be reintroduced by genetic drift, but they may then lack cognate tRNAs. Cells that are able to capture these codons and convert them to sense codons have a growth advantage, so the codon reassignment can become established. The Codon Capture theory is supported by the disappearance of the CGG codon in Mycoplasma capricolum (25% genome G + C) and of the AGA and AUA codons in Micrococcus luteus (75% genome G + C) [127]. On the other hand, there are several examples of codon reassignments in organisms where strong GC biases do not exist, and even cases of codon reassignments that appear to run against such bias; for example, reassignment of the leucine CUU and CUA codons to threonine in the AT rich genome of yeast mitochondria [128]. These codon reassignments are better explained by the Ambiguous Intermediate theory [62,124]. This theory postulates that ambiguous codon decoding provides an initial step for gradual codon identity change, and that wild-type or mutant misreading tRNAs are the critical elements of codon reassignment. The appearance of mutant tRNAs with altered or expanded decoding properties allows non-cognate codons to be recognized and translated, so that the amino acids they carry are incorporated into proteins in competition with the cognate ones.
Consequently, statistical proteins are produced and, if this ambiguous codon translation is advantageous for the organism, the alternative codon interpretation is fixed by natural selection, leading to a new arrangement of the code [124]. This theory is strongly supported by CUG reassignment from leucine to serine in fungi [4,129]. The high incidence of genetic code alterations in mitochondria, whose genomes encode very few proteins, suggests that proteome size imposes strong negative pressure on codon reassignment. This is in line with the Genome Minimization hypothesis, which posits that replication speed imposes a strong negative pressure on the mitochondrial genome, leading to selection of small genomes [126]. Human mitochondria illustrate the point: only 13 of the 900 proteins of the mitochondrial proteome are encoded by the mitochondrial genome [130]. Since nuclear encoded proteins are synthesized in the cytoplasm using the standard genetic code and are transported into the mitochondria using a signal peptide translocation system, their synthesis escapes the disruption caused by mitochondrial codon reassignments.
The three theories are not mutually exclusive, since the ambiguous intermediate stage can be preceded by a decrease in the content of GC rich codons, so that codon reassignment might be driven by a combination of evolutionary mechanisms [131]. Additionally, the unexpected existence of AARSs specific for the non-canonical amino acids pyrrolysine and O-phosphoserine [11] raised the possibility that other amino acids with particular functions might exist in still-uncharacterized genomes.
Detailed characterization of natural reassignments was a key step for developing efficient strategies to expand the code for production of proteins with novel biochemical properties. Due to the central importance of engineering proteins for both basic research and biopharmaceutical drug development, several methods have been established to incorporate non-natural amino acids. Such amino acids can offer selective advantages beyond those attainable by evolving proteins with only the canonical amino acids. One area that benefits from expanded genetic codes is the field of synthetic biology. Synthetic biologists have successfully engineered a wide range of functions into artificial gene circuits, generating switches, oscillators, filters, sensors, and cell-cell communicators with potential applications in medicine, biotechnology, bioremediation, and bioenergy [132]. For example, selective pressure incorporation (SPI) methodologies are currently being used to incorporate non-natural amino acids with reactive functional groups that are critical in site-specific derivatization of proteins for therapeutic purposes. Cho and colleagues reported the recombinant expression of human growth hormone (hGH) containing a site-specifically incorporated para-acetylphenylalanine (pAcF), which served as a chemical handle for conjugation to poly(ethylene glycol) (PEG) [133]. The resulting homogeneously mono-PEGylated hGH showed favorable pharmacodynamics and is being developed clinically [133]. SPI methodologies also allowed the purification and identification of 195 newly synthesized proteins in human embryonic kidney (HEK293) cells by orthogonal labeling of non-natural amino acids that were incorporated proteome-wide following the removal of the corresponding natural amino acid [134].
More recently, Romesberg and colleagues surpassed the dependency on the four natural nucleotides A, T, G, and C [135] by using unnatural base pairs (UBPs), which could theoretically allow the incorporation of up to 152 additional non-canonical amino acids. The future will likely include a host of new applications based on these new technologies.
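The figure of 152 is consistent with a simple count of triplet codons. The sketch below rests on two assumptions that are not statements from the cited work: that one unnatural base pair adds two letters to the four-letter alphabet, and that every new triplet could in principle be assigned. The function name `codon_count` is hypothetical.

```python
def codon_count(alphabet_size, codon_length=3):
    """Number of distinct codons for a given nucleotide alphabet size."""
    return alphabet_size ** codon_length


natural = codon_count(4)    # four natural letters
expanded = codon_count(6)   # four natural letters plus one unnatural base pair
print(natural, expanded, expanded - natural)
```

The difference is an upper bound on the number of new codons rather than a count of amino acids already incorporated, since each new codon would still need its own tRNA and synthetase.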