Biosynthesis, Engineering, and Delivery of Selenoproteins

Selenocysteine (Sec) was discovered as the 21st genetically encoded amino acid. In nature, site-directed incorporation of Sec into proteins requires specialized biosynthesis and recoding machinery that evolved distinctly in bacteria compared to archaea and eukaryotes. Many organisms, including higher plants and most fungi, lack the Sec-decoding trait. We review the discovery of Sec and its role in redox enzymes that are essential to human health and important targets in disease. We highlight recent genetic code expansion efforts to engineer site-directed incorporation of Sec in bacteria and yeast. We also review methods to produce selenoproteins with 21 or more amino acids and approaches to delivering recombinant selenoproteins to mammalian cells as new applications for selenoproteins in synthetic biology.


Discovery of the Genetic Code and the Mechanism of Protein Synthesis
Following the discovery of the structure of deoxyribonucleic acid (DNA), Francis Crick proposed the central dogma of molecular biology that describes how the genetic information in cells flows from DNA and RNA to proteins [1]. Crick's sequence hypothesis envisioned that the sequence of nucleic acids corresponds to the amino acid sequence of proteins and is used as a template for amino acid insertion during protein synthesis [1]. Indeed, in all cells, the DNA of a protein-coding gene is copied into ribonucleic acid (RNA) known as messenger RNA (mRNA) through a process called transcription. The resulting mRNAs are used by the ribosome as templates to direct the accurate insertion of each amino acid during protein synthesis [2]. The ribosome is a large ribonucleoprotein particle consisting of large, small, and 5S ribosomal RNA (rRNA) subunits as well as many tightly bound ribosomal proteins. The ribosome is an RNA enzyme (ribozyme) that uses its RNA components to catalyze the peptide bond reactions that link each successive amino acid to the next in the growing polypeptide chain [3]. Each trinucleotide sequence (codon) in the mRNA is read by the ribosome according to the genetic code table, which was deciphered by Nirenberg [4] and Khorana [5] in 1965.
The genetic code consists of 64 codons, 61 of which correspond to one of the 20 standard proteinogenic amino acids. Three codons (UAG, UGA, and UAA) are termination signals, marking the end of the protein sequence [6]. Adapter molecules called transfer RNAs (tRNAs) bring amino acids to the ribosome and determine the amino acid sequence of the polypeptide chain by binding to a specific codon or set of codons in the mRNA with a complementary tRNA anticodon [7] (Figure 1). Aminoacyl-tRNA synthetases (aaRSs) catalyze the ligation of amino acids to their cognate tRNAs through an adenosine triphosphate (ATP)-dependent reaction, producing aminoacyl-tRNA substrates for protein synthesis [8] (Figure 1).

Figure 1. The mRNA is translated into proteins at the ribosome, which catalyzes the formation of peptide bonds between successive amino acids (AAs). Amino acids are brought to the ribosome by aminoacylated tRNAs (aa-tRNAs) and the protein sequence is determined by the tRNA anticodon binding to complementary trinucleotide sequences in the mRNA, which are called codons. The aminoacyl-tRNA synthetases (aaRSs) are responsible for recognizing a cognate tRNA and the corresponding amino acid. (A) For example, seryl-tRNA synthetase (SerRS) catalyzes a two-step reaction in which serine (Ser) and adenosine triphosphate (ATP) are used to form a seryl-adenylate intermediate, which is subsequently used to ligate the Ser moiety onto tRNA Ser. SerRS recognizes tRNA Ser, shown with an AGA anticodon that decodes the UCU codon for Ser. To clearly depict the codon:anticodon duplex, the tRNA is shown in the 3′ to 5′ orientation, and the mRNA is shown in the normal 5′ to 3′ orientation. (B) Following aminoacylation, aa-tRNAs are each bound by elongation factor thermo unstable (EF-Tu) in bacteria or EF-1α in archaea and eukaryotes. During translation elongation, the complex of elongation factor and aa-tRNA is delivered to the ribosome, where correct pairing between codon and anticodon leads to the incorporation of the amino acid in the growing polypeptide chain.
Accurate translation of the genetic code requires that tRNAs are aminoacylated by a specific aaRS with the correct or cognate amino acid that corresponds to the codon(s) read by that tRNA's anticodon [8]. Amino acid substrate selectivity is determined by the structure and chemical environment of the aaRS active site. Some aaRSs possess editing activities to hydrolyze tRNAs that are mis-aminoacylated with a chemically similar but non-cognate amino acid [9]. The aaRSs must also discriminate their cognate tRNA from the large pool of structurally similar tRNA molecules through recognition of a key set of identity element nucleotides [10,11]. Together, the selectivity of each aaRS for the cognate amino acid and tRNA ensures accurate aminoacylation of the cellular tRNA pool and faithful translation of the genetic code [10].
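The codon counts and the codon:anticodon pairing described in this section can be checked with a short script. The following is a minimal sketch, not part of the reviewed literature: it builds the standard genetic code in the conventional U, C, A, G ordering and derives an anticodon by strict Watson-Crick reverse complementation. The function name anticodon_for is hypothetical, and wobble pairing is deliberately ignored.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one-letter amino acid codes in U, C, A, G codon order.
# "*" marks a termination codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon):
    """Return the strict Watson-Crick anticodon (written 5' to 3') for a codon.

    Assumption: wobble pairing at the third codon position is not modelled.
    """
    return "".join(COMPLEMENT[base] for base in reversed(codon))


stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]

if __name__ == "__main__":
    print(len(CODON_TABLE), "codons,", len(sense_codons), "sense,", stop_codons)
    print("UCU is read by anticodon", anticodon_for("UCU"))
```

The Ser example from Figure 1 falls out directly: the UCU codon pairs with an AGA anticodon, and the same function gives CUA as the anticodon that reads UAG.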

A Natural Expansion of the Genetic Code with Selenocysteine
The genetic code was originally thought to be universal in all species; however, diverse exceptions to the universal genetic code have emerged [12]. While there are examples of codons taking on different meanings (codon reassignment), the standard set of 20 different kinds of proteinogenic amino acids was also viewed as immutable. In the late 1980s, the first exception was discovered: some UGA termination codons were found to direct the site-specific and co-translational incorporation of the 21st amino acid, selenocysteine (Sec). The Sec-decoding trait was subsequently found only in a few select proteins in some species representing all three domains of life [13][14][15] (Figure 2). Two partially conserved yet distinct mechanisms for Sec incorporation into proteins evolved in bacteria as compared to archaea and eukaryotes (Figure 2). In both mechanisms, the meaning of selected UGA codons is converted from a stop signal to a sense codon that encodes Sec [15][16][17]. In all Sec-decoding organisms, Sec is synthesized from serine (Ser) on its cognate tRNA (tRNA Sec). First, tRNA Sec is aminoacylated by the endogenous seryl-tRNA synthetase (SerRS) to produce the intermediate, Ser-tRNA Sec (Figure 2) [18,19]. The same SerRS also aminoacylates tRNA Ser, and both tRNA Ser and tRNA Sec share structural features, including a large extra variable arm. Another conserved element is the selenophosphate synthetase (SelD or selenophosphate synthetase 2, SPS2), which provides an activated, phosphorylated form of Se that is an essential substrate for converting Ser-tRNA Sec to Sec-tRNA Sec [20]. Beyond this point, the mechanism in bacteria diverges from that in eukaryotes and archaea. The bacterial selenocysteine synthase (SelA) directly converts Ser-tRNA Sec to Sec-tRNA Sec [21], while in eukaryotes and archaea the Ser-tRNA Sec is first phosphorylated by O-phosphoseryl-tRNA Sec kinase (PSTK) [22,23], followed by conversion of phosphoseryl-tRNA Sec to Sec-tRNA Sec by O-phosphoseryl-tRNA:selenocysteinyl-tRNA synthase (SepSecS) [24][25][26] (Figure 2). SepSecS and SelA are distantly related members of the pyridoxal phosphate (PLP)-dependent enzyme family [27].
Recoding to Sec relies on an RNA hairpin structure in the mRNA. The selenocysteine insertion sequence (SECIS) occurs downstream of the Sec-encoding UGA codon. In bacteria, SECIS is directly downstream of the UGA codon at a distance of 16-37 nucleotides between UGA and the apical loop of SECIS [28], which is often located within the open reading frame (ORF) of the selenoprotein gene (Figure 2). In archaea and eukaryotes, the SECIS is found more distantly downstream and is located in the 3′ untranslated region (UTR) of the selenoprotein genes [29] (Figure 2). Fascinatingly, in some species, UGA codons are not used to encode Sec, and instead, the SECIS element functions to recode other stop codons (UAG and UAA) and even 10 different sense codons to Sec, including the Cys UGU codon in Aeromonas salmonicida. These species also harbour a tRNA Sec with the anticodon complementary to the sense codon that is designated for recoding to Sec [30].
Sec-tRNA Sec does not interact with the normal translation elongation factors, EF-thermo unstable (EF-Tu) in bacteria or EF-1α in archaea and eukaryotes. Instead, Sec incorporation relies on a specialized elongation factor that shares homology with EF-Tu and EF-1α. In bacteria, the SelB elongation factor [31] binds to both Sec-tRNA Sec and the SECIS stem-loop to co-localize the Sec-tRNA Sec with the adjacent UGA codon (Figure 2A). The eukaryotic elongation factor Sec (EFSec) [32] and the archaeal SelB (aSelB) [33] fulfill a similar role in binding Sec-tRNA Sec; however, an additional protein binds the SECIS element. In eukaryotes, SECIS binding protein 2 (SBP2) interacts with both the SECIS and EFSec [34] to localize the distant SECIS element to the target site for recoding [32] (Figure 2B). In archaea, the aSelB protein binds tRNA Sec [33] and positions it at the UGA codon, presumably by interacting with an as-yet unidentified SECIS binding protein [35].
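The two Sec biosynthesis routes and their delivery factors, as described in this section, can be summarized as a small data structure. This is an illustrative sketch only: the enzyme and factor names come from the text, while the dictionary layout, the state labels, and the function name charge_trna are assumptions made for this example.

```python
# Each step is (enzyme, substrate, product). State labels are plain strings
# chosen for this sketch. In both routes SelD/SPS2 supplies selenophosphate.
BACTERIAL_STEPS = [
    ("SerRS", "tRNA-Sec", "Ser-tRNA-Sec"),
    ("SelA", "Ser-tRNA-Sec", "Sec-tRNA-Sec"),
]
ARCHAEAL_EUKARYOTIC_STEPS = [
    ("SerRS", "tRNA-Sec", "Ser-tRNA-Sec"),
    ("PSTK", "Ser-tRNA-Sec", "phosphoseryl-tRNA-Sec"),
    ("SepSecS", "phosphoseryl-tRNA-Sec", "Sec-tRNA-Sec"),
]

SYSTEMS = {
    "bacteria": {
        "steps": BACTERIAL_STEPS,
        "elongation_factor": "SelB",
        "secis_binder": "SelB",  # SelB binds the SECIS stem-loop itself
    },
    "eukaryotes": {
        "steps": ARCHAEAL_EUKARYOTIC_STEPS,
        "elongation_factor": "EFSec",
        "secis_binder": "SBP2",
    },
    "archaea": {
        "steps": ARCHAEAL_EUKARYOTIC_STEPS,
        "elongation_factor": "aSelB",
        "secis_binder": None,  # not yet identified according to the text
    },
}


def charge_trna(domain):
    """Return the successive tRNA states from free tRNA-Sec to Sec-tRNA-Sec."""
    state = "tRNA-Sec"
    history = [state]
    for enzyme, substrate, product in SYSTEMS[domain]["steps"]:
        if substrate != state:
            raise ValueError(f"{enzyme} cannot act on {state}")
        state = product
        history.append(state)
    return history


if __name__ == "__main__":
    for domain in SYSTEMS:
        print(domain, " -> ".join(charge_trna(domain)))
```

Laid out this way, the single point of divergence is easy to see: bacteria convert Ser-tRNA Sec in one step, whereas archaea and eukaryotes pass through a phosphoseryl intermediate.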

Distribution of Selenoproteins
Recoding of UGA to Sec occurs in all three domains of life, Bacteria, Archaea, and Eukarya, but not in all species. Only approximately 20% of bacteria [36][37][38] and 14% of archaea [37,38] contain the Sec-decoding trait and selenoproteins. In eukaryotes, the distribution and number of selenoprotein genes vary greatly. Selenoproteins are found in all vertebrates [39], while several species of insects lack selenoproteins [40]. Selenoproteins are absent from most fungi, with only nine species identified as containing selenoproteins [41]. The Sec-decoding trait was identified in several species of algae [42][43][44], but not in higher-order plants [44]. Among animals, the Sec-decoding trait is widely conserved, including in humans. Mammals and aquatic invertebrates have relatively large selenoproteomes with 15-30 different selenoprotein genes, while other multicellular species, including flies, bees, and worms, encode just one or only a few selenoproteins. Phylogenetic analysis suggests that the large selenoproteomes of aquatic invertebrates were lost in many terrestrial species due to the replacement of Sec with Cys [44].

Human Selenoproteins
There are 25 selenoproteins encoded in the human genome [45]. While the function of most of these selenoproteins remains unknown, the majority of characterized and functionally defined selenoproteins have oxidoreductase activities [46]. The best-characterized selenoprotein families are the thioredoxin reductase (TrxR), the glutathione peroxidase (GPx), and the iodothyronine deiodinase (DIO) protein families, which all have important roles in redox metabolism [47]. Two major antioxidant systems in mammals are the thioredoxin (Trx) [48] and the glutathione (GSH) [49] systems, which protect cells against reactive oxygen species (ROS) and fulfill important cellular signalling functions. The Trx system is driven by three isoforms of the Sec-containing reductase, TrxR (Figure 3). The GSH system relies on the Sec-containing GPx enzymes. TrxR and Trx can also directly reduce ROS, providing the cell with a powerful ROS defence system. TrxR and Trx also act on a wide variety of other proteins with diverse functions, and the Trx system regulates proteins involved in many cellular functions. TrxR contains a highly nucleophilic selenium (Se) atom in its active site in the form of selenocysteine (Sec). Sec endows TrxR with a catalytically efficient reductase activity that is important in defending the cell against oxidative stress and in redox signalling.

Glutathione Peroxidases
There are eight GPx isoforms in human cells, and five (GPx1, GPx2, GPx3, GPx4, and GPx6) are selenoproteins with Sec in the catalytic active site. The three other isoforms use an active-site Cys in place of Sec [50]. GPxs use GSH as a reductant to reduce H2O2 to water or organic hydroperoxides to alcohols [50]. GPx1, GPx2, GPx3, and GPx6 function as homotetramers, while GPx4 is a monomer [50]. GPx1 is the most abundant isoform and is ubiquitously expressed in both the cytoplasm and mitochondria. GPx1 together with GSH reduces H2O2 and low-molecular-weight hydroperoxides [51]. GPx2 is similar to GPx1 yet mainly expressed in the gastrointestinal system. GPx3 is secreted from cells and is found in human plasma as a glycosylated protein. GPx3 can also use Trx as a reductant in addition to the GSH monomer [50]. GPx4 is a membrane-associated protein that protects cell membranes from oxidative stress, and it is the only isoform that can reduce complex lipid hydroperoxides. GPx6 is not well characterized. The GPx6 isoform is expressed mainly in embryonic cells and in epithelial cells of the olfactory system, where it may regulate the metabolism of odorants [51].

Thioredoxin Reductase, a Critical Selenoprotein in Redox Biology
TrxR is a key redox regulator in mammalian cells and a powerful oxidoreductase that contains selenium (Se) in the form of Sec [52]. TrxR, along with its major substrate, Trx, constitutes one of the major disulphide reduction systems in the cell (Figure 3) [53]. Sec is an analogue of Cys with Se taking the place of sulfur [54]. The Se in Sec produces a stronger nucleophile than Cys, making Sec-containing reductases more efficient electron donors for catalyzing redox reactions due to their lower redox potential [55]. Cys to Sec substitutions in Cys-containing reductases can indeed reduce redox potential and increase enzyme activity [55], while Sec to Cys substitutions in TrxR eliminate its activity with some substrates and drastically reduce its activity with others [56,57]. Due to its high nucleophilicity, Sec is also the target of clinically relevant TrxR inhibitors, providing a unique mechanism to specifically target the activity of this selenoprotein [58][59][60][61]. TrxR inhibitors include the platinum-containing compound, cisplatin, which is a widely used chemotherapeutic drug, and the gold-containing compound, auranofin, a treatment for rheumatoid arthritis [62] and a potential anti-cancer drug [63]. TrxR is the main driver of the Trx system (Figure 3), which is involved in oxidative stress responses in eukaryotes, bacteria, and archaea [64].

Thioredoxin System in Cells
Alongside the glutathione system, the Trx system is one of the two main redox regulatory systems in mammalian cells [48] (Figure 3). In mammalian cells, there are three genes encoding three main TrxR isoforms, TrxR1, TrxR2, and TrxR3 [65]. TrxR1 localizes to the cytosol, TrxR2 localizes to the mitochondria, and TrxR3 is present primarily in the testes [65]. The Trx system consists of several enzymes that work together to transfer electrons from nicotinamide adenine dinucleotide phosphate (NADPH) to a wide range of substrates to maintain the redox balance of the cell, to protect against oxidative damage caused by ROS, such as H2O2, and to control the function of various proteins through redox regulation (Figure 3) [48].
TrxR also acts on several other oxidized proteins besides Trx [68], and Trx also functions to reduce many proteins beyond peroxiredoxin (Prx) (Figure 3) [77]. In addition to oxidative defence, the Trx system regulates many cellular processes including gene expression, embryonic development, cell proliferation, and apoptosis [78]. The Trx system is involved in diverse processes because Trx-dependent redox activities regulate a wide range of protein substrates, including Prx, ribonucleotide reductase, phosphatase and tensin homolog, and transcription factors, such as NF-κB, AP-1, and the glucocorticoid receptor [79,80].

Thioredoxin System in Disease
Because the flow of electrons into the Trx system depends on TrxR activity, and because of the diverse roles of the Trx system, TrxR is involved in the development and progression of human diseases [77,81-83]. Dysregulation of the Trx system has been observed in many diseases, such as Alzheimer's disease [84], rheumatoid arthritis [85], asthma [86], various forms of cardiovascular disease [87], and other disorders [77,81-83]. TrxR1 is involved in several types of cancer [88], including non-small-cell lung carcinoma [89], renal cell carcinoma [90], thyroid cancer [91], breast cancer [91], cervical carcinoma [92], and colorectal cancer [93]. Elevated TrxR activity is linked to the chemotherapeutic resistance of some cancer cells because it helps to defend tumours against ROS generated by radiation-based chemotherapies [94]. The activity of TrxR is also used as a diagnostic marker for the early detection of lung [95] and breast cancers [96]. Anti-cancer drugs that target TrxR activity, such as ethaselen, were developed to combat drug-resistant lung cancers [97].

Diverse and Unknown Functions of Human Selenoproteins
The thyroid hormone is regulated locally at the tissue level by DIOs, which convert the prohormone thyroxine (T4) to its active form (triiodothyronine, T3) by 5′-deiodination, or convert T4 and T3 to inactive forms by 5-deiodination [98]. There are three Sec-containing DIOs in humans, which vary in where they are expressed and in which reactions they catalyze [99]. DIO2 activates the thyroid hormone by converting T4 to T3 by a 5′-deiodination reaction, while DIO3 deactivates the thyroid hormone by converting T4 and T3 to relatively inactive lesser iodothyronines (reverse triiodothyronine, rT3, and 3,3′-diiodothyronine, T2) by a 5-deiodination reaction [100]. DIO1 can catalyze both 5′- and 5-deiodination, activating or deactivating the thyroid hormone, respectively [100]. DIO1 is found primarily in the liver, kidney, and thyroid, and in many other tissues at a lower level in adult mammals. DIO2 is found primarily in the uterus, brown adipose tissue, central nervous system, pituitary gland, and placenta, while DIO3 is found primarily in the brain, ovary, placenta, pregnant uterus, testis, and the skin [101]. DIOs are involved in a wide range of processes during development and in adults [102] and play a major role in differentiation during development [103]. For example, DIO2 and DIO3 are required for the development of the cochlea. DIO3 knockout mice have premature cochlear differentiation [104], while DIO2 knockout mice have delayed cochlear development [105], both of which result in deafness [104,105].
SPS2 is also a Sec-containing selenoprotein in humans [106], and is involved in the production of selenophosphate from selenide, which is required for the conversion of Ser-tRNA Sec to Sec-tRNA Sec [107]. Selenoprotein P is a unique selenoprotein. While most mammalian selenoproteins have one SECIS in the 3′-UTR, human selenoprotein P has two SECIS elements, and contains 10 Sec residues [108]. Selenoprotein P functions to transport Se from the liver to different cells around the body and to reduce phospholipid hydroperoxides [108]. Selenoprotein R, which is also known as methionine-R-sulfoxide reductase B1 (MsrB1), catalyzes the reduction of oxidized methionine residues to repair protein damage due to oxidative stress [46].
The function of other human selenoproteins remains uncharacterized or only partially understood [109]. Selenoprotein O occurs in the mitochondria and is proposed to have a kinase function [110], but this has not yet been confirmed experimentally. Selenoproteins T, W, H, and V have a redox motif, suggesting a potential redox function [111]. A version of selenoprotein I terminated at the UGA Sec codon, and lacking Sec, functions as an ethanolamine phosphotransferase [112], but the function of full-length, Sec-containing selenoprotein I has not yet been determined [53,109].
Selenoprotein K and selenoprotein S are both endoplasmic reticulum (ER) resident proteins and associated with complexes involved in the ER-associated degradation of misfolded proteins, suggesting a possible role in protein homeostasis and quality control [53,113,114]. The function of selenoprotein F is still unclear, but it has been implicated in protein folding and secretion in the ER [115]. Selenoprotein M is another ER-resident protein that is expressed in the brain and was implicated in calcium release from the ER [109,116], but its specific role is not clear. Selenoprotein N is also found in the ER and includes a calcium-binding domain and a separate domain with an unknown function that contains Sec [109].

Pyrrolysine: Another Natural Expansion of the Genetic Code
In the early 2000s, the UAG termination codon was found to be decoded as an unusual lysine derivative, pyrrolysine (Pyl), in the methanogenic archaeon Methanosarcina barkeri [117]. Pyl was found in the active site of three archaeal methyltransferases [118] in several species of the Methanosarcinaceae family, and the Pyl residue is crucial for utilizing methylamines as a carbon source [119]. Pyl is incorporated into proteins by reassignment, rather than recoding, of the UAG codon. In contrast to Sec, where only selected codons are designated for recoding to Sec, the Pyl system reassigns all instances of the UAG codon from stop to Pyl [120]. Like Sec, Pyl is also a natural expansion of the genetic code. The Pyl system requires the pyrrolysyl-tRNA synthetase (PylRS) and a cognate tRNA Pyl that includes a CUA anticodon to decode UAG codons. The tRNA Pyl is aminoacylated with free Pyl through an ATP-dependent reaction [121,122]. Pyl-decoding organisms also biosynthesize Pyl from two molecules of lysine using the products of three genes: pylB, pylC, and pylD [123].
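The distinction between recoding (Sec, selected UGA codons only) and reassignment (Pyl, every UAG codon) can be made concrete with a toy translation routine. This is a minimal sketch under simplifying assumptions: the set of recoded UGA positions is supplied by hand rather than inferred from a SECIS element, and the function and parameter names are hypothetical. The one-letter codes U (Sec) and O (Pyl) follow standard usage.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna, recoded_uga_positions=frozenset(), reassign_uag_to=None):
    """Translate an ORF from its first nucleotide.

    recoded_uga_positions: nucleotide indices of UGA codons read as Sec ("U").
        Only these selected codons are recoded; any other UGA still terminates.
    reassign_uag_to: if set (e.g. "O" for Pyl), every UAG codon is read as
        that residue, which models codon reassignment.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon == "UGA" and i in recoded_uga_positions:
            residue = "U"
        elif codon == "UAG" and reassign_uag_to is not None:
            residue = reassign_uag_to
        else:
            residue = CODON_TABLE[codon]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    orf = "AUGUGAGCUUAGUGA"  # AUG UGA GCU UAG UGA
    print(translate(orf))                                   # standard code
    print(translate(orf, recoded_uga_positions={3}))        # Sec recoding
    print(translate(orf, {3}, reassign_uag_to="O"))         # plus Pyl
```

In the last call the first UGA is read as Sec because its position was selected, while the final UGA, which was not selected, still terminates translation.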

Engineering Genetic Code Expansion
The discovery of exceptions to the genetic code in nature, such as recoding of selected UGA codons to Sec, or reassignment of UAG to Pyl, provided both the inspiration and the molecular machinery required for a new field of biotechnology called genetic code expansion (GCE). GCE is a technique used to produce proteins with the site-specific co-translational incorporation of additional non-canonical amino acids (ncAAs) beyond the 20 standard amino acids normally used in protein synthesis. Today, a wide variety of ncAAs with diverse functions can be installed by GCE, permitting many applications in a variety of expression hosts from bacteria to yeast to animal models.
The Pyl system has shown unparalleled flexibility and portability in terms of the chemical diversity of ncAAs and in its facile distribution to diverse expression host systems and model organisms [133]. Many of the PylRS/tRNA Pyl pairs used for GCE were developed from the naturally occurring M. barkeri or Methanosarcina mazei PylRS/tRNA Pyl systems [134]. Expression of the PylRS/tRNA Pyl pair from M. barkeri allowed all instances of UAG to be decoded using Pyl analogues in Escherichia coli [135]. Multiple PylRS/tRNA Pyl pairs from several species can decode UAG with Pyl analogues in E. coli, including PylRSs from M. mazei [136], M. barkeri [135], and Methanomethylophilus alvus [137]. These naturally occurring PylRS/tRNA Pyl pairs are capable of inserting multiple different Pyl analogues in response to UAG codons in E. coli [138]. Several mutant variants were developed following the discovery of PylRS [138][139][140][141][142] to vastly increase the number of ncAAs that can be inserted to more than 200 [143], making the Pyl system the most versatile orthogonal translation system (OTS) known [139]. Expression of the M. mazei PylRS/tRNA Pyl pair and its variants in mammalian cells [144,145] and mice [146] allows reassignment of UAG to ncAAs in clinically relevant model systems.

Using the PylRS System to Produce Selenoproteins
Although the PylRS system is the most common OTS used for GCE, it is not commonly used for genetically encoding Sec. One report used a mutant form of the PylRS/tRNA Pyl pair in E. coli to reassign UAG codons to insert a photocaged version of Sec (N-(tert-Butoxycarbonyl)-[(R,S)-1-{4′,5′-(methylenedioxy)-2′-nitrophenyl}ethyl]-L-selenocysteine) to produce an E. coli peptidyl-prolyl cis-trans isomerase B (PpiB) and a Zika virus NS2B-NS3 protease (ZiPro), both containing the photocaged Sec [147]. Following purification, the photocaged Sec was decaged by exposure to ultraviolet (UV) light to produce Sec-containing versions of these proteins [147]. Although this system functioned, the bulky photocaged Sec was speculated to interfere with protein folding [147], potentially limiting the use of this system for the general production of selenoproteins.

Using the E. coli Sec Insertion System to Produce Selenoproteins
Most other techniques for translation with Sec rely on using variations of the naturally occurring systems, such as the E. coli Sec insertion system (Figure 2A). The dependence of the system on the presence of SECIS downstream of the recoding site, often in the ORF, to direct Sec insertion by binding SelB presents challenges to using this system for site-specific Sec insertion [148]. For some proteins with Sec occurring at the C-terminus, a bacterial SECIS can be inserted into the 3′-UTR to allow recombinant production without changes to the ORF [18,149] (Figure 4A). A mammalian selenoprotein was recombinantly produced by fusing an engineered bacterial SECIS element with the rat thioredoxin reductase (TrxR) gene without causing changes to the protein sequence [150]. Recombinant production of human TrxR was also achieved without changing the amino acid sequence with a similar method [149,151] (Figure 4A).
For production in a bacterial host, other Sec-containing proteins with internal Sec residues require a SECIS present within the ORF. In one case, an artificial SECIS was engineered to minimize changes to the protein sequence encoded by the SECIS [18]. A Sec-containing glutathione S-transferase was produced recombinantly in E. coli by inserting a minimal bacterial SECIS from the E. coli fdhF gene downstream of the Sec codon, which resulted in six point mutations that had no effect on enzyme activity or substrate binding [152]. In another case, a Sec-containing human MsrB1 was produced by mutating the ORF downstream of the UGA Sec codon to generate a SECIS functional in E. coli [153]. This engineered SECIS mutated four residues in MsrB1, but these mutations also had little to no effect on enzyme activity [153].

SECIS-Independent Incorporation of Sec in Bacteria
Several engineering approaches overcame the need for SECIS in translation with Sec [154][155][156][157]. One technique used E. coli tRNA Sec with a mutated anticodon and an engineered E. coli strain, C321.∆A [154]. The C321.∆A E. coli strain has all UAG stop codons mutated to UAA and a deletion of Release Factor 1 (RF1) [158], which terminates translation at UAG stop codons. By mutating the anticodon of tRNA Sec to CUA, tRNA Sec was capable of inserting Sec at UAG codons in the C321.∆A E. coli strain without the need for SECIS [154]. However, this technique resulted in a heterogeneous mix of protein products due to the mis-incorporation of glutamine and lysine at the UAG codon [154] and codon skipping at UAG [159], producing proteins lacking Sec [154,159].
A novel SECIS-independent approach to selenoprotein production was achieved by creating a chimera of tRNA Ser and tRNA Sec that interacts with EF-Tu instead of SelB [156] (Figure 4B). SelB typically binds Sec-tRNA Sec to bring it to the ribosome by binding SECIS (Figure 2A), while EF-Tu is responsible for bringing all other elongator tRNAs to the ribosome (Figure 1B). A synthetic tRNA (tRNA UTu) was created by combining the backbone of tRNA Ser with the acceptor helix of tRNA Sec [156] (Figure 4B). The tRNA UTu is a competent substrate for both SerRS and SelA [156,157]. This allowed tRNA UTu to be charged with Ser by SerRS, followed by conversion of Ser-tRNA UTu to Sec-tRNA UTu by SelA, while being delivered to the ribosome by EF-Tu instead of SelB, eliminating the need for a downstream SECIS [156] (Figure 4B). The initial version of tRNA UTu resulted in ~30% misincorporation of Ser, due to the incomplete conversion of Ser-tRNA UTu to Sec-tRNA UTu by SelA [156]. Further engineering of tRNA UTu improved the interaction with SelA, producing tRNA UTuX, which was capable of stoichiometric incorporation of Sec without contamination with Ser [157].
An identity element in tRNA Sec for E. coli SelA is a non-canonical 13-bp branch structure [160], consisting of the acceptor and T-stems. The E. coli tRNA Sec, tRNA UTu, and tRNA UTuX all have a non-canonical 13-bp branch [161], which is efficiently recognized by SelB [162]. In contrast, EF-Tu mediated translation is optimal with tRNAs with the canonical 12-bp branch structure [163]. A new system for SECIS-independent selenoprotein production was developed using tRNAs with a 12-bp branch structure to improve compatibility with EF-Tu [161]. The system uses SelA from Aeromonas salmonicida subspecies pectinolytica 34mel (AsSelA) [161], which recognizes a 12-bp tRNA branch structure [30], along with a mutant allo-tRNA [161].
Allo-tRNAs are a recently discovered group of tRNAs with unusual cloverleaf structures [164,165]. An allo-tRNA Ser was identified in metagenomic databases that also included the identity elements for SelA [161]. Expression of AsSelA and a UAG-decoding tRNA variant (allo-tRNA UTu1) in E. coli promoted the translation of five UAG codons as Sec in one polypeptide [161]. Allo-tRNA UTu1 was mutated to include a segment of the A. salmonicida tRNA Sec D-stem (allo-tRNA UTu1D), leading to increased conversion from Ser-allo-tRNA UTu1 to Sec-allo-tRNA UTu1 by AsSelA. AsSelA was also engineered by mutagenesis and directed evolution to increase the efficiency with which AsSelA converts Ser-tRNA UTu1D to Sec-tRNA UTu1D, producing AsSelA Evol. To further increase the supply of selenium, A. salmonicida SelD (AsSelD) and selenocysteine lyase SufS(C364A) were co-expressed, further increasing the efficiency of Sec insertion [161]. A single plasmid (pSecUAG-Evol2) encoding AsSelA Evol, tRNA UTu1D, AsSelD, and Sec lyase SufS(C364A) allowed the production of selenoprotein with a Sec incorporation stoichiometry of 90% [161]. A recent study demonstrated that site-specific Sec incorporation increased the O2 tolerance of a hydrogenase enzyme [166], and a review suggested many applications for this approach in generating Sec-containing hydrogenase enzymes for hydrogen production to meet the needs for clean and renewable sources of energy [167].
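Per-site incorporation figures such as ~30% Ser misincorporation or 90% Sec stoichiometry become more consequential when a protein carries several Sec sites. The sketch below is an illustration only: it assumes each site is decoded independently with the same efficiency, which the cited studies do not claim, and it combines the 90% figure with a five-site protein purely as a worked example. The function name fully_sec_fraction is hypothetical.

```python
def fully_sec_fraction(per_site_sec, n_sites):
    """Fraction of full-length protein carrying Sec at every target site.

    Assumption: each site is decoded independently with the same probability
    per_site_sec of inserting Sec (rather than Ser or another residue).
    """
    if not 0.0 <= per_site_sec <= 1.0:
        raise ValueError("per_site_sec must lie between 0 and 1")
    if n_sites < 0:
        raise ValueError("n_sites must be non-negative")
    return per_site_sec ** n_sites


if __name__ == "__main__":
    for p in (0.7, 0.9, 1.0):
        for n in (1, 5):
            print(f"per-site {p:.0%}, {n} site(s): "
                  f"{fully_sec_fraction(p, n):.1%} fully Sec")
```

Under these assumptions, a 90% per-site stoichiometry would leave roughly 59% of a five-site protein fully substituted, which is one way to see why near-stoichiometric systems matter for multi-Sec targets.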

Genetically Encoded Sec in Yeast
Most approaches for producing recombinant selenoproteins rely on E. coli as a production system. Some selenoproteins cannot be efficiently produced in bacteria [157]. To overcome this barrier, recent efforts engineered yeast to genetically encode Sec. Since yeast and related species lack the Sec-decoding trait, these approaches provide the first routes to produce selenoproteins in yeast.
One approach used a mutant form of an orthogonal leucyl-tRNA synthetase (LeuRS)/tRNA Leu pair that is capable of inserting a photocaged Sec (4,5-dimethoxy-2-nitrobenzyl Sec, DMNB-Sec) in response to UAG codons to produce DMNB-Sec-containing proteins in yeast [168]. Following purification of DMNB-Sec-containing proteins, the DMNB caging group was removed by exposure to UV light, producing selenoprotein [168]. A second approach adapted the Sec incorporation machinery from bacteria for SECIS-independent insertion of Sec in response to UAG codons in Saccharomyces cerevisiae [169]. An S. cerevisiae tRNA Ser was mutated to create a synthetic tRNA (SctRNA Sec) that was aminoacylated by the endogenous S. cerevisiae SerRS and then converted from Ser-SctRNA Sec to Sec-SctRNA Sec by Aeromonas salmonicida SelA (AsSelA). The anticodon of SctRNA Sec was also mutated to decode UAG codons [169]. An A. salmonicida SelD (AsSelD) was also used to produce selenophosphate, the selenium donor for the conversion of Ser-SctRNA Sec to Sec-SctRNA Sec by AsSelA, from selenite [169]. Additionally, selenite is converted to free Sec in S. cerevisiae via the Cys biosynthetic pathway [170], so a Mus musculus Sec lyase (MmSCL) was added to convert free Sec back to selenite to ensure sufficient selenite availability for AsSelD to produce selenophosphate [169]. A single plasmid expressing SctRNA Sec, AsSelD, AsSelA, and MmSCL is sufficient to insert Sec at UAG codons, permitting selenoprotein production in S. cerevisiae without the need for SECIS or SelB, which enabled the successful production of the Sec-containing human selenoprotein MsrB1 in yeast [169].

Codon Availability: A Limitation of Genetic Code Expansion
The availability of codons that can be reassigned to ncAAs remains a limiting factor for GCE, making the production of proteins simultaneously containing two or more ncAAs challenging or inefficient. GCE has mostly focused on using one of the three stop codons (UAG, UAA, or UGA) [171]. By mutating the tRNA Pyl anticodon, the PylRS/tRNA Pyl system can reassign any of the stop codons [172]. Indeed, the production of a single protein containing two ncAAs incorporated via GCE simultaneously in E. coli was achieved using the Pyl system to reassign UAA stop codons and using the M. jannaschii tyrosyl-tRNA synthetase (TyrRS) system to reassign UAG stop codons [173]. The approach used three stop codons in the ORF, with two of the three stop codons being used to insert ncAAs and the third (UGA) to direct translational termination.
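The stop-codon assignment in the dual-ncAA example above can be sketched as a small lookup-based decoding routine. This is only an illustrative toy model: the function name `translate`, the placeholder labels `ncAA(PylRS)` and `ncAA(TyrRS)`, the partial codon table, and the short ORF are all assumptions made for the example and are not taken from the cited work.

```python
# Toy model (assumed, not from the cited study): UAA and UAG are reassigned
# to ncAAs, and UGA is left as the only termination signal.

# Partial standard code table; it covers only the sense codons used below.
STANDARD = {"AUG": "Met", "GCU": "Ala", "AAA": "Lys", "UCU": "Ser"}
STOPS = {"UAG", "UAA", "UGA"}


def translate(mrna, reassigned):
    """Decode mrna codon by codon.

    `reassigned` maps stop codons to the label of the ncAA inserted there.
    A stop codon that is not reassigned terminates translation.
    """
    peptide = []
    usable = len(mrna) - len(mrna) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        codon = mrna[i:i + 3]
        if codon in reassigned:
            peptide.append(reassigned[codon])
        elif codon in STOPS:
            break
        else:
            peptide.append(STANDARD[codon])
    return peptide


# Hypothetical assignment mirroring the text: Pyl pair at UAA, TyrRS pair at UAG.
reassigned = {"UAA": "ncAA(PylRS)", "UAG": "ncAA(TyrRS)"}
orf = "AUGGCUUAGAAAUAAUCUUGA"  # hypothetical ORF ending in UGA

print(translate(orf, reassigned))
```

Calling `translate(orf, {})` with no reassignments stops at the first UAG, which shows why an unreassigned stop codon must remain available for termination.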
So far, up to three different ncAAs have been simultaneously inserted site-specifically into a single polypeptide chain in E. coli by either reassigning all three stop codons [174] or by reassigning two stop codons and using UAU as a start codon (instead of AUG) to insert an ncAA at the N-terminus of a protein using an engineered initiator tRNA [175]. To reassign all three stop codons, a PylRS/tRNA Pyl UUA pair decoded UAA as Boc-lysine (BocK), E. coli tryptophanyl-tRNA synthetase (TrpRS)/tRNA Trp UCA decoded UGA as 5-hydroxytryptophan, and M. jannaschii TyrRS/tRNA Tyr CUA decoded UAG as p-azidophenylalanine in the engineered E. coli strain ATMW1 [174]. The ATMW1 strain has the endogenous E. coli TrpRS/tRNA Trp substituted with its yeast counterpart to allow the insertion of an ncAA with the E. coli TrpRS/tRNA Trp pair [176]. Translational termination can be achieved with multiple consecutive stop codons even in the presence of an orthogonal pair capable of decoding the stop codon [177]. Three consecutive UAA stop codons were used to direct translational termination, and a self-cleaving tag was used to remove the products of partial ncAA insertion by PylRS/tRNA Pyl UUA at the C-terminal translational termination site [174]. Drawbacks to this system include the lack of a dedicated stop codon, which creates the need for a complicated self-cleaving tag to ensure C-terminus homogeneity [174], and the need for an engineered E. coli strain [174], which may be difficult to duplicate in other host systems. Another challenge is the low yield of ncAA-containing proteins. For example, proteins containing three ncAAs were produced with only 2% of the yield of wildtype protein [174], which may result from competition of the orthogonal tRNAs with endogenous release factors, or from toxicity caused by elongated endogenous proteins produced by readthrough of endogenous stop codons, as observed in a GCE system for phosphoserine [178].
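As a rough way to read the 2% figure, one can ask what average per-site efficiency it would imply if the three insertion sites behaved independently and equally. That independence is purely an assumption made here for illustration (the cited study does not report this model), and the helper name `per_site_efficiency` is hypothetical.

```python
def per_site_efficiency(relative_yield, n_sites):
    """Per-site efficiency e satisfying e ** n_sites == relative_yield.

    Assumes every site contributes independently and equally, which is a
    simplification and not a result from the cited work.
    """
    return relative_yield ** (1.0 / n_sites)


# 2% of wildtype yield spread over three ncAA sites.
e = per_site_efficiency(0.02, 3)
print(f"{e:.2f}")  # prints 0.27
```

Under this simplified model, each site would need to be decoded with roughly 27% efficiency to give the observed overall yield. Real losses from release factor competition or toxicity need not be distributed evenly across sites.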

Sense Codon Reassignment
Due to the degeneracy of the genetic code, it is widely believed that many sense codons, perhaps 20 or more, should be available for recoding or reassignment to ncAAs [171]. Indeed, the Pyl system was used in an attempt to reassign the CGG arginine (Arg) codon in Mycoplasma capricolum [171]. The CGG codon in M. capricolum has been called an unused or "open" sense codon [179] because the genome only contains six CGG codons and lacks a tRNA dedicated to decoding CGG [171]. Unfortunately, the expression of PylRS and tRNA Pyl with a CCG anticodon (tRNA Pyl CCG) in M. capricolum resulted in the loading of tRNA Pyl CCG with Arg by endogenous ArgRS and decoding of CGG as Arg [171]. The tRNA Pyl variants with CCG anticodons are also aminoacylated in E. coli by ArgRS, while ArgRS was not active with tRNA Pyl CUA or tRNA Pyl GAG [171], suggesting that cross-reactivity with endogenous translation machinery may prove to be an additional barrier when attempting to decode sense codons. A comprehensive study of sense codon reassignment in E. coli found that orthogonal M. jannaschii TyrRS [180] and M. barkeri PylRS [181] pairs could effectively compete for decoding of many sense codons, providing up to 65% missense suppression of the Arg AGG codon with an ncAA.
The E. coli Sec insertion system provides the molecular machinery to bypass barriers to installing ncAAs at sense codons. An interesting aspect of the E. coli Sec insertion system is the ability to change the meaning of sense codons. By simply changing the tRNA Sec anticodon, Sec can be inserted at 58 of the 64 possible codons with some efficiency, and can completely convert all three stop codons and 15 different sense codons to encode Sec [149]. Fascinatingly, in nature, some species use Cys codons and other sense codons for recoding to Sec [30]. By combining the Sec system with other OTSs, like the PylRS system, many more ncAAs could be inserted into a single polypeptide chain.
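The relationship between a tRNA anticodon and the codon it reads, which underlies the anticodon changes described in this section, can be sketched as a reverse complement. The sketch below is a minimal illustration that assumes strict Watson-Crick pairing only; it ignores wobble pairing and modified bases, and the function name `codon_read_by` is hypothetical.

```python
# Watson-Crick pairs only; wobble pairing and modified bases are ignored
# (a simplifying assumption for illustration).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codon_read_by(anticodon):
    """Return the codon (5'->3') complementary to an anticodon written 5'->3'."""
    # The two strands pair antiparallel, so reverse before complementing.
    return "".join(PAIR[base] for base in reversed(anticodon))


# Anticodons mentioned in this review and the codons they decode.
for anticodon in ("CUA", "UUA", "UCA", "CCG"):
    print(anticodon, "->", codon_read_by(anticodon))
```

For the anticodons listed, this reproduces the pairings cited in the text: CUA reads UAG, UUA reads UAA, UCA reads UGA, and CCG reads CGG.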

Combining the E. coli Sec Insertion System with the PylRS/tRNA Pyl System
The Sec insertion system was combined with the PylRS/tRNA Pyl system to produce a protein containing 22 different amino acids, including two ncAAs, Sec and acetyl-lysine (acK), in E. coli. A Sec-containing human TrxR1 site-specifically acetylated at experimentally identified acetylation sites was produced recombinantly in E. coli [151]. A bacterial SECIS was included in the 3′-UTR of TrxR1 to direct the E. coli Sec insertion machinery to recode a UGA codon to Sec (Figure 4A), while simultaneously reassigning UAG codons to acK using a mutant PylRS/tRNA Pyl pair [151]. Biochemical characterization of the acetylated, Sec-containing TrxR1 variants demonstrated that acetylation increased TrxR1 activity by destabilizing low-activity tetramers [151]. The work demonstrated that the E. coli Sec insertion system is compatible with the PylRS/tRNA Pyl pair and could be used to study the post-translational modifications of selenoproteins.
Recently, a related approach was established to generate selenoproteins using the nonsense suppressor Allo-tRNA UTu1D system to install Sec at any desired position in combination with the acK-specific PylRS/tRNA Pyl pair to produce site-specifically acetylated selenoproteins [182]. Fascinatingly, the UGA codon was used to encode acK and UAG was used for Sec in initial experiments; however, swapping the codon assignments and using UAG for acK and UGA for Sec produced a more selective dual incorporation system. The method was applied to generate acetylated variants of human GPx1. In the future, combining the E. coli Sec insertion system and the PylRS/tRNA Pyl pair with other OTSs, such as the M. jannaschii TyrRS/tRNA Tyr pair, could allow the incorporation of many more ncAAs into proteins.

Challenges in Selenoprotein Overexpression in Mammalian Cells
Overexpression of selenoproteins in mammalian cells from plasmids can be difficult due to the complicated and inefficient Sec insertion machinery. Overexpression of TrxR1 can also cause increased cell death in mammalian cells. TrxR1 overexpression from stable plasmid transfection caused more than double the amount of cell death in Michigan Cancer Foundation-7 (MCF-7) cells relative to an empty vector control [183]. TrxR1 lacking the Sec residue, due to translational termination at the Sec-encoding UGA, was toxic to A549 cells and caused cell death when delivered by lipid-based transfection methods [184]. Selenium-compromised TrxR-derived apoptotic proteins (SecTRAPs) [185], which are TrxR proteins either lacking the Sec residue or that have had the Sec residue derivatized with chemical compounds, may explain these results. SecTRAPs can also be created in mammalian cells by inhibition of endogenous TrxR1 with compounds that target the Sec residue, which also leads to apoptosis [186]. Transient transfection of plasmids containing DIO1, DIO2, or DIO3, with their corresponding SECIS sequences in the 3′-UTR, allowed successful production of full-length DIO1 and DIO3, but not DIO2, in HEK 293T cells [187]. Interestingly, relatively low levels of Sec-containing DIO1 and DIO3 were produced compared to HEK 293T cells transfected with plasmids containing Sec-to-Cys mutants, and high levels of DIO1 and DIO3 truncated at the UGA codon were observed [187]. Additionally, no experiments to validate Sec incorporation into the full-length DIO proteins were conducted. Thus, transient transfection approaches for DIO1 were not toxic, but they were inefficient or unable to produce the selenoproteins.

Opportunities to Investigate Selenoproteins in Mammalian Cells
Overexpression of other selenoproteins in mammalian cells has also proven challenging. Overexpression of the selenoprotein GPx1 from a plasmid in endothelial cells required co-transfection of SelD and tRNA Sec [188], while overexpression of GPx3 in mammalian cells required co-transfection of SelD, tRNA Sec, and SBP2 [189]. A plasmid developed for mammalian expression of selenoproteins (pCI-HHT-Toxo-SECIS vector) contains a highly efficient SECIS from Toxoplasma gondii in the 3′-UTR of the selenoprotein gene of interest and also co-expresses SBP2 [190]. The plasmid has been used for the expression of selenoprotein S [113], selenoprotein K [114], selenoprotein O [110], and GPx4 [191] in mammalian cells. Each of these techniques required overexpression of SBP2 or other proteins involved in Sec insertion, which may alter the cellular phenotype, potentially complicating any observations related to the activity of the selenoprotein itself.

Cell-Penetrating Peptide for Delivery of the Selenoprotein TrxR1 to Live Cells
Cell-penetrating peptides (CPPs) are small peptides that cross cell membranes and allow the delivery of attached cargo, such as mRNAs [192], proteins [193], and small molecules [194], to mammalian cells [195]. One such CPP is derived from the human immunodeficiency virus protein, transactivator of transcription (TAT) [196][197][198]. The TAT protein has a small basic domain that traverses cell membranes to achieve cellular uptake of covalently attached cargo molecules [199]. The TAT-tag has been used to deliver recombinant proteins into various types of mammalian and plant cells [199].
Recently, Sec-containing human TrxR1 that was produced recombinantly in E. coli with GCE was delivered to the cytoplasm of mammalian cells using an N-terminal TAT-tag [200]. Sec insertion into TrxR1 was achieved in E. coli using a bacterial SECIS in the 3′-UTR to direct Sec insertion at UGA, with a TAT-tag fused to the N-terminus [200] (Figure 4A). Following purification of TAT-TrxR1 and incubation with HEK 293T cells, a live-cell and TrxR-specific activity reporter was used to confirm the successful delivery of active and Sec-containing TrxR1 to the cytoplasm of human cells without the need for any lipid-based transfection reagents (Figure 4C) [200]. This new approach to delivering selenoproteins to mammalian cells with a CPP could be applied to other natural or synthetic selenoproteins. The method avoids the need to overexpress components of the Sec insertion system in the cell where the selenoprotein of interest is under investigation. Selenoproteins fused to a CPP can be produced in the E. coli or yeast GCE systems for Sec noted above. Following purification, these CPP-linked selenoproteins can be characterized biochemically and delivered directly to cells (Figure 4C) to investigate the biological function of selenoproteins in the homologous context of live mammalian cells.

Figure 4. Engineering recombinant selenoprotein biosynthesis and delivery to cells. (A) The schematic shows an approach that we applied to efficiently produce human thioredoxin reductase 1 (TrxR1) in E. coli with stoichiometric incorporation of selenocysteine (Sec) in the active site [151]. Because the Sec codon is close to the untranslated region (UTR), a bacterial selenocysteine insertion sequence (SECIS) derived from the E. coli formate dehydrogenase gene was appended to the construct without perturbing the open reading frame. (B) For programmable and site-specific incorporation of Sec at any location in a recombinant protein that is independent of the SECIS element, a novel tRNA was designed to enable Sec incorporation using the normal elongation factor thermo unstable (EF-Tu) [156]. The tRNA contains the first 7 base pairs from the acceptor stem of tRNA Sec (dark blue) transplanted in place of the first 6 base pairs in the body of tRNA Ser (cyan). The resulting tRNA UTu is aminoacylated with Ser, and Ser-tRNA UTu is converted to Sec-tRNA by selenocysteine synthase (SelA) (as in Figure 2). Because the tRNA includes a mutant anticodon (5′-CUA-3′), it reads the UAG stop codon to insert Sec. (C) Following the production of active human selenoproteins, we found that fusion with an N-terminal transactivator of transcription (TAT) cell-penetrating peptide tag enables efficient transduction of recombinant selenoprotein into the cytosol of human cells [200]. Thus, the approach enables the synthesis of engineered proteins in an efficient production host or synthetic cell and the ability to then investigate selenoproteins in the homologous context of otherwise naive human cells.

Figure 1. Schematic of protein synthesis by messenger RNA (mRNA) translation at the ribosome. The mRNA is translated into proteins at the ribosome, which catalyzes the formation of peptide bonds between successive amino acids (AAs). Amino acids are brought to the ribosome by aminoacylated tRNAs (aa-tRNAs), and the protein sequence is determined by the tRNA anticodon binding to complementary trinucleotide sequences in the mRNA, which are called codons. The aminoacyl-tRNA synthetases (aaRSs) are responsible for recognizing a cognate tRNA and the corresponding amino acid. (A) For example, seryl-tRNA synthetase (SerRS) catalyzes a two-step reaction in which serine (Ser) and adenosine triphosphate (ATP) are used to form a seryl-adenylate intermediate, which is subsequently used to ligate the Ser moiety onto tRNA Ser. SerRS recognizes tRNA Ser, shown with an AGA anticodon that decodes the UCU codon for Ser. To clearly depict the codon:anticodon duplex, the tRNA is shown in the 3′ to 5′ orientation, and the mRNA is shown in the normal 5′ to 3′ orientation. (B) Following aminoacylation, aa-tRNAs are each bound by elongation factor thermo unstable (EF-Tu) in bacteria or EF-1α in archaea and eukaryotes. During translation elongation, the complex of elongation factor and aa-tRNA is delivered to the ribosome, where correct pairing between codon and anticodon leads to the incorporation of the amino acid in the growing polypeptide chain.

Figure 2. Translation with selenocysteine (Sec) in nature. The tRNA Sec is aminoacylated with serine (Ser) by seryl-tRNA synthetase (SerRS). (A) In bacteria, selenocysteine synthase (SelA) converts Ser to Sec on tRNA Sec, while in (B) archaea and eukaryotes, Ser-tRNA Sec is phosphorylated by phosphoseryl-tRNA Sec kinase (PSTK), followed by conversion to Sec by Sep-tRNA:Sec-tRNA synthetase (SepSecS). Sec-tRNA Sec is localized at the UGA codon by a specialized elongation factor that binds an RNA hairpin loop, the selenocysteine insertion sequence (SECIS), which occurs downstream of the Sec (UGA) codon. (A) In bacteria, SECIS is present directly downstream of UGA in the open reading frame (ORF), and the tRNA Sec-specific elongation factor (SelB) binds Sec-tRNA Sec and localizes it at the UGA recoding site by also binding to SECIS. (B) In archaea and eukaryotes, SECIS is present in the 3′ untranslated region (UTR); SBP2 binds both SECIS and the elongation factor (EFSec), which localizes Sec-tRNA Sec at the UGA recoding site.

Figure 3. Electron flow through the thioredoxin (Trx) redox network. Electron flow through the Trx system is mediated by thioredoxin reductase (TrxR) using electrons from nicotinamide adenine dinucleotide phosphate (NADPH). TrxR reduces Trx before the oxidized TrxR is re-reduced using electrons from NADPH. Trx then reduces peroxiredoxin (Prx), or other target proteins, before oxidized Trx is reduced again by TrxR. Prx reduces reactive oxygen species (ROS) such as hydrogen peroxide (H2O2). TrxR and Trx can also directly reduce ROS, providing the cell with a powerful ROS defence system. TrxR and Trx also act on a wide variety of other proteins with diverse functions, and the Trx system regulates proteins involved in many cellular functions. TrxR contains a highly nucleophilic selenium (Se) atom in its active site in the form of selenocysteine (Sec). Sec endows TrxR with a catalytically efficient reductase activity that is important in defending the cell against oxidative stress and in redox signalling.

Figure 4. Engineering recombinant selenoprotein biosynthesis and delivery to cells.

Author Contributions: Conceptualization, D.E.W. and P.O.; validation, D.E.W. and P.O.; resources, P.O.; writing-original draft preparation, D.E.W.; writing-review and editing, P.O.; visualization, D.E.W. and P.O.; supervision, P.O.; project administration, P.O.; funding acquisition, P.O. All authors have read and agreed to the published version of the manuscript.

Funding: Work in the authors' laboratory is funded by the Natural Sciences and Engineering Research Council of Canada (04282 to PO); Canada Research Chairs (232341 to PO); and the Canadian Institutes of Health Research (165985 to PO).