Bioinformatic Prediction of a tRNASec Gene Nested inside an Elongation Factor SelB Gene in Alphaproteobacteria

In bacteria, selenocysteine (Sec) is incorporated into proteins via the recoding of a particular codon, the UGA stop codon in most cases. Sec-tRNASec is delivered to the ribosome by the Sec-dedicated elongation factor SelB that also recognizes a Sec-insertion sequence element following the codon on the mRNA. Since the excess of SelB may lead to sequestration of Sec-tRNASec under selenium deficiency or oxidative stress, the expression levels of SelB and tRNASec should be regulated. In this bioinformatic study, I analyzed the Rhizobiales SelB species because they were annotated to have a non-canonical C-terminal extension. I found that the open reading frame (ORF) of diverse Alphaproteobacteria selB genes includes an entire tRNASec sequence (selC) and overlaps with the start codon of the downstream ORF. A remnant tRNASec sequence was found in the Sinorhizobium meliloti selB genes whose products have a shorter C-terminal extension. Similar overlapping traits were found in Gammaproteobacteria and Nitrospirae. I hypothesized that once the tRNASec moiety is folded and processed, the expression of the full-length SelB may be repressed. This is the first report on a nested tRNA gene inside a protein ORF in bacteria.


Introduction
Selenocysteine (Sec) is the 21st amino acid, used in diverse bacteria, archaea, and eukaryotes for expressing selenoproteins [1]. In this work, I focus on the bacterial system. Unlike most of the canonical amino acids, Sec is synthesized on tRNASec molecules and delivered to the growing polypeptide in the ribosome by the dedicated elongation factor SelB. First, tRNASec is charged with serine by seryl-tRNA synthetase. Ser-tRNASec is then converted to Sec-tRNASec by selenocysteine synthase (SelA) using selenophosphate synthesized by SelD [2]. SelB binds to Sec-tRNASec and to a Sec-insertion sequence (SECIS) element on the mRNA to mediate Sec insertion (Figure 1A) [3]. SelB domains I, II, and III are mainly responsible for the GTPase activity, while domain IV, composed of four winged-helix domains (WHDs), is responsible for SECIS recognition (Figure 1A) [3,4]. In most Sec-utilizing bacteria, a UGA stop codon directly followed by a SECIS element is recoded in competition with release factor 2 [5], while some bacteria use the UAG stop codon or the UGC/UGU cysteine codons together with anticodon variants of tRNASec [6].
The primary role of Sec in common bacteria such as Alphaproteobacteria, Betaproteobacteria, and Gammaproteobacteria species is to express Sec-containing formate dehydrogenases (FDHs) [7]. Formate metabolism is important for nitrogen fixation and formate-dependent respiration in Rhizobiales [8,9]. The megaplasmid pSymA of Sinorhizobium (Rhizobium) meliloti is known to encode an fdoGHI (for FDH-O) and selABCD gene cluster [10,11]. In our previous study on non-canonical SelB sequences [12], it was found that Rhizobiales SelB sequences in the public databases are annotated to have a C-terminal extension compared to other bacterial SelB sequences (Figure 1A) [13]. In the present study, these Rhizobiales SelB sequences were analyzed carefully to find the reason for this annotation.
Figure 1. (A) The mechanism of the SECIS-dependent recoding of the UGA codon for Sec by the SelB•SECIS•Sec-tRNASec complex (modified from [3,4]). Many of the Rhizobiales SelB sequences were annotated to have a C-terminal extension, which is indicated as an additional domain appended to the C-terminus of the crystal structure of Aquifex aeolicus SelB composed of four domains (PDB ID: 4ZU9). (B) The Azorhizobium caulinodans selC (tRNASec) sequence is nested inside the ORF of the selB gene. The star indicates a translational stop signal.

tRNASec is Encoded Inside the selB Gene in Diverse Alphaproteobacteria
As shown in Figure 1B, an entire tRNASec sequence is nested inside the ORF of the selB gene of the Rhizobiales bacterium Azorhizobium caulinodans. The stop codon of the selB gene overlaps with the start codon of the next ORF. Thus, the C-terminal extension results from the translation of the tRNASec sequence into amino acids. Since overlapping stop and start codons are common in polycistronic mRNAs and may support translation re-initiation [14], the annotation of the ORFs may be correct. Is this arrangement common in Rhizobiales and in other orders of Alphaproteobacteria? To answer this question, I performed a bioinformatic analysis of selB and selC genes. In Figure 2, the distribution of the selB-selC overlapping trait is overlaid on the phylogenetic tree of the SelB sequences analyzed in this study. In summary, I found several types of overlapping genes and found that the overlapping trait occurs even outside the Alphaproteobacteria class. In Alphaproteobacteria, diverse species of Rhizobiales and a few groups of Rhodobacterales, Rhodospirillales, and Caulobacterales have a selC nested inside the selB gene (Figure 3). In many cases, the stop codon of the selB ORF overlaps with the start codon of the next ORF. In most cases, the selC-encoded tRNASec sequence is 96 nucleotides long, or 93 nucleotides when the selC gene lacks the CCA tail; both lengths are multiples of three, which may help to maintain the reading frame. Some species have lost the overlap (Figure 3): Methylopila species have a new stop codon before the selC sequence, while some Ensifer species have a long spacer between the selB and selC genes. These results suggest that the overlapping trait is highly conserved but not essential.
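The reading-frame argument for the 96-nt and 93-nt tRNASec sequences is simple modular arithmetic. As a minimal sketch (the function names are illustrative and are not part of the original analysis), the following Python snippet checks whether a nested sequence of a given length keeps the downstream codons in frame and how many codons it contributes to the C-terminal extension.

```python
def frame_offset(nested_length_nt: int) -> int:
    """Number of nucleotides by which a nested sequence shifts the reading frame."""
    return nested_length_nt % 3


def codons_contributed(nested_length_nt: int) -> int:
    """Number of complete codons contributed by the nested sequence."""
    return nested_length_nt // 3


# 96 nt: tRNASec gene with an encoded CCA tail; 93 nt: gene lacking the CCA tail.
for length in (96, 93):
    print(length, frame_offset(length), codons_contributed(length))
```

Both lengths give an offset of 0, so the codons downstream of the tRNASec sequence stay in the selB frame; a length that is not a multiple of three would shift it.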

tRNASec Remnant is Encoded inside the selB Gene in pSymA
The selAB genes and the selCD genes are known to be separated by a transposon in the S. meliloti pSymA megaplasmid [10,11]. Nevertheless, the S. meliloti SelB has a C-terminal extension [13] that is slightly shorter than that of A. caulinodans SelB. I found that the 5′ half of the overlapping tRNASec sequence remains in the S. meliloti selB ORF (Figure 4A). Thus, the shorter C-terminal extension results from the translation of the remnant tRNASec sequence into amino acids. The transposon insertion may have generated a new stop codon for the pSymA selB gene. A similar case was found in a lineage of Rhizobium (Figure 4B): Rhizobium sp. NFR07 has a remnant tRNASec sequence in the selB gene, while Rhizobium wenxiniae has a complete tRNASec sequence nested in the selB gene. Roseomonas gilardii and R. mucosa also have a SelB with a shorter C-terminal extension, although their remnant tRNASec sequences seem to be highly degenerate (Figure 4C). The C-terminal extension may facilitate the translational coupling of the selB ORF and ORF2 in these Roseomonas lineages (Figure 4C).
Figure legend: The fdoGHI genes encode a membrane-bound formate dehydrogenase (FDH-O) carrying a catalytic Sec residue. The SelD proteins synthesize selenophosphate, which is the selenium donor used by SelA. The SelD and Sel1-like proteins are not selenoproteins in these bacteria. The two ORFs associated with the overlapping selBC gene were named ORF1 and ORF2 in this study. As reported previously [12], Methylophila spp. use a tRNASec with a non-canonical discriminator base A73 instead of G73. Sequence information for the overlapping selBC genes is provided. The stars indicate translational stop signals. The tRNASec sequences are underlined. The reading frames of ORF1 or ORF2 are italicized.

tRNASec Partially Overlapping with the selB Gene in Gammaproteobacteria
It has been speculated that the fdoGHI-selABCD gene cluster was transferred among the Alphaproteobacteria, Betaproteobacteria, and Gammaproteobacteria clades via horizontal gene transfer [7]. Some Gammaproteobacteria species belonging to the Rhodanobacteraceae family in the Xanthomonadales order have a SelB species resembling Rhizobiales SelBs. I therefore assessed whether the selB-selC overlapping trait was also transferred from Alphaproteobacteria to Gammaproteobacteria. Interestingly, a few groups of Dyella and Luteibacter have a selB gene whose stop codon lies in the middle of the overlapping tRNASec sequence, at the same position in each case (nucleotide residues 46 to 48) (Figure 5A). In contrast, their closely related strains have a new stop codon for the selB gene before the tRNASec sequence. Thus, the selB-selC overlapping trait may have been conserved for a period in the two Gammaproteobacteria lineages. A similar case was found in a groundwater lineage of Nitrospirae (Figure 2), although its SelB and SelC sequences differ significantly from those of Dyella and Luteibacter as well as from those of Alphaproteobacteria. On the other hand, overlaps of selB with ORF1 or ORF2 were not found outside the Alphaproteobacteria class; indeed, the distribution of the ORF1 and ORF2 genes is limited to Alphaproteobacteria.
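A stop codon at tRNASec residues 46 to 48 is the sixteenth triplet when the reading frame starts at tRNA residue 1. As a minimal sketch of how such a position can be located, the Python snippet below scans a sequence for the first in-frame stop codon and reports its 1-based coordinates. The function name and the toy sequence are hypothetical; the toy sequence is constructed only to place a stop at residues 46 to 48 and is not a real selC sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(seq: str, frame_start: int = 1):
    """Return 1-based (first, last) positions of the first in-frame stop codon.

    frame_start is the 1-based position of the first nucleotide of the frame.
    Returns None if no in-frame stop codon is found.
    """
    dna = seq.upper().replace("U", "T")
    for i in range(frame_start - 1, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return (i + 1, i + 3)
    return None


# Hypothetical 96-nt toy sequence (NOT a real tRNASec): 15 sense codons, a TGA, 16 sense codons.
toy_sequence = "GGA" + "GCT" * 14 + "TGA" + "GCT" * 16

print(first_in_frame_stop(toy_sequence))
```

For the toy sequence, the function reports positions 46 to 48, matching the coordinates quoted for the Dyella and Luteibacter selB stop codons.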

Reversed tRNASec Overlapping with the selB Gene in Gammaproteobacteria
I found that the SelB sequences of Shewanella and a few Ferrimonas species have a C-terminal extension (Figure 2). Interestingly, this C-terminal extension results from the translation of the complementary sequence of the tRNASec sequence into amino acids (Figure 5B). The selC promoter (probably TTGATTcaggtttacacattttcTACTATC for Shewanella oneidensis MR-1) may exist around the stop codon of the selB gene (underlined). The transcripts of the selB gene and the selC gene might function as cis-encoded antisense RNAs to each other [15]. In contrast, Photobacterium damselae, another Gammaproteobacteria species, has a very similar selB-selC locus in which the two genes are separated by a stop codon. The tRNASec sequences of these Gammaproteobacteria resemble alphaproteobacterial tRNASec sequences, indicating horizontal gene transfer.

Alignment Analysis of the selB C-Terminal Amino Acid Extensions
Figure 6 shows that the amino acid sequences of the C-terminal extensions of SelBs are highly conserved within the tRNASec region. In the forward tRNASec regions, the extensions start with Gly by translating the first three nucleotides "GGA" and end with Pro by translating the CCA tail or a remnant CCA tail sequence "CCN". Although the number of encoded tRNA residues is a multiple of 3 in most cases, 95-nt tRNAs were also found (Figure 6). In the partially overlapping tRNASec regions, the positions of the stop codon for the selB gene are highly conserved in the Dyella/Luteibacter species and in the Nitrospirae species (Figure 6). On the other hand, in the reverse-complement tRNASec regions, the extensions start with Trp by translating "UGG", the complementary sequence of the CCA end, and end with Pro by translating a nucleotide triplet "CCN" that is the complementary sequence of the nucleotides at positions −1 to +2 of the tRNA moiety.

A New Mechanism of Maintaining Homeostasis between selB and Sec-tRNASec?
Figure 7 shows a proposed model of translation regulation by the nested tRNA moiety. Once the tRNA moiety has folded into its tertiary structure, the exact 5′ end and the 3′ trailer may be cleaved by RNases [16][17][18]. This tRNA processing would generate a nonstop mRNA for SelB and a leaderless mRNA for ORF1 or ORF2, leading to repression of the expression of the SelB and ORF1/2 proteins (Figure 7). In other words, expression of SelB and expression of tRNASec are alternatives. As discussed later, a similar mechanism was hypothesized for a mitochondrial tRNA gene nested in a protein gene [17]. Because an excess of SelB over Sec-tRNASec may lead to sequestration of Sec-tRNASec molecules owing to the extraordinarily high affinity between them [19,20], the SelB expression level should be kept low but sufficient for mediating UGA recoding. It is not clear whether or how this hypothetical mechanism would be regulated under selenium deficiency.
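The first and last residues of the C-terminal extensions described in the alignment analysis (Gly from "GGA", Pro from "CCN", and Trp from "UGG" on the complementary strand) follow directly from the standard genetic code. The Python sketch below reproduces these assignments. It is an illustration only: the helper names are hypothetical, and the codon table is the standard code written in the conventional TCAG order, not a tool used in the original study.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_dna(seq: str) -> str:
    """Normalize an RNA or DNA string to upper-case DNA letters."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA or RNA sequence, returned as DNA."""
    return to_dna(seq).translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons of a sequence read from its first nucleotide."""
    dna = to_dna(seq)
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Forward orientation: the tRNA starts with GGA and ends with CCA (or a remnant CCN).
print(translate("GGA"), [translate("CC" + n) for n in "ACGU"])

# Reverse orientation: the complement of the CCA end is read first,
# and the complement of positions -1 to +2 (NGG) is read last.
print(translate(reverse_complement("CCA")),
      [translate(reverse_complement(n + "GG")) for n in "ACGU"])
```

Because all four CCN codons encode Pro, the last residue of the extension does not depend on the third nucleotide of a remnant CCA tail or on the nucleotide at position −1 in the reverse orientation.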

Discussion
To the best of my knowledge, this is the first report of an entire tRNA sequence nested in the protein-coding region of an mRNA in bacteria, although tRNA-like structures of varying sizes and shapes have been found in the coding regions or untranslated regions of mRNAs in diverse organisms and viruses [21][22][23]. tRNA genes are known to be partially or fully integrated within protein genes or other tRNA genes in the compact mitochondrial genomes of animals [17,24]. For example, the 60-nt tRNALys is fully integrated within the coding region of the cox1 gene in direct orientation in the Armadillidium vulgare mitochondrial genome [17]. Since Rhizobiales bacteria have a large and redundant genome, it is unlikely that they have been under pressure to evolve a compact system. Rather, they may have had to survive selenium deficiency in the rhizosphere or inside the host plants. Plants lack the eukaryotic Sec-inserting machinery. It is also known that the gut symbionts of higher termites feeding on dead plant material often lack the Sec-insertion machinery [25] or have a putative backup system for the Sec-insertion machinery [6]. It is noteworthy that in a lineage of such symbiotic bacteria, the tRNASec sequence ends at position −2 relative to the start codon of selA in the selCAB operon (3300006045.a:Ga0082212_10006574) [6]. Thus, control of the expression levels of SelA and SelB might be important in these symbiotic bacteria. The alternative expression of SelB and tRNASec can be regarded as a new approach for controlling the SelB expression level in bacteria, different from the SelB autoregulation system of Escherichia coli [26]. Future wet-lab studies may elucidate these mechanisms by altering the sequences such that the protein sequence is changed without affecting the tRNA function.

Materials and Methods
The web-based BLAST tools of NCBI and the Integrated Microbial Genomes & Microbiomes system (IMG/M: https://img.jgi.doe.gov/m/; last accessed 22 April 2021) [27] were used for the bioinformatic analyses. All sequences were manually curated. The multiple alignment of SelB sequences was performed using Clustal X 2.1 [28], manually curated using Seaview 4.6.5 [29], and depicted using SnapGene 5.2.4 (GSL Biotech LLC, Chicago, IL, USA). The unrooted SelB phylogenetic tree was inferred by maximum likelihood estimation in MEGA-X [30] with the JTT matrix-based model (bootstrap method with 100 replicates, uniform rates, use all sites). The phylogenetic tree was drawn