Characterization of Aminoacyl-tRNA Synthetases in Chromerids

Aminoacyl-tRNA synthetases (AaRSs) are enzymes that catalyze the ligation of tRNAs to amino acids. There are AaRSs specific for each amino acid in the cell. Each cellular compartment in which translation takes place (the cytosol, mitochondria, and plastids in most cases) needs the full set of AaRSs; however, individual AaRSs can function in multiple compartments due to dual (or even multiple) targeting of nuclear-encoded proteins to various destinations in the cell. We searched the genomes of the chromerids Chromera velia and Vitrella brassicaformis for AaRS genes: 48 genes encoding AaRSs were identified in C. velia, while only 39 AaRS genes were found in V. brassicaformis. In the latter alga, ArgRS and GluRS were each encoded by a single gene occurring in a single copy; only PheRS was encoded by three genes, while the remaining AaRSs were encoded by two genes. In contrast, there were nine cases in which C. velia contained three genes for a given AaRS (45% of the AaRSs), all of them representing duplicated genes, except AsnRS and PheRS, which are more likely pseudoparalogs (acquired via horizontal or endosymbiotic gene transfer). Targeting predictions indicated that, in most cases, AaRSs are not (or not exclusively) used in the cellular compartment from which their gene originates. The molecular phylogenies of the AaRSs vary among the specific types but are similar between the two investigated chromerids. While genes of eukaryotic origin are more frequently retained, there is no clear pattern of orthologous pairs between C. velia and V. brassicaformis.

Chromerid genomes can be used to trace the evolutionary transition from a photosynthetic ancestor to one of the most successful groups of eukaryotic parasites [7,13,14]. Chromerid plastid genomes differ substantially in their topology, gene content, the variability of the genes they encode, and even in the genetic code they use [4,5,10,11,15]. Chromerids possess highly reduced photosystems, which also contain several undescribed protein subunits [16]. Complete metabolic pathways, such trace the cellular evolution because of their metabolic universality and their essential role in protein synthesis [48]. AaRSs commonly have irregular evolutionary patterns due to gene duplications, high levels of sequence divergence, and horizontal gene transfers [49], but some AaRS trees show clear ancestral relationships [48]. Since no aminoacyl-tRNA synthetases are plastid- or mitochondrion-encoded in chromerids, all of them are encoded by nuclear genes and translated in the cytosol, with posttranslational targeting to the translationally active organelles.

Gene Identification and Model Assessment
The genomes of the chromerids (C. velia and V. brassicaformis) and a closely related apicomplexan (Toxoplasma gondii) were searched for AaRSs with the BLASTp tool in the Eukaryotic Pathogen database (https://eupathdb.org) [7,47], using previously characterized AaRS amino acid sequences as queries [32]. We validated the selected gene models by checking that the 5'-end of each gene model is transcribed, using BLAST searches against the genes identified in the C. velia and V. brassicaformis transcriptomes [7]. Conserved protein motifs in each of the AaRS loci were identified with the hidden Markov model (HMM)-based tool for AaRS coding sequence detection, with a cut-off of 5 out of 10 motifs [32,50]. Applying this analysis pipeline to the Swiss-Prot-annotated AaRSs showed that it could capture and annotate 99.8% of them [32].
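The motif cut-off can be illustrated with a minimal sketch. It assumes that the number of conserved motifs per candidate has already been counted by the HMM search and that the cut-off is inclusive (5 or more motifs pass). The function names, the dictionary layout, and the candidate identifiers are hypothetical and are not taken from the published tool [32,50].

```python
# Minimal sketch of a motif cut-off filter (5 of 10 motifs, as stated in the text).
# Names and data layout are illustrative assumptions, not the published tool's API.

MOTIF_CUTOFF = 5  # assumed inclusive: a candidate with 5 motifs passes


def passes_cutoff(motif_hits, cutoff=MOTIF_CUTOFF):
    """Return True if a candidate carries at least `cutoff` conserved motifs."""
    return motif_hits >= cutoff


def filter_candidates(hits_by_gene, cutoff=MOTIF_CUTOFF):
    """Keep only the candidates whose motif count reaches the cut-off."""
    return {gene: n for gene, n in hits_by_gene.items() if passes_cutoff(n, cutoff)}


# Hypothetical candidates with made-up motif counts.
example_hits = {"candidate_A": 7, "candidate_B": 4, "candidate_C": 5}
kept = filter_candidates(example_hits)
print(sorted(kept))
```

Keeping the threshold as a named parameter makes it easy to test how sensitive the gene counts are to the chosen cut-off.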

Gene Identification and in Silico Localization of Gene Products
BLAST searches and motif screening of the total predicted proteomes of the two chromerids (C. velia and V. brassicaformis) and their close relative, the apicomplexan T. gondii, identified 48, 39, and 31 AaRS genes, respectively, with some of the previous annotations corrected (Table 1 and Supplementary Table S1A). This is substantially fewer than the number (roughly 60) that would be needed to supply AaRSs to all three translationally active compartments (the cytosol, plastid, and mitochondrion) if each gene product were transported exclusively to a single intracellular location. AaRSs are generally conserved; therefore, it is unlikely that additional enzymes were missed during our search due to sequence divergence. Compared to Gile et al. [46], our genome search and motif screening of previously identified AaRSs revealed 17.5% misidentified genes in the diatom Thalassiosira pseudonana, 5% in the diatom Phaeodactylum tricornutum, and 7% in the cryptophyte Guillardia theta (Table 1 and Supplementary Table S1A). For example, the genes Jgi_51430 and Jgi_42226 were identified as cytoplasmic Glu-tRNA synthetases (GluRSs) in P. tricornutum and T. pseudonana by Gile et al. [46]; however, our motif search revealed that these genes contain three GluRS-specific motifs in T. pseudonana and only two such motifs in P. tricornutum, whereas they possess six (T. pseudonana) and five (P. tricornutum) GlnRS-specific motifs (Supplementary Table S1A). Other examples include the plastid-localized Jgi_24163 and the periplastidal Jgi_68323 proteins, identified as the α-subunit of PheRS (α-PheRS) in T. pseudonana and G. theta [46]. Again, our motif search revealed that they contain four and three motifs in T. pseudonana and G. theta, respectively, which is below the standard cut-off value [32], meaning that these genes probably encode proteins other than AaRSs (Supplementary Table S1A).
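The re-annotation logic behind these examples can be sketched as follows. The motif counts are the ones quoted above; the rule "take the AaRS type with the most motifs and accept it only if it reaches the cut-off" is an assumption made for illustration, as are the function name and the dictionary keys. It is not presented as the authors' exact procedure.

```python
# Sketch of motif-based re-annotation using the counts quoted in the text.
# The decision rule and all names are illustrative assumptions.

CUTOFF = 5  # motifs required out of 10, assumed inclusive


def annotate(motif_counts, cutoff=CUTOFF):
    """Return the AaRS type with the most motifs if it reaches the cut-off, else None."""
    best_type = max(motif_counts, key=motif_counts.get)
    return best_type if motif_counts[best_type] >= cutoff else None


# Motif counts as reported in the text (Supplementary Table S1A).
reported = {
    "T. pseudonana GluRS candidate": {"GluRS": 3, "GlnRS": 6},
    "P. tricornutum GluRS candidate": {"GluRS": 2, "GlnRS": 5},
    "T. pseudonana alpha-PheRS candidate": {"alpha-PheRS": 4},
    "G. theta alpha-PheRS candidate": {"alpha-PheRS": 3},
}

calls = {name: annotate(counts) for name, counts in reported.items()}
for name, call in calls.items():
    print(name, "->", call or "below cut-off")
```

Under this rule, the two GluRS candidates are reassigned to GlnRS, and the two α-PheRS candidates fall below the cut-off.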
The two chromerids differ in the number of genes coding for AaRSs. Nine (45%) AaRSs in C. velia are encoded by three distinct genes, whereas ten (50%) are encoded by two distinct genes. Interestingly, only GluRS was encoded by a single gene (Figure 1a and Table 1). Our premise is that each compartment with translational activity needs an AaRS for each amino acid. Since all AaRSs in chromerids are nuclear-encoded [7,8,15], only nine AaRSs are present in three copies in the genome, and five are targeted to all three compartments, the remaining AaRSs must be dually or even multiply targeted to the cytosol, the mitochondrion, and the plastid to provide a complete set of AaRSs for each of the three subcellular compartments in which translation takes place. Alternatively, the activity of some AaRSs could be replaced by a tRNA-dependent amino acid transformation mechanism [40]. In this study, we found that both C. velia and V. brassicaformis possess two genes coding for glutamyl-tRNA(Gln) amidotransferase (Glu-AdT). Both the C. velia (cvel_23134) and the V. brassicaformis (vbra_8651) Glu-AdT were predicted to be targeted to the plastid (Supplementary Table S1B), in which Glu-tRNA(Gln) is most likely transamidated to produce Gln-tRNA(Gln). Moreover, two plastid-targeted aspartyl/glutamyl-tRNA(Asn/Gln) amidotransferases (Asp/Glu-AdT) were identified in C. velia (cvel_28674 and cvel_12310), and one plastid-targeted Asp/Glu-AdT was identified in V. brassicaformis (vbra_10654). This could compensate for the lack of asparaginyl-tRNA (AsnRS) and glutaminyl-tRNA (GlnRS) synthetases in the plastids of both investigated chromerids (Supplementary Table S1B), producing Asn-tRNA(Asn) or Gln-tRNA(Gln) through the transamidation of misacylated Asp-tRNA(Asn) or Glu-tRNA(Gln), respectively.
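The gene-count argument can be checked with a short calculation. The sketch below uses only the numbers given in the text for C. velia (nine AaRS types with three genes, ten with two, one with a single gene) and assumes 20 AaRS types and three translationally active compartments; the variable names are illustrative.

```python
# Worked check of the C. velia gene counts quoted in the text.
# Variable names are illustrative; the numbers come from the text.

N_TYPES = 20         # one AaRS type per amino acid
N_COMPARTMENTS = 3   # cytosol, mitochondrion, plastid

# genes per AaRS type -> number of AaRS types with that many genes
cvel_distribution = {3: 9, 2: 10, 1: 1}

total_types = sum(cvel_distribution.values())
total_genes = sum(copies * n for copies, n in cvel_distribution.items())

# Number of genes needed if every gene product served exactly one compartment.
expected_exclusive = N_TYPES * N_COMPARTMENTS
shortfall = expected_exclusive - total_genes

print(f"AaRS types covered: {total_types}")
print(f"genes found: {total_genes}, needed for exclusive targeting: {expected_exclusive}")
print(f"shortfall to be covered by dual or multiple targeting: {shortfall}")
```

The distribution accounts for all 20 AaRS types and for the 48 genes found, leaving a gap relative to exclusive targeting that must be covered by dual or multiple targeting or by alternative pathways such as transamidation.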
We identified many AaRSs putatively localized to the nucleus. Since tRNAs are formed in the nucleus, these enzymes could be involved in the transport of tRNAs to the cytoplasm. The localization of AaRSs to the nucleus has been shown in numerous previous studies on nuclear aminoacylation of tRNAs [64-66]. Recently, at least 13 active AaRSs were found in purified mammalian cell nuclei [66,67]. In general, nucleus-targeted AaRSs are known to play a role in tRNA maturation and tRNA export control. The human nuclear enzyme MetRS seems to be involved in the biogenesis of rRNA in nucleoli, in addition to its catalytic activity in protein synthesis in the cytoplasm [66,68].
GlyRSs are a special class of aminoacyl-tRNA synthetases because they have variable functional properties and divergent oligomeric structures [35,74]. Prokaryotic GlyRSs recognize tRNA molecules with a U73 discriminator base, whereas the eukaryotic ones recognize tRNAs with A73 [35,39,74,75]. This suggests that the interaction between structure and function differs between prokaryotic and eukaryotic glycylation systems. The oligomeric structure of most AaRSs is conserved, and sequence comparisons reveal significant similarities. In contrast, GlyRSs show high structural divergence, which leads to the non-conservation of the oligomeric structure [35]. The subunit structure of GlyRSs is not conserved in prokaryotes: two oligomeric types of GlyRS are found in nature (see above). The α2 type has been identified in all three domains of life, while the α2β2 type is only found in bacteria and chloroplasts [35].

Phylogenetic Analyses
We performed phylogenetic analyses of all 20 AaRSs from chromerid algae. Similarity networks for the whole set of aminoacyl-tRNA synthetases reflect the tree topology of each aminoacyl-tRNA synthetase [77] (Figure 2). Out of the 21 computed trees (PheRS has two subunits), only the β-PheRS tree fully corresponds to the classical three-domain pattern that is postulated for eukaryotic enzymes. The archaeal and eukaryotic AaRSs were well separated from the bacterial group, in agreement with previous studies [48]. The remaining trees show two or more divergent bacterial clades (Figure 2). Furthermore, in both chromerids, endosymbiotic gene transfer events were only observed in ArgRS and TrpRS. Interestingly, for LeuRS and TyrRS there are two unrelated groups, likely the result of independent evolution of the eukaryotic and prokaryotic versions of these two enzymes (Figure 2). The remarkable dissimilarities of various regions in bacterial and eukaryotic LeuRS and TyrRS have also been observed in previous studies [48,78].
AaRSs provide examples both of true ancestral paralogs that evolved by gene duplication and of horizontal gene transfer (HGT) leading to pseudoparalogy [79]. Out of the nine cases in which C. velia contained three genes for a given AaRS (Table 1), we identified gene duplication in seven C. velia AaRSs (GlnRS, ValRS, AlaRS, AspRS, GlyRS, LysRS, and SerRS), while AsnRS and PheRS are more likely pseudoparalogs (Supplementary Figure S1 and File sf1). In contrast, no gene duplication was identified among the V. brassicaformis AaRSs.
Bacterial PheRS consists of an α- and a β-subunit encoded by the pheS and pheT genes, canonically located in the same operon [80]. Mitochondrial PheRS is a fusion of the N-terminal part of the α-subunit and the C-terminal part of the β-subunit [81]. We concatenated the two subunits into one sequence. In addition, we constructed phylogenetic trees for the α- and β-subunits separately, to ensure that the PheRS tree topology is not affected by mixed signals from the two PheRS subunits. Analyzing the subunits individually results in the same topology as for the concatenated sequences (Supplementary Figure S1). We also constructed a phylogenetic tree from the combined sequences, representing both GlyRS types, α2 and α2β2. The similarity network and the phylogenetic tree confirmed that the enzymes of type α2β2 are only present in bacteria, resulting in a disconnected network and the formation of a clade that is separate from enzyme type α2 (Figure 2, Supplementary Figure S1 and File sf1). All three domains of life have the α2-type enzyme, including both chromerids (Figure 2, Supplementary Figure S1 and File sf1).
In general, C. velia and V. brassicaformis AaRSs show similar evolutionary patterns (Figure 1b and Table 1), with the exception of ArgRS, AsnRS, and GlyRS. Seven (14.6%) of the identified genes in C. velia and six (15.4%) genes in V. brassicaformis are mitochondrial in origin (Figure 1b and Table 1). In C. velia and V. brassicaformis, respectively, 30 out of the 48 (62.5%) and 23 out of the 39 (59%) identified AaRSs originate from the eukaryotic nucleus. Six AaRSs show bacterial/organellar origin in both chromerids (Table 1). We were unable to specify the origin of three (6.2%) and two (5.1%) of the AaRSs from C. velia and V. brassicaformis, respectively (Table 1 and Supplementary File sf1). Since chromerids appear to be closely related to apicomplexan parasites, we also looked at the origins of AaRSs in the apicomplexan T. gondii. In this parasite, 21 (67.7%) of the identified AaRS genes originate from the eukaryotic nucleus, while 3 and 4 AaRSs display bacterial/organellar and mitochondrial origin, respectively. We were unable to specify the origin of two (6.4%) AaRS genes from T. gondii (Supplementary Figure S1 and File sf1). This apicomplexan and the chromerids have homologous evolutionary patterns in all of their AaRSs except GluRS, supporting the close evolutionary relationship between apicomplexans and chromerids (Supplementary Figure S1 and File sf1).
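The proportions quoted above follow directly from the gene counts. The short sketch below recomputes the shares of eukaryotic and mitochondrial origin from the counts given in the text; the helper function and dictionary names are illustrative, and rounding to one decimal place is an assumption about how the percentages were reported.

```python
# Recompute origin shares from the gene counts quoted in the text.
# Helper and variable names are illustrative.


def share(count, total):
    """Percentage of `count` in `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)


totals = {"C. velia": 48, "V. brassicaformis": 39, "T. gondii": 31}
eukaryotic = {"C. velia": 30, "V. brassicaformis": 23, "T. gondii": 21}
mitochondrial = {"C. velia": 7, "V. brassicaformis": 6, "T. gondii": 4}

for species, total in totals.items():
    euk = share(eukaryotic[species], total)
    mito = share(mitochondrial[species], total)
    print(f"{species}: eukaryotic {euk}%, mitochondrial {mito}%")
```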

Conclusions
This study identified sequences of aminoacyl-tRNA synthetases (AaRSs) in chromerids, along with their putative protein localization and evolutionary origins. We identified 48 and 39 AaRS genes in C. velia and V. brassicaformis, respectively, representing the full set of 20 AaRS types. In C. velia, five AaRSs were predicted to be targeted to all three compartments (mitochondrion, plastid, and cytosol) by an ambiguous targeting sequence that likely leads to the multiple targeting of these enzymes, whereas only two such AaRSs were predicted in V. brassicaformis. We identified five (C. velia) and eight (V. brassicaformis) nucleus-localized AaRSs in the chromerids, which we assume to be first targeted to the nucleus and then to the cytoplasm. GlnRS and AsnRS were absent from the plastids of both chromerids; their activity is likely replaced by tRNA-dependent amino acid transformation mechanisms.
The α2-type of GlyRS was identified in C. velia, whereas both the α2-type and the fused (αβ)2-type of GlyRS were found in V. brassicaformis. We identified gene duplications of seven AaRSs (GlnRS, ValRS, AlaRS, AspRS, GlyRS, LysRS, and SerRS) in C. velia, while no AaRS gene duplication was found in V. brassicaformis. Both LeuRS and TyrRS have two disconnected eukaryotic and prokaryotic groups, which could be a result of the independent evolution of the two versions of these enzymes. In both chromerids, ArgRS and TrpRS were shown to be acquired from endosymbionts by endosymbiotic gene transfer; all other genes are of eukaryotic origin with proteins targeted to the various compartments. The tree topologies suggest that numerous gradual losses of pseudoparalogs occurred in eight enzymes (ArgRS, CysRS, IleRS, MetRS, AlaRS, HisRS, LysRS, and ThrRS).
Given that C. velia and V. brassicaformis are such closely related organisms, the number of differences in their AaRS genes is much higher than we would expect. The intracellular targeting of AaRSs is independent of their evolutionary origin. There is no clear pattern of orthologous pairs that are retained in both organisms. Instead, gene duplications, gene losses, and changes in targeting signals account for the required activities of aminoacyl-tRNA synthesis in the different cellular compartments.