The Search for Cryptic L-Rhamnosyltransferases on the Sporothrix schenckii Genome

The fungal cell wall is an attractive structure to look for new antifungal drug targets and for understanding the host-fungus interaction. Sporothrix schenckii is one of the main causative agents of both human and animal sporotrichosis and currently is the species most studied of the Sporothrix genus. The cell wall of this organism has been previously analyzed, and rhamnoconjugates are signature molecules found on the surface of both mycelia and yeast-like cells. Similar to other reactions where sugars are covalently linked to other sugars, lipids, or proteins, the rhamnosylation process in this organism is expected to involve glycosyltransferases with the ability to transfer rhamnose from a sugar donor to the acceptor molecule, i.e., rhamnosyltransferases. However, no obvious rhamnosyltransferase has thus far been identified within the S. schenckii proteome or genome. Here, using a Hidden Markov Model profile strategy, we found within the S. schenckii genome five putative genes encoding for rhamnosyltransferases. Expression analyses indicated that only two of them, named RHT1 and RHT2, were significantly expressed in yeast-like cells and during interaction with the host. These two genes were heterologously expressed in Escherichia coli, and the purified recombinant proteins showed rhamnosyltransferase activity, dependent on the presence of UDP-rhamnose as a sugar donor. To the best of our knowledge, this is the first report about rhamnosyltransferases in S. schenckii.


Introduction
Currently, fungal infections are a worldwide burden to humanity, directly and indirectly. Directly, fungi cause over 1.6 million human deaths annually, similar to the tuberculosis death toll, and more than a billion infections [1,2]. Indirectly, although most fungal colonizations of plants are advantageous symbiotic relationships, devastating fungal diseases account for harvest losses of around 30% of crops of rice, wheat, maize, potatoes, and soybean, five of the top sources of food globally [2]. No less important are the effects caused to ecosystems by fungal infections, destroying entire populations of bats, amphibians, reptiles, bees, and even corals, to cite a few [2]. There is a limited set of antifungal drugs, and some of the medically approved ones are also used in crop fields, a practice that could result in the emergence of drug-resistant strains of human pathogens [3,4], such as the multidrug-resistant pathogen Candida auris [5]. Thus, there is an urgent need to find new molecular targets on fungi for better disease control.
The fungal infectious disease process requires a complex set of metabolic pathways along with its regulatory components. Numerous studies have elucidated many of the proteins directly involved in this process, giving high priority to effector molecules, named Selected protein sequences of reported eukaryotic rhamnosyltransferases from viridiplantae were downloaded from the NCBI database, and a Multiple Sequence Alignment (MSA) was constructed with the MAFFT algorithm (v.7.450) [44].
Accession numbers of used protein sequences are as follows: NP_564357. An ad hoc HMM profile was generated from the MSA of the sixteen eukaryotic rhamnosyltransferases with HMMER (v. 3.3).
Searches with PFAM/TIGRFAM HMM profiles as well as the ad hoc HMM profile were conducted on the S. schenckii proteome using HMMER (v.3.3). The S. schenckii proteome was used as the target database.
To determine putative domains present on the recovered sequences, these sequences were submitted to the sequence search tool at the PFAM database, with an E-value cut-off of 1.0.

In Silico Structural Analysis of Putative Rhamnosyltransferases Proteins
Retrieval of the closest crystalized proteins of Rht1 and Rht2 from the RCSB PDB database was performed using its advanced sequence search tool. Default parameters were used except for the "Return" value, which was changed to "Polymer Entities".
Three-dimensional modeling of Rht1 and Rht2 proteins was carried out at publicly available implementations of the Alphafold2 algorithm [39]. All default parameters were used, and no templates were selected. Pymol (v.3.2) was used to visualize the 3D protein models. Secondary structures were colored using the default color-scheme (alpha helixes red, beta sheets yellow, coils green). Confidence values for each atom were shown using the default color-scheme (traffic-light-like red-yellow-green gradient).
Alignments of 3D models and crystalized proteins (atoms superimposition) were conducted also on Pymol.
Matching scores and root mean square deviation (RMSD) values were recorded for each pair of structures.
A more detailed version of all bioinformatic methodologies and tools used can be found in Supplementary Materials Bioinformatics Methodologies: Bioinfo_Methods.docx.

Heterologous Gene Expression in Escherichia coli and Protein Purification
The open reading frames under analysis were amplified by PCR using the primer pairs 5 -GAATTCATGGGCCCGGGCAAGAACC-3 and 5 -TCTAGACTAGTCTAGTCGCAGCA GA-3 for SPSK_05538; and 5 -GAATTCATGAAGGTCCTCGTGTCCAC and 5 -TCTAGA TTACGACTGAGCACTCTCGGC for SPSK_01110 (underlined bases indicate EcoRI and XbaI sites, added to the primer sequences, to direct cloning). The 699 bp and 1731 bp amplicons were cloned into pCR2.1-TOPO (Invitrogen, Waltham, MA, USA) and then subcloned into the EcoRI and XbaI sites of pCold I (Takara Bio Inc., San Jose, CA, USA), generating pCold-RHT1 and pCold-RHT2, respectively. For the expression of recombinant genes, cells were grown in LB broth with 100 µg mL −1 of ampicillin (Sigma-Aldrich, San Luis, MO, USA) for 20 h at 37 • C and orbital shaking (200 rpm). Then, 150 mL of fresh broth contained in a 2-L Erlenmeyer flask was inoculated with 1.5 mL of the overnight culture and incubated at 37 • C until reaching an O.D.600 nm = 0.4; then, 0.1 mM of isopropyl-β-D-1-thiogalactopyranoside was added, and it was further incubated for 20 h at 20 • C. Cells were harvested by centrifuging for 20 min at 1500× g and 4 • C kept at −20 • C until use.
For protein purification, induced bacteria were washed three times with PBS, resuspended in 5 mL of the same buffer added with 0.5 mg mL −1 lysozyme (Sigma-Aldrich), and incubated for 30 min at 37 • C. Then, cells were resuspended in 10 mM Tris-HCl, pH 7.4, 1 mM EDTA, and 1% (w/v) SDS and disrupted by incubating for 30 min at 37 • C and shaking for 20 min in a vortex. The cell homogenate was dialyzed in a cellulose membrane (Sigma-Aldrich) for 24 h, concentrated in an Amicon Ultra centrifugal filter with Ultracel-3K (Sigma-Aldrich), and purified with the Talon Metal Affinity Resin (Takara Bio Inc.), as suggested by the manufacturer. Then, the protein was dialyzed and concentrated in an Amicon Ultra-4 system [49]. SDS-PAGE was used to analyze the purity of samples. Aliquots of 30 µL were placed onto 12% (w/v) gels, and protein separation was performed at 120 V for 90 min. The densitometric analysis in a ChemiDoc™MP (Bio-Rad, Hercules, CA, USA) system of silver-stained protein bands [50] was used to assess the protein purity of the recombinant proteins, while the Pierce BCA Protein Assay (Thermo Fisher Scientific) was used for protein quantification.

Rhamnosyltransferase Assays
Enzyme activity was assayed as previously described [51], using 200 ng α-1,6-mannobiose (Dextra Laboratories, Reading, UK) as the rhamnose acceptor and 500 µM UDP-L-rhamnose (Chemily Glycoscience; Peachtree Corners, GA, USA). The reaction products were analyzed by High-Performance Anion-Exchange Chromatography with Pulsed Amperometric Detection (HPAEC-PAD) with a Dionex system (Thermo Fisher Scientific), using a CarboPac PA-100 column (4.6 × 250 mm) and a linear gradient of 10-100 mM sodium acetate in 100 mM NaOH at a flow rate of 0.8 mL min −1 for 30 min. [52]. For assays including treatment with α-L-rhamnosidase, the reaction product of rhamnosyltransferase activity was mixed with 1 U enzyme (Megazyme, Bre, Ireland) and incubated for 60 min at 50 • C. The reactions were stopped by boiling for 10 min and subjected to monosaccharide separation by HPAEC-PAD using the following conditions: a CarboPac PA-200 analytical column (3 × 250 mm) with a CarboPac PA-200 guard column (3 × 50 mm) and an isocratic gradient of 3.2 mM NaOH with a flux rate of 0.15 mL min −1 for 25 min [53]. Fluorescence-assisted carbohydrate analysis (FACE) was performed essentially as previously described [54]. UDP-glucose, GDP-mannose, UDP-N-acetylglucosamine, and UDP-galactose were from Sigma-Aldrich.

Statistical Analysis
Statistical analysis was performed using GraphPad Prism 6 software. Enzyme activity assays were performed at least three times in duplicate and were analyzed with the unpaired t-test to establish statistical significance. All data are represented as the mean and standard deviation. In all cases, the significance level was set at p < 0.05.

Identification of Putative Genes Encoding for Rahmosyltranferases
When applying the rhamnosyltransferase specific PFAM and TIGRFAM profiles (PF11316 and TIGR01556) to the S. schenckii proteome, no significant results were retrieved. To overcome the limitations of using the prokaryotic HMM profile on a eukaryotic genome, we created a specific profile to search for putative eukaryotic rhamnosytransferases. For this purpose, we selected reported rhamnosyltransferases from plants (Arabidopsis thaliana, Cicer arietinum, Citrus maxima, Solanum tuberosum, and Vigna radiata) and built an HMM profile from their multiple sequence alignment (Supplementary Materials file Rhamno_Transf_Euk.hmm). We then proceeded to search the S. schenckii proteome using this HMM profile, recovering five protein sequences (SPSK_01368, SPSK_04821, SPSK_05538, SPSK_05928, and SPSK_01110) ranging in sizes from 232 aa to 1836 aa ( Table 1). Four of the five sequences were automatically annotated as glycosyltransferases, and a fifth sequence was only annotated as a hypothetical protein (Table 1). The five sequences were then scanned against the whole Pfam HMM profile database [36] to identify putative domains/motifs present. As shown in Table 2, three HMM profiles related to glycosyltransferases were identified: PF00201 (UDP-glucoronosyl and UDPglucosyl transferase), PF04101 (Glycosyltransferase family 28 C-terminal domain), and PF03033 (Glycosyltransferase family 28 N-terminal domain). The predicted positions of these putative domains on the protein sequences are shown in Table 2, as well as the confidence levels expressed as individual and conditional E-values. Although none of the HMM profiles were predicted for all five sequences, all of them are part of the Pfam CL0113 (GT-B) clan of glycosyltransferases that possess two Rossmann-like folds and lack the DxD motif, contrary to typical prokaryotic GT-A rhamnosyltransferases. Therefore, these five putative genes were selected for further analysis.

Expression Analysis of Five Putative Rhamnosyltransferases on Sporothrix Schenckii
To assess whether these sequences are indeed true genes, we first analyzed expression in different growth conditions. The gene encoding for ribosomal protein L6 and the filament growth in YPD broth were used as the reference gene and condition, respectively, for gene expression analysis [45]. The SPSK_05538 and SPSK_01110 sequences were found to be expressed in yeast-like cells growing in YPD broth, indicating that these are indeed genes ( Figure 1). The SPSK_01110 gene was found to be overexpressed in these growing conditions when compared to the SPSK_05538 gene, and a similar trend was observed in yeast-like cells growing in BHI or collected from infected G. mellonella larvae at day three post inoculation ( Figure 1). Interestingly, the expression levels of both genes were significantly reduced when cells were grown in Vogel medium, which favors the growth of both mycelia and swollen yeast-like cells [45], and this expression was even lower in cells grown in PDB, a medium that precludes the morphological transition from filament cells to yeast-like cells [45] (Figure 1). The sequences SPSK_05928, SPSK_01368, and SPSK_04821 showed minimal expression levels in all the tested conditions ( Figure 1), suggesting that their expression was not strong and influenced by the dimorphism. It has been previously reported that cell wall rhamnose levels are influenced by the dimorphism, being higher in yeast-like cells than in germlings [14]; therefore, in this work, we decided to continue working with the only two genes showing gene regulations by dimorphism and growth conditions, SPSK_05538 and SPSK_01110. These genes were renamed Rhamnosyltransferase 1 (RHT1) and Rhamnosyltransferase 2 (RHT2), respectively.

Figure 1.
Analysis of gene expression. Cells were grown in YPD, BHI, Voguel, or PDB medium at 37 °C, and total RNA was extracted; cDNA was synthesized with oligo(dT) primer (20 mer) and quantified by RT-qPCR. Alternatively, Galleria mellonella larvae were inoculated with 1 x 10 6 yeastlike cells, and, after 72 h of interaction at 37 °C, fungal cells were retrieved and used for total RNA extraction. Expression levels were normalized using the gene encoding for the ribosomal protein L6 as the control and the fungal growth in YPD at 28 °C as reference conditions. Results are means ± SD of three independent experiments performed by duplicate. * p < 0.05 when compared to the expression levels of the same gene in the other growing conditions.

Sporothrix schenckii RHT1 and RHT2 Expression in Escherichia coli and Characterization of the Enzyme Activity
Next, the coding sequences of RHT1 and RHT2 were cloned into the expression vector pCold I, generating pCold-RHT1 and pCold-RHT2, respectively. These constructions were used to transform E. coli cells, and gene expression was induced with isopropyl-β-D-1-thiogalactopyranoside, with incubation at 20 °C. Several conditions for gene induction were tested in pilot experiments (not shown), and the best conditions were found at 0.1 M of the inductor and 20 h of incubation. Under these inducing conditions, a 28 kDa recombinant protein was induced in cells transformed with pCold-RHT1, a molecular weight closer to the 25 kDa predicted molecular weight for the RHT1 encoded peptide ( Figure 2A). For the case of cells transformed with pCold-RHT2, a protein band of 65 kDa was generated when grown under inducing conditions ( Figure 2B), and this molecular weight was similar to 62 kDa, the predicted molecular weight for the product of RHT2. Since the expression vector contained sequences for molecular tags that are part of the recombinant proteins, such as 6xHis [49], it is likely that these extra sequences were responsible for the discrepancy between the experimental and predicted molecular weights. Cells growing in non-inducive conditions or transformed with the empty pCold I vector did not show the differentially expressed protein bands (Figure 2A-C), suggesting that those protein bands of 28 and 65 kDa were the recombinant (r)Rht1 and rRht2, respectively. Since both proteins contained the 6xHis tag at the N-terminal end, protein purification was performed by affinity chromatography in a cobalt-charged resin. Following this strategy, one single protein band was observed in the purified fractions, with the corresponding 28 and 65 kDa for rRHT1 and rRHT2, respectively ( Figure 2D,E). The densitometric analysis of the silver-stained gel subjected to SDS-PAGE confirmed that more Figure 1. Analysis of gene expression. Cells were grown in YPD, BHI, Voguel, or PDB medium at 37 • C, and total RNA was extracted; cDNA was synthesized with oligo(dT) primer (20 mer) and quantified by RT-qPCR. Alternatively, Galleria mellonella larvae were inoculated with 1 × 10 6 yeastlike cells, and, after 72 h of interaction at 37 • C, fungal cells were retrieved and used for total RNA extraction. Expression levels were normalized using the gene encoding for the ribosomal protein L6 as the control and the fungal growth in YPD at 28 • C as reference conditions. Results are means ± SD of three independent experiments performed by duplicate. * p < 0.05 when compared to the expression levels of the same gene in the other growing conditions.

Sporothrix schenckii RHT1 and RHT2 Expression in Escherichia coli and Characterization of the Enzyme Activity
Next, the coding sequences of RHT1 and RHT2 were cloned into the expression vector pCold I, generating pCold-RHT1 and pCold-RHT2, respectively. These constructions were used to transform E. coli cells, and gene expression was induced with isopropyl-β-D-1thiogalactopyranoside, with incubation at 20 • C. Several conditions for gene induction were tested in pilot experiments (not shown), and the best conditions were found at 0.1 M of the inductor and 20 h of incubation. Under these inducing conditions, a 28 kDa recombinant protein was induced in cells transformed with pCold-RHT1, a molecular weight closer to the 25 kDa predicted molecular weight for the RHT1 encoded peptide (Figure 2A). For the case of cells transformed with pCold-RHT2, a protein band of 65 kDa was generated when grown under inducing conditions ( Figure 2B), and this molecular weight was similar to 62 kDa, the predicted molecular weight for the product of RHT2. Since the expression vector contained sequences for molecular tags that are part of the recombinant proteins, such as 6xHis [49], it is likely that these extra sequences were responsible for the discrepancy between the experimental and predicted molecular weights. Cells growing in non-inducive conditions or transformed with the empty pCold I vector did not show the differentially expressed protein bands (Figure 2A-C), suggesting that those protein bands of 28 and 65 kDa were the recombinant (r)Rht1 and rRht2, respectively. Since both proteins contained the 6xHis tag at the N-terminal end, protein purification was performed by affinity chromatography in a cobalt-charged resin. Following this strategy, one single protein band was observed in the purified fractions, with the corresponding 28 and 65 kDa for rRHT1 and rRHT2, respectively ( Figure 2D,E). The densitometric analysis of the silver-stained gel subjected to SDS-PAGE confirmed that more than 98.5% of the protein present in the preparations corresponded to the recombinant proteins. Following this procedure, we obtained a yield of 16.1 ± 3.2 µg and 19.4 ± 2.7 µg pure rRht1 and rRht2 per mL of induced culture media, respectively. than 98.5% of the protein present in the preparations corresponded to the recombinant proteins. Following this procedure, we obtained a yield of 16.1 ± 3.2 µg and 19.4 ± 2.7 µg pure rRht1 and rRht2 per mL of induced culture media, respectively. Next, to demonstrate the rhamnosyltransferase activity of the purified enzymes, we set up a detection system based on α-1,6-mannobiose as the sugar acceptor and UDP-Lrhamnose as a donor. We selected these molecules because this disaccharide has been found as a part of the natural rhamnose-containing N-linked glycans found in S. schenckii peptidorhamnomannan, and UDP-conjugated but not dTDP-rhamnose has been recently identified as the rhamnose donor in this organism [18,26]. The product reactions were then analyzed by HPAEC-PAD. Figure 3 shows the analysis of the rRht1 products. At time 0 of the incubation at 37 °C, only one major peak at 16.1 min was detected, and this corresponded to the acceptor α-1,6-mannobiose ( Figure 3A). After 30 min incubation, a second peak with a retention time of 17.8 min was also observed, and its quantity increased when reactions were incubated for 3 h ( Figure 3B,C). This result suggested that it was a product of the rhamnosyltransferase activity. To confirm that this is indeed rhamnosylated mannobiose, this enzyme product was subjected to derhamnosylation by treatment with α-Lrhamnosidase, and the released sugars were again analyzed by HPAEC-PAD to detect released monosaccharides. At time 0, no monosaccharide was detected in the preparations, indicating that all the rhamnose was bound to UDP ( Figure 3B). However, after 30 min incubation and subsequent treatment with rhamnosidase, rhamnose was detected in the samples, suggesting that this monosaccharide was released from mannobiose ( Figure  3D), and the amount of the detected monosaccharide increased in preparations incubated for 3 h with rhamnosyltransferase ( Figure 3E). In agreement, the oligosaccharide analysis of the derhamnosylated samples only showed the peak corresponding to mannobiose (data not shown). The α-L-rhamnosidase was incapable of releasing the monosaccharide from UDP (data not shown), underlining that the detected rhamnose was bound to another acceptor, i.e., mannobiose. Control reactions with denatured rhamnosyltransferase by boiling or bacteria homogenates subjected to the same purification procedure failed to generate the second peak observed in panels C and D of Figure 3. Moreover, FACE analysis showed that this enzyme product had mobility similar to a trisaccharide (data not shown). It was interesting to observe that, after 3 h of incubation, an extra peak with a Next, to demonstrate the rhamnosyltransferase activity of the purified enzymes, we set up a detection system based on α-1,6-mannobiose as the sugar acceptor and UDP-Lrhamnose as a donor. We selected these molecules because this disaccharide has been found as a part of the natural rhamnose-containing N-linked glycans found in S. schenckii peptidorhamnomannan, and UDP-conjugated but not dTDP-rhamnose has been recently identified as the rhamnose donor in this organism [18,26]. The product reactions were then analyzed by HPAEC-PAD. Figure 3 shows the analysis of the rRht1 products. At time 0 of the incubation at 37 • C, only one major peak at 16.1 min was detected, and this corresponded to the acceptor α-1,6-mannobiose ( Figure 3A). After 30 min incubation, a second peak with a retention time of 17.8 min was also observed, and its quantity increased when reactions were incubated for 3 h ( Figure 3B,C). This result suggested that it was a product of the rhamnosyltransferase activity. To confirm that this is indeed rhamnosylated mannobiose, this enzyme product was subjected to derhamnosylation by treatment with α-L-rhamnosidase, and the released sugars were again analyzed by HPAEC-PAD to detect released monosaccharides. At time 0, no monosaccharide was detected in the preparations, indicating that all the rhamnose was bound to UDP ( Figure 3B). However, after 30 min incubation and subsequent treatment with rhamnosidase, rhamnose was detected in the samples, suggesting that this monosaccharide was released from mannobiose ( Figure 3D), and the amount of the detected monosaccharide increased in preparations incubated for 3 h with rhamnosyltransferase ( Figure 3E). In agreement, the oligosaccharide analysis of the derhamnosylated samples only showed the peak corresponding to mannobiose (data not shown). The α-L-rhamnosidase was incapable of releasing the monosaccharide from UDP (data not shown), underlining that the detected rhamnose was bound to another acceptor, i.e., mannobiose. Control reactions with denatured rhamnosyltransferase by boiling or bacteria homogenates subjected to the same purification procedure failed to generate the second peak observed in panels C and D of Figure 3. Moreover, FACE analysis showed that this enzyme product had mobility similar to a trisaccharide (data not shown). It was interesting to observe that, after 3 h of incubation, an extra peak with a retention time of 19.2 min was generated in the reactions ( Figure 3D). Once again, upon derhamnosylation, this peak disappeared from the preparations, and it was shown that only mannobiose and free rhamnose was detected, and the FACE analysis was suggested to be a tetrasaccharide (data not shown and Figure 3 E). Similar results were obtained with rRht2 (not shown).
Collectively, this strategy showed that both enzymes were capable of transferring rhamnose from UDP to the acceptor α-1,6-mannobiose. retention time of 19.2 min was generated in the reactions ( Figure 3D). Once again, upon derhamnosylation, this peak disappeared from the preparations, and it was shown that only mannobiose and free rhamnose was detected, and the FACE analysis was suggested to be a tetrasaccharide (data not shown and Figure 3 E). Similar results were obtained with rRht2 (not shown). Collectively, this strategy showed that both enzymes were capable of transferring rhamnose from UDP to the acceptor α-1,6-mannobiose. The analysis under the curve allowed us to generate quantitative data from these assays, and we found that rRht1 and rRht2 showed a specific activity of 198.7 ± 34.4 and 215.4 ± 28.6 µg trisaccharide min −1 mg protein −1 (Table 3). Mock reactions with no sugar donor or with bacterial extract subjected to the same purification procedure gave threshold values (Table 3). In addition, the enzyme activity was minimal when the heat-denatured enzyme was included in the reactions (Table 3). Next, to assess the specificity of the enzyme for the sugar donor, UDP-rhamnose was substituted by equimolar concentrations of UDP-glucose, GDP-mannose, UDP-N-acetylglucosamine, and UDP-galactose. None of these molecules were recognized by rRht1 or rRht2 as sugar donors in the enzyme reaction ( Table 3), suggesting that both enzymes are specific for UDP-rhamnose. The analysis under the curve allowed us to generate quantitative data from these assays, and we found that rRht1 and rRht2 showed a specific activity of 198.7 ± 34.4 and 215.4 ± 28.6 µg trisaccharide min −1 mg protein −1 (Table 3). Mock reactions with no sugar donor or with bacterial extract subjected to the same purification procedure gave threshold values (Table 3). In addition, the enzyme activity was minimal when the heat-denatured enzyme was included in the reactions (Table 3). Next, to assess the specificity of the enzyme for the sugar donor, UDP-rhamnose was substituted by equimolar concentrations of UDPglucose, GDP-mannose, UDP-N-acetylglucosamine, and UDP-galactose. None of these molecules were recognized by rRht1 or rRht2 as sugar donors in the enzyme reaction ( Table 3), suggesting that both enzymes are specific for UDP-rhamnose.

In Silico Structural Analysis of Sporothrix Schenckii Rht1 and Rht2 Proteins
To further characterize the two identified and proven rhamnosyltransferase genes, we conducted a structural analysis of the two predicted proteins. De novo tridimensional folding of the proteins was conducted through the Alphafold2 algorithm [39], and the best predicted models were recovered for each protein sequence. The predicted 3D models of both proteins Rht1 and Rht2 ( Figure 4A,B, respectively) showed the characteristic α/β/α Rossmann nucleotide binding domains [55,56], but Rht1 appears to have a single one, while Rht2 has the two Rossmann-fold characteristics of GT-B glycosyltransferases. Confidence levels per residue for predicted Rht1 and Rht2 protein 3D structures, expressed as predicted LDDT (pLDDT), are visually represented as color-coded renderings of the protein predictions on panels A and B, respectively, and graphed per position on panels C and D, respectively, of the Supplementary Materials Figure S1. For both predictions, most of the secondary structure conforming the α/β/α Rossmann domains show pLDDT confidence values > 90, which were expected to be modeled to high accuracy. Although some short regions had pLDDT values between 70 and 90, these were still expected to be modeled well, and represented a generally good backbone prediction. Regions for Rht2 with low pLDDT values (<30) corresponded to the flexible linker between the two Rossmann domains. Additionally, the predicted model for Rht1 had low confidence values at both the N-terminal and C-terminal ends (Supplementary Materials Figure S1C,D).
As a mean to corroborate the correctness of the modeled 3D structures, we aimed to compare those models to reported crystalized structures. For this, we searched the Protein Data Bank (RCSB PDB) for crystal structures of known rhamnosyltransferases, as well as for crystal structures of proteins that shared high sequence similarity to use as templates to compare those structures to the predicted 3D models of Rht1 and Rht2. At amino acid level, Rht1 shares its closest similarity to CalG1 (PDB ID 3OTG; [57]), a GT-B fold Calicheamicin Glycostyltransferase (Sequence Identity: 21%, E-Value: 0.000002912, Region: 21-411) from the Actinomycetota Micromonospora echinospora. Correspondingly, the sequence of Rht2 is most similar to that of Alg13 (PDB ID 2JZC; [58]) the sugar donor subunit of Saccharomyces cerevisiae GT-B fold N-acetylglucosamine transferase (Sequence Identity: 32%, E-Value: 5.177e-12, Region: 32-180). When the 3D models of Rht1 and Rht2 were aligned (atoms superimposed) to their corresponding closest sequence, the Rht1/CalG1 pair showed a 63 matching score and a 16.398 root mean square deviation (RMSD), while the Rht2/Alg13 pair had a 43 matching score and a 5.891 RMSD. We also selected the crystal structure of Arabidopsis thaliana rhamnosyltransferase UGT89C1 (PDB ID 6IJA; [59]). At the level of amino acid sequence, UGT89C1 does not share significant identity to neither Rht1 nor Rht2. Nevertheless, when its crystal structure was aligned to those of the predicted 3D models, the Rht1/UGT89C1 pair had a 58.5 matching score and a 7.799 RMSD while the Rht2/UGT89C1 pair had a 77 matching score and a 15.577 RMSD, both pairs having a better fit than when compared to their closest sequences ( Figure 4C,D, respectively) (higher matching scores and lower RMSD are better). It is noteworthy that Rht2 has a global similarity to the structure of UGT89C1, while Rht1 has a good fit but only to one of the two Rossmann-like folds.
has a global similarity to the structure of UGT89C1, while Rht1 has a good fit but only to one of the two Rossmann-like folds.

Discussion
The identification and classification of GTs is a complicated matter. These enzymes can be extremely specific on both the sugar donor and the acceptor molecule, but in many cases, the diversity in their sequences is not directly correlated to their specificity. Although there are only three known tridimensional folds for GTs (GT-A, GT-B, or GT-C) [55,56], these are multidomain proteins that show great diversity in their primary structures. Despite catalyzing closely related reactions, many of these transferases show little apparent sequence homology. The CAZY database classifies them into families (indicated as GTx) by amino acid sequence similarities, but the specificity of function and sequence homology for each family can be remarkably diverse [60,61]. The sequence diversity of these enzymes has been classified in at least 114 families (April 2022). Some families are monospecific for function (e.g., GT3 or GT19) and group just a small number of sequences with high amino acid similarities that include the totality of their catalytic domain. On the contrary, the shared amino acid homology in polyspecific families (e.g., GT1, GT2, or GT4) is limited to just isolated motifs of the catalytic domain, grouping a diverse and wide array of functions, such as cellulose and chitin synthase, glucosyltransferase, mannosyltransferase, rhamnosyltransferase, galactosyltransferase, and many others. Regardless of this variability in their primary structure, the three-dimensional fold of the enzymes within each family is expected to be of the same type (http://www.cazy.org/GlycosylTransferases.html accessed on: 25 April 2022): all members of family GT1 have a GT-B fold, those of family GT2 present a GT-A fold, and Glycosyltransferasases of family GT22 have a GT-C fold, to cite some examples [60,61]. It is worth mentioning that there are rhamnosyltransferases both in families GT1 (GT-B fold) and GT2 (GT-A fold) [60,61].

Discussion
The identification and classification of GTs is a complicated matter. These enzymes can be extremely specific on both the sugar donor and the acceptor molecule, but in many cases, the diversity in their sequences is not directly correlated to their specificity. Although there are only three known tridimensional folds for GTs (GT-A, GT-B, or GT-C) [55,56], these are multidomain proteins that show great diversity in their primary structures. Despite catalyzing closely related reactions, many of these transferases show little apparent sequence homology. The CAZY database classifies them into families (indicated as GTx) by amino acid sequence similarities, but the specificity of function and sequence homology for each family can be remarkably diverse [60,61]. The sequence diversity of these enzymes has been classified in at least 114 families (April 2022). Some families are monospecific for function (e.g., GT3 or GT19) and group just a small number of sequences with high amino acid similarities that include the totality of their catalytic domain. On the contrary, the shared amino acid homology in polyspecific families (e.g., GT1, GT2, or GT4) is limited to just isolated motifs of the catalytic domain, grouping a diverse and wide array of functions, such as cellulose and chitin synthase, glucosyltransferase, mannosyltransferase, rhamnosyltransferase, galactosyltransferase, and many others. Regardless of this variability in their primary structure, the three-dimensional fold of the enzymes within each family is expected to be of the same type (http://www.cazy.org/GlycosylTransferases.html, accessed on 25 April 2022): all members of family GT1 have a GT-B fold, those of family GT2 present a GT-A fold, and Glycosyltransferasases of family GT22 have a GT-C fold, to cite some examples [60,61]. It is worth mentioning that there are rhamnosyltransferases both in families GT1 (GT-B fold) and GT2 (GT-A fold) [60,61].
Their catalytic and recognition sites are the most conserved residues, but these are just short motifs shared with most members of the GT superfamily, such as the DxD motif found on those enzymes with a GT-A fold (NCBI CDD Conserved Protein Domain cl11394; [62]). This high similarity between very small fractions of their conserved residues allows for a very broad identification of an incomplete subset of the GT sequences present in a genome by use of traditional homology searches (e.g., BLAST). In addition to recovering just a subset, homology searches prevent the discrimination between the specific families. A more powerful and flexible bioinformatic tool to identify and discriminate GTs sequences is the use of Hidden Markov Model (HMM) profiles. Presently, many specific HMM profiles for GTs are available.
Accordingly, monosaccharide L-rhamnosyltransferases are multidomain proteins that show great diversity in their primary sequences. Their catalytic and recognition sites are the most conserved residues, but these are shared with most members of the glycosyltransferase superfamily (cl11394). For the particular case of rhamnosyltransferases, HMM profiles have been built, for the most part, only from prokaryotic sequences (pfam profile PF11316). Additionally, most of the prokaryotic rhamnosyltransferases have a GT-A fold (with a DxD motif present), while those from eukaryotes (viridiplantae) have a GT-B fold (which lack the DxD motif). The application of the prokaryotic HMM profiles to eukaryotic genomes often results in no results (false negatives) or misidentified genes (false positives). Currently, the standard for detecting L-rhamnosyltransferases by HMM are the TIGR01556 (Interpro, NCBI-TIGRFAMs) and pfam11316 (PFAM) probability matrices. No functional fungal rhamnosyltransferase gene has been described and biochemically tested, to date.
Currently, there are almost 61,500 putative rhamnosyltransferase protein sequences deposited on GenBank (although the number of unique sequences is closer to 14,000). For the gamma-proteobacteria class, there are reported about 11,500 L-rhamnosyltransferase sequences, of which 2581 are unique (NCBI, April 2022). These are distributed among multiple orders of gamma-proteobacteria. Being based on the gamma-proteobacteria class, the probability matrix hmm TIGR01556 is too specific, so its use on fungi (S. schenckii specifically) has turned out negative results. In eukaryotes, on the other hand, there are only about 400 sequences reported (mostly from plants), with only nine assigned to animals and nine to fungi. Only five of the fungal sequences are unique, with the other four belonging to different strains of the same genera. In all cases, the putative fungal rhamnosyltransferases were automatically annotated using the hmm profile PF11316. This profile was designed from a prokaryotic domain of unknown function, which belongs to a family of uncharacterized sequences (564 sequences) that includes a single sequence shown to be a rhamnosyltransferase from Sphingomonas sp ATCC 5319 [60,61]. Since GTs are multidomain proteins, many of these assignments of function are spurious. For example, one of the putative rhamnosyltransferases identified with this profile in Neurospora crassa (XP_958984.1) is also identified, with higher reliability, as a putative Heterokaryon incompatibility protein using the hmm profile PF06985. A similar situation occurs with the remaining four fungal sequences, making them all unlikely true fungal rhamnosyltransferases.
In addition to the PF11316 profile, there are available hmm profiles modeled from alignments of sequences recovered from alphaproteobacteria, gammaproteobacteria, and firmicutes, as well as a general profile modeled after the DxD motif. To overcome the high rate of false positives/false negatives limitation shown by the use of those profiles in fungi, we have designed a more specific ad hoc hmm profile based on eukaryotic (viridiplantae) rhamnosyltransferase sequences. As stated elsewhere, S. schenckii is known to secrete a heavily glycosylated glycoprotein, whose glycan portion is composed of N-linked mannose and rhamnose, but the rhamnosyltransferase gene required for its glycosylation has not been identified in its genome. Applying our hmm profile to its genome, we have identified five genes that were selected as encoding putative rhamnosyltransferases.
Molecular and biochemical tests were carried out to validate the predictions. Two out of the five candidate genes showed expression under lab growth conditions. These two genes were selected for further testing, cloning them for in vitro biochemical activity assays, confirming that both genes encoded for GTs with selectivity for rhamnose.
Finally, the protein sequences of the biochemically confirmed rhamnosyltransferases were subjected to an in silico structural analysis, showing that these proteins have a tridimensional structure belonging to those of glycosyltransferases of the GT-B fold type, but with differing characteristics. Rht2 has the typical two α/β/α Rossmann nucleotide binding domains, separated by a flexible linker and lacking the DxD motif that the majority of glycosyltransferases with a GT-A fold have in its active site that helps coordinate the metal ion and the nucleotide sugar. On the other hand, Rht1 appeared to have a single Rossmann-like fold, but its overall similarity is still closer to that of GT-B glycosyltransferases than to the GT-A fold ones. A couple of explanations may be drawn from these observations; one pointing to the possibility that it works as a dimer [63], or the other being that it presents a variant single-domain Rossmann-like fold, as previously reported for some GT-B glycosyltransferases [64]. These findings clearly separate the fungal rhamnosyltransferases from those found on prokaryotic cells. To the best of our knowledge, this is the first report of a functional rhamnosyltransferase, not only for S. schenckii, but for any fungal genome. Since rhamnose is not found on humans and other animals, this discovery widens the opportunities for the design of new antifungal drugs aimed at the biosynthesis of rhamnoconjugates [65].