Distantly Related Homologue of UhpT in Pseudomonas aeruginosa

Abstract: Pseudomonas aeruginosa (PA) is an opportunistic Gram-negative bacterium that affects patients in intensive care units and patients with chronic respiratory disease. Compared to other bacteria, it has a large genome (around 6.3 Mb) that supports its metabolic versatility and antimicrobial resistance. Fosfomycin (FF) is primarily used as an oral treatment for urinary tract infections (UTIs). FF enters the cell via the glycerol-3-phosphate transporter (GlpT) in PA, as well as in other bacteria. In other bacteria, such as E. coli, the glucose-6-phosphate transporter (UhpT) also functions as an FF transporter. Since mutations in GlpT lead to FF-resistant PA, it is assumed that GlpT is the only FF transporter. However, it is also assumed that PA uses glucose-6-phosphate and, thus, homologous proteins of UhpT may be present in its genome. Here, we present an attempt to find a distantly related homologue of UhpT in PA. A hidden Markov model (HMM) was built to search for the major facilitator superfamily (MFS) domain in 21 PA genomes, from 14 cystic fibrosis (CF) patients, annotated with prokka, and its performance was evaluated statistically (MCC: 0.84, ACC: 0.99). Then, the HMM was applied to the PA genomes. Besides the actual GlpT, annotated as glpt_1, one more GlpT protein was found in 21 out of 21 genomes, annotated as glpt_2. Since glpt_2 clusters closer to UhpT than to GlpT, glpt_2 was selected to build a model. After structural superimposition, the model and the UhpT template have an RMSD of 0.6 Å. The model of glpt_2 has some characteristics that are fundamental to UhpT function. The binding site, consisting of two arginines (Arg46 and Arg275) and Lys45, is totally conserved, as is the topology of the structure. Asp90 is also conserved in the glpt_2 model. No studies have so far searched for distantly related homologues of UhpT. Given the high rate of genetic exchange and mutation in bacteria, it is likely that PA has a UhpT-like protein in its genome.
The binding site is superimposable on that of UhpT, as is the overall topology. In fact, the 12 transmembrane helices (TMs) are completely comparable, suggesting a well-defined folding of the protein across the lipid bilayer. To support our hypothesis, in all 21 PA genomes we also found a protein annotated as membrane sensor protein UhpC, which is important for the expression and function of UhpT in E. coli. Since the PA strains are wild-type, we can assume that most PA strains have proteins like this. The presence of a homologue of UhpT suggests that this protein is conserved in the PA genome.


Introduction
Pseudomonas aeruginosa (PA) is an opportunistic Gram-negative bacterium that thrives in soil and host environments and affects patients in intensive care units and patients with chronic respiratory diseases, such as cystic fibrosis (CF) [1,2]. Compared to other bacteria, it has a large genome (around 6.3 Mb) that supports its metabolic versatility. Therefore, a key point of its pathogenicity is rapid adaptation to different and challenging environments. PA causes infections whose high mortality rate [3] is attributable to the organism's high resistance to many antimicrobials, particularly multidrug resistance in healthcare settings [4,5]. Generally, two mechanisms are responsible for antimicrobial resistance: acquisition of resistance genes from other bacteria (e.g., those encoding β-lactamases [6,7] or aminoglycoside-modifying enzymes [8]) or mutations of chromosomal genes [9]. Of these mechanisms, mutations of chromosomal genes play a crucial role in fosfomycin-resistant PA. Fosfomycin (FF) is primarily used as an oral treatment for urinary tract infections (UTIs). However, FF is under study for the therapy of a variety of infections because it could be active against multidrug-resistant (MDR) bacteria [10]. For PA treatment, FF has shown efficacy in combination with other antibiotics [11].
Resistance to FF has been reviewed recently [17,18]. Breakpoints for susceptibility have been set by the Clinical and Laboratory Standards Institute (CLSI) and the European Committee on Antimicrobial Susceptibility Testing (EUCAST). For wild-type PA, the minimum inhibitory concentration (MIC) is ≤128 mg/L. In PA, the main mutational resistance mechanism is mutation of the GlpT channel. A study demonstrated that missense mutations in the glpT gene result in FF resistance of PA with good fitness of the FF-resistant colonies [15], suggesting that GlpT may be the only channel for FF in PA. Compared to E. coli, PA utilizes a wider range of carbon sources to overcome the lack of glucose-6-P in the cell medium. This suggests the presence of a protein homologue of the E. coli UhpT system in PA. However, the authors of the same study searched for a homologous protein in PA strains, since it is generally assumed that a glucose-6-P system may exist. In fact, it has been stated that glucose-6-P should be added to the growth medium in FF sensitivity testing to allow FF efficacy, which supports the idea that the presence of glucose-6-P may activate its channel in some PA strains [19,20].
Here we present an attempt to find a UhpT homologue in 21 PA strains isolated from 14 cystic fibrosis patients. In general, therapies based on FF alone are rarely used.
First, the major facilitator superfamily (MFS, Pfam id: PF07690) was studied. MFS is a family of membrane channel proteins whose function is the transport of small molecules into or out of the cell in response to chemiosmotic gradients [21]. The proteins in this family are widely distributed in all kingdoms and are mainly responsible for the transport of sugars. However, drugs, metabolites, oligosaccharides, amino acids and oxyanions are all transported by MFS family members [22]. LacY is one of the representative permeases of this family. MFSs can function by solute uniport, solute/cation symport, solute/cation antiport and/or solute/solute antiport with inwardly and/or outwardly directed polarity. Generally, MFSs contain 12 transmembrane (TM) helices, with two 6-helix bundles formed by the homologous N- and C-terminal domains [23] of the transporter, which are connected by an extended cytoplasmic loop (between 30 and 100 residues) that may suggest a large degree of relative motion between the two domains. Both the C and N termini are located on the cytoplasmic side. The MFSs topology is organized as follows: Generally, the movement of substrates across the membrane through MFSs is mediated by the same mechanism. In the steady state, the MFS has an outward-facing conformation that exposes the inside face of the channel and the binding site. When the ligand reaches the binding site, a sudden conformational change occurs and the MFS shifts to the inward-facing conformation. The binding site is now in communication with the cytoplasm, which is rich in inorganic phosphate (Pi). Pi has a higher affinity for the binding site than the ligand. Therefore, the ligand is released into the cytoplasm and the protein shifts back to the outward-facing conformation [23,24]. The function of a specific MFS lies in its specific binding site. In LacY, the substrate is mainly coordinated by residues in the N domain, namely Glu135, Arg144 and Trp151 [25]. In FucP, Glu135 and Gln162 are essential for galactose binding [26]. In PepT, residues Tyr29, Tyr30 and Tyr68 are essential for peptide-binding affinity [27]. In GlpT, two arginines (Arg45 and Arg269) and Lys46 participate in glycerol-3-P binding, and they are conserved in the UhpT binding pocket [28]. Moreover, Asp388 and Lys391 are important for substrate recognition in UhpT [29].
In this study, we focused on the organophosphate:phosphate antiporter (OPA) family, a sub-family of MFS proteins whose function is to transport small carbohydrate molecules into the cell for use as a carbon source. These proteins are critical for both bacteria and humans. In particular, mutations in the human homologue of GlpT cause glycogen storage disease type Ib [30]. In bacteria, many MFS proteins have been characterized and their function studied. MFSs are also important in PA. Searching the UniProt database for the Pfam id (PF07690) and filtering for the organism Pseudomonas aeruginosa yields over 3000 proteins and, using UniRef50, over 200 clusters. MFSs have an important role in many biological processes: xenobiotic detoxification (e.g., the Bcr/CflA family efflux transporter), nitrate assimilation (e.g., the nitrate/nitrite transporter) or even drug resistance (e.g., the chloramphenicol resistance protein CmlA).

Results
The hidden Markov model (HMM, see Methods) was used to search for MFS proteins in 21 PA genomes annotated with prokka. For each genome, we ran the hmmsearch tool of HMMER with the HMM as the query profile and the annotated proteins of the PA genome as the target sequences. The list of output proteins for each PA genome is in Supplemental Materials File S1. To validate our model, whole-genome sequences of PA were downloaded from NCBI and searched with hmmsearch. We compared the number of sequences that passed the HMM filter in our PA genomes with that in the set of genomes from NCBI. The statistics are in Supplemental Materials File S2. On average, in the NCBI set, 75.1 out of 6138.6 sequences per genome passed the HMM filter (1.23%). In our genomes, 74.4 out of 6136.9 sequences (1.21%) did so. The 21 PA genomes and the NCBI genomes are therefore comparable. In all 21 genomes, the annotated GlpT protein was found. Its product name is "Glycerol-3-phosphate transporter" and its gene name is glpt_1. A pairwise alignment with UniProt ID A0A2R3IQP3 was computed, and the similarity was 99-100%. We can therefore claim that both prokka and the HMM were able to identify GlpT.
However, 21 out of 21 genomes also have a protein whose product name is again "Glycerol-3-phosphate transporter", but whose gene name is glpt_2. It is 4 residues longer than glpt_1. A pairwise alignment was computed with the Needleman-Wunsch global alignment algorithm, using Needle from EMBOSS [31], and the sequence identity is very low (28.1%), as shown in Figure 1.
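The identity value above comes from a global alignment. As a minimal sketch of the idea, and not a reproduction of EMBOSS Needle (which scores with a substitution matrix and affine gap penalties), the following Python code implements Needleman-Wunsch with a simple match/mismatch/linear-gap scheme and reports percent identity as identical columns over alignment length. The scoring values, the function names and the toy sequences are assumptions made for illustration only.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return the two gapped strings.

    Illustrative scoring only (assumption): +1 match, -1 mismatch,
    -2 per gap position (linear gap penalty).
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback from the bottom-right corner to the origin.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Identical (non-gap) columns divided by alignment length, in percent."""
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)


if __name__ == "__main__":
    # Toy sequences, not the real glpt_1 / glpt_2 proteins.
    aln = needleman_wunsch("ACGT", "AGT")
    print(aln, percent_identity(*aln))
```

Because identity is normalized by alignment length, gap columns lower the value; with a different scoring scheme the alignment, and therefore the identity, can change slightly.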
Even though the low similarity suggests that the two proteins are not the same, prokka annotated this sequence with the same product name. We therefore assumed that this protein may be a homologue of GlpT and may share the same function. As seen above, GlpT and UhpT belong to the same family (MFS), meaning that the protein function is conserved although the protein sequence similarity is low. To support our assumption, an MSA of glpt_1, glpt_2 and some GlpT and UhpT proteins from different organisms was computed. The list of the proteins collected is in Table 1 and the fasta sequences of glpt_2 and UhpC are in Supplementary Material S3.
Figure 1. The identity is 28.1% and suggests that the two sequences represent two different proteins.
Table 1. UniProt identifiers and organisms of GlpT and UhpT proteins used in MSA.
UniProt ID of UhpT Organism UniProt ID of GlpT Organism
S. typhi P08194 A0A485IC54 * P. aeruginosa A0A072Z*
Four different proteins were used to compute the MSA. In order to maintain the heterogeneity of the species, four different microorganisms which have either UhpT or GlpT were chosen. The ids marked with * are the GlpT and UhpT of PA. The sequence identity between these two sequences and glpt_1 and glpt_2, respectively, is 100%.
The tree obtained from the MSA is reported in Figure 2. The Clustal Omega algorithm [32] was used. In the tree, glpt_2 clusters with the UhpT proteins, suggesting that glpt_2 is closer to UhpT than to GlpT and glpt_1, the latter being the actual GlpT protein in PA. Since their function and structure are conserved, the alignment shows one big cluster in which GlpT and UhpT proteins are grouped together. However, within this single cluster, the UhpT proteins form a sub-cluster that includes glpt_2.
Therefore, glpt_2 was selected as a plausible UhpT homologue in PA. To test this, MODELLER v10.2 was used to perform a homology modeling experiment. MODELLER is used for homology or comparative modeling of protein three-dimensional structures [33,34]. The user provides a well-defined protein structure and a protein sequence to be modeled on it. MODELLER implements comparative protein structure modeling by satisfaction of spatial restraints [35,36] and automatically calculates a model containing all non-hydrogen atoms. MODELLER has been used for homology modeling experiments and it is suitable for identifying distantly related homologues. To build the new model, a predicted structure of UhpT was retrieved from the AlphaFold repository [37,38]. AlphaFold is a machine learning system that predicts a protein's 3D structure from its amino acid sequence with an accuracy very close to that of structural experiments and is currently among the best structure prediction methods. It is possible to search the AlphaFold database by gene or protein name and, more importantly, the database reports the level of confidence predicted for each residue of the protein. Therefore, P0AGC0 was selected as the template structure. This protein has the best confidence score across all the TM helices, which are key to the protein's function. The only part of the protein with a very low confidence score is the cytoplasmic loop between the C and N domains, which is largely unstructured [24]; this may be why it was poorly predicted by AlphaFold. MODELLER produced 5 output models and, in order to choose the best one, each was structurally superimposed on the template using Chimera [39] and the root mean square deviation (RMSD) was calculated. RMSD is a measure of the average distance between the atoms of superimposed proteins (Supplemental Materials File S4). The lower the RMSD, the better the structural superimposition.
The RMSD between P0AGC0 and glpt_2 model was 0.614 Å. Both the crystal structures and the superimposition are shown in Figure 3. The sequence identity between the two structures is 24%.
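To make the RMSD criterion concrete, the sketch below computes it for two lists of paired atomic coordinates. It assumes the structures have already been superimposed (as Chimera does before reporting the value) and that atom i in one list corresponds to atom i in the other; the function name and the toy coordinates are illustrative and are not taken from the study.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two paired coordinate lists.

    coords_a, coords_b: sequences of (x, y, z) tuples in angstroms,
    already superimposed and in the same atom order (assumption).
    """
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Toy example: every atom is displaced by exactly 1 angstrom along z.
    template = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    model = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(template, model))
```

Strictly, RMSD is the square root of the mean squared distance, so a few badly placed atoms weigh more heavily than they would in a plain average of distances.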

Discussion
The topology of the model and that of the UhpT crystal structure are comparable. The model has all 12 TMs conserved, as in MFS proteins, and both the C and N termini end on the cytoplasmic side of the membrane. As seen in Figures 4 and 5, the binding site is highly conserved. The two arginines responsible for substrate binding in both GlpT and UhpT (Arg45 and Arg269 in GlpT) are perfectly superimposable: Arg46 and Arg275 in UhpT and Arg39 and Arg264 in the model lie at similar distances from each other (9.70 Å vs. 9.88 Å, respectively), close to the 9.9 Å found in the crystal structure of GlpT [28]. The lysines are also conserved, as Lys45 and Lys38. We can therefore suggest that the binding site could function as it does in UhpT proteins.
Moving away from substrate binding, it has been shown that the equivalent of Asp88 in GlpT is important for the interaction between the second and the seventh helices during the inward-to-outward interconversion [40]. In both UhpT and the model we found this aspartate, at positions 90 and 85, respectively, although the two residues are not superimposed and are 6.57 Å apart (Figure 6). Moreover, we found a relatively high percentage of aromatic residues in the two sequences (13.6% vs. 11.9%, respectively), which increases when only the TM helices are counted (15.4% vs. 12.9%), whereas in GlpT it is even higher (15.2% and 18.9% in the whole sequence and in the TMs only, respectively).
However, this model presents some differences from both UhpT and GlpT. In UhpT, the importance of Asp388 and Lys391 in substrate recognition has been highlighted [29]. It has been shown that these two residues form a salt bridge that is crucial for selecting glucose-6-phosphate over other organophosphate substrates. However, the same paper does not exclude that glucose-6-phosphate can still be selected and transported by the channel. Moreover, in the organophosphate transporter family, the motif W173NXXHN178 [41] is highly conserved (>95%). This motif is found in UhpT at the same position but is not present at any position in our model. Finally, in our model, there is one large undefined loop in the N-terminal TM helix (Figure 7).
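Two of the sequence-level checks discussed above, the share of aromatic residues and the presence of the WNXXHN motif, are straightforward to reproduce. The sketch below is one possible way to do so; it assumes that "aromatic" means Phe, Trp and Tyr (the text does not say whether His is counted), and the function names and the toy sequence are hypothetical.

```python
import re

# Assumption: aromatic residues are F, W and Y; histidine is not counted.
AROMATIC = frozenset("FWY")

# W-N-X-X-H-N, where X is any residue (six positions, as in W173NXXHN178).
MOTIF = re.compile(r"WN..HN")


def aromatic_percentage(seq):
    """Percentage of aromatic residues in a protein sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(1 for residue in seq if residue in AROMATIC) / len(seq)


def find_motif(seq):
    """Return the 1-based start positions of every WNXXHN occurrence."""
    return [match.start() + 1 for match in MOTIF.finditer(seq.upper())]


if __name__ == "__main__":
    toy = "MAAWNGGHNFYLL"  # toy sequence, not a real transporter
    print(round(aromatic_percentage(toy), 1), find_motif(toy))
```

To restrict the count to the TM helices, the same function can be applied to the concatenated helix segments once their boundaries are known.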

PA Collection and Sequencing Pipeline
Twenty-one first-acquisition PA strains were selected from 13 different cystic fibrosis patients with initial PA infection (5 male vs. 8 female; age 3-27 years). These strains were chosen because they may be considered wild-type with respect to genetic background and antimicrobial pressure. For GlpT and UhpT in particular, the main cause of FF resistance is mutations in glpT and uhpT (in E. coli). PA colonies from oropharyngeal swabs were isolated on Cetrimide agar or MacConkey agar after 48 h of incubation at a controlled temperature (35-37 °C).
DNA extraction was performed from pure PA cultures after 24 h of incubation at 37 °C on Columbia agar + 5% sheep blood (bioMérieux) using the QIAamp DNA Mini Kit (QIAGEN). Whole-DNA libraries were prepared with the Illumina DNA Prep Library Preparation Kit (Illumina). Quality checks were performed with Qubit Fluorometric Quantification (ThermoFisher Scientific). Expecting 100× coverage on average, a pool of six libraries was run with the MiSeq Reagent Kit v2 (300 cycles).
First, the quality of the reads was checked with FastQC v.0.11.9 [42]. Trimmomatic v.0.39 [43] was used to trim the reads with the command PE -phred33 -threads 4 SLIDINGWINDOW:4:20 MINLEN:70, so that reads shorter than 70 bp were removed. Trimmed reads were assembled using SPAdes v.3.15.4 [44] with the default command, and contigs shorter than 1 kb were removed. Genome assemblies were annotated with prokka v.1.14.6 [45] with the default command. Prokka is a tool that quickly annotates bacterial, archaeal and viral genomes and produces standards-compliant output files. From the output files of prokka, the name_of_sample.faa file was selected for further analysis. This file contains all the annotated proteins with their predicted names, if available; otherwise, the protein is annotated as a hypothetical protein.
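The text does not say which tool was used to discard contigs shorter than 1 kb. As a hedged illustration of that step only, the following sketch reads FASTA-formatted lines and keeps contigs of at least 1000 bp; the function names, the threshold parameter and the toy records are assumptions.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(lines, min_len=1000):
    """Keep only contigs whose length is at least min_len bases."""
    return [(h, s) for h, s in read_fasta(lines) if len(s) >= min_len]


if __name__ == "__main__":
    # Toy assembly: one 1200 bp contig (kept) and one 40 bp contig (dropped).
    toy = [">contig_1", "A" * 600, "A" * 600, ">contig_2", "ACGT" * 10]
    for name, seq in filter_contigs(toy):
        print(name, len(seq))
```

With a real assembly, the lines would typically come from an open file handle on the SPAdes contigs file rather than from an in-memory list.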

Building Hidden Markov Model (HMM)
An HMM is a finite model that describes a probability distribution over a finite number of possible sequences [46], providing a tool for building complex models by drawing an intuitive picture [47]. HMMs have proven effective in predicting protein domains from protein sequences. Generally, the steps to generate a reliable HMM are (i) the retrieval of a training set to build the protein domain model and the training of the model, (ii) the retrieval of the testing set and (iii) the validation of the model.
In order to collect the proteins for building the model, the Pfam identifier of MFS (PF07690) was searched in the Protein Data Bank (PDB) [48], with the following filters:
• the structure must have been determined by X-ray crystallography;
• the resolution must be ≤3.5 Å;
• the protein must be wild-type;
• no mutations in the sequence.
In total, 36 proteins were retrieved. Since most of these entries contain more than one chain, and chains may be identical or very similar, PDBeFold was run in order to reduce the similarity within the training set. PDBeFold is an online tool that allows pairwise or multiple comparison and 3D alignment of protein structures [49]. Chain A of 1PW4 was used as the template [28] and the rest of the 36-protein dataset as the query. From the output, 18 proteins remained. To avoid redundancy, skipredundant from EMBOSS was run. This tool automatically clusters proteins based on pairwise alignments computed with the Needleman-Wunsch global alignment algorithm [50], here with a 95% identity threshold. As a result, 9 proteins were retrieved, which are listed in Table 2. The structural multiple sequence alignment (MSA) of these 6 structures, produced by PDBeFold, was used as input to create the HMM. In Table 2, the first column reports the PDB identifier and the chain, the second the name of the protein and the third the organism.
The HMM was built with the hmmbuild command of HMMER v3.3.2 [51], using the MSA file from PDBeFold. In order to validate the model, a positive and a negative set of protein sequences were retrieved from UniProt [52]. The positive set (PS) comprised all sequences that are (i) manually annotated and reviewed, (ii) 300-500 residues long and (iii) annotated with Pfam id PF07690. For the negative set (NS), the same criteria were used but excluding proteins with Pfam id PF07690. For both sets, UniRef50 [53] was applied to avoid redundancy. Then, the blastall command of BLAST [54] was run against the sequences of the training set to avoid the bias that could arise in the testing procedure if the testing set contained sequences used to train the model. In the end, the PS and NS contain 341 and 55,258 proteins, respectively. The UniProt identifiers of both sets are in Supplemental Materials Files S6 and S7 as fasta files. The hmmsearch command (with the option -E 0.05) was used on PS and NS to test the model: this command searches one or more profiles against a sequence database and reports the score and the E-value for both the whole sequence and the best matching domain. From the results, a statistical evaluation of the HMM was performed. Table 3 reports the confusion matrix, with the numbers of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN).
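Counting the hits returned by hmmsearch is easiest from its tabular per-sequence output. The sketch below assumes the search was run with the --tblout option of hmmsearch (the text does not state which output format was used) and parses that whitespace-delimited table, in which the first field is the target name and the fifth and sixth fields are the full-sequence E-value and score. The function name and the sample lines are hypothetical.

```python
def parse_tblout(lines, max_evalue=0.05):
    """Return (target, evalue, score) for hits at or below max_evalue.

    Expects lines in the hmmsearch --tblout layout (assumption): comment
    lines start with '#'; field 0 is the target name, field 4 the
    full-sequence E-value and field 5 the full-sequence bit score.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target = fields[0]
        evalue = float(fields[4])
        score = float(fields[5])
        if evalue <= max_evalue:
            hits.append((target, evalue, score))
    return hits


if __name__ == "__main__":
    # Invented sample rows that only mimic the column layout.
    sample = [
        "# target name  accession  query name  accession  E-value  score  bias ...",
        "prot_1 - MFS_model - 1e-30 105.2 0.1 2e-30 104.0 0.1 1.0 1 0 0 1 1 1 1 Glycerol-3-phosphate transporter",
        "prot_2 - MFS_model - 0.2 10.1 0.0 0.3 9.5 0.0 1.0 1 0 0 1 1 1 0 hypothetical protein",
    ]
    for target, evalue, score in parse_tblout(sample):
        print(target, evalue, score)
```

When the search itself is run with -E 0.05, the reported hits already satisfy the threshold, so the filter here mainly documents the cutoff; the hit identifiers can then be compared with the PS and NS identifiers to fill the confusion matrix.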
The true positive rate (TPR), true negative rate (TNR), accuracy (ACC) and Matthews correlation coefficient (MCC) were calculated. The definitions of these statistical measures are in Supplemental Materials File S4 and the results are in Table 4.
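For reference, the four measures can be computed from the confusion matrix as in the sketch below. It uses the standard textbook definitions, which are assumed (not verified here) to match those in Supplemental Materials File S4; the function name and the example counts are illustrative and are not the values of Table 3.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute TPR, TNR, ACC and MCC from confusion-matrix counts."""
    tpr = tp / (tp + fn)                    # sensitivity / recall
    tnr = tn / (tn + fp)                    # specificity
    acc = (tp + tn) / (tp + tn + fp + fn)   # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"TPR": tpr, "TNR": tnr, "ACC": acc, "MCC": mcc}


if __name__ == "__main__":
    # Illustrative counts only.
    print(classification_metrics(tp=90, tn=90, fp=10, fn=10))
```

With a negative set far larger than the positive set, as here (55,258 vs. 341 sequences), accuracy is dominated by the true negatives, which is why MCC is the more informative of the two summary values.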

Conclusions
Two of the most important MFS proteins are certainly GlpT and UhpT. Whereas the first is ubiquitous and is found in all bacteria, the second has not been found in PA. However, both are important for the transport of FF into the cell and are therefore crucial in the acquisition of FF resistance by PA. Nevertheless, no studies aimed at searching for distantly related homologues of UhpT have been carried out. Given the high rate of genetic exchange and mutation in bacteria, it is worth searching for a UhpT-like protein in the PA genome. In this study, in addition to GlpT, we found a second GlpT protein in 21 out of 21 PA genomes, called glpt_2, which has the characteristics of an MFS protein. We searched the literature and the common databases (e.g., UniProt) and did not find any studies about glpt_2 in PA. However, a BLAST search with the glpt_2 sequence easily finds it, although information about its specific function is lacking. We searched for the glpt_2 sequence in the NCBI set of 433 whole-genome sequences and found it in all genomes; in some genomes, it was even found as a double hit. This information is in Supplemental Materials File S2. Therefore, we built a new protein structure with a homology-modeling approach using MODELLER. A UhpT structure was retrieved from the AlphaFold database. The model of glpt_2 has some characteristics that suggest conservation of function. The binding site, comprising two arginines (Arg39 and Arg264) and Lys40, is perfectly superimposable on that of UhpT, as is the overall topology. In fact, the 12 TMs are completely comparable, suggesting a well-defined folding of the protein across the lipid bilayer. To support our hypothesis, in all 21 PA genomes we also found a protein annotated as membrane sensor protein UhpC. UhpC is lacking in PA reference genomes, but it is crucial for UhpT expression and function in E. coli, where UhpT is functional and well characterized. We computed the alignment between the UhpC of E. coli and the sequence found in one of our genomes, and they share low identity (27.9%). Unfortunately, we were not able to find the other two proteins of the Uhp complex, namely UhpA and UhpB.
The model we have built is an attempt to find a distantly related homologue of a glucose-6-P channel of E. coli. Since the PA strains we used are considered wild-type, we can assume that most PA strains have proteins like these. Obviously, we cannot say whether this protein is functional or even whether it is transcribed. However, the presence of this protein in the PA genome may indicate that even PA can use glucose as a carbon source.
The presence of a homologue of UhpT suggests that this protein is conserved in the PA genome because it is generally found in all Gram-negative bacteria; however, PA prefers different carbon sources. Moreover, glucose is a less preferred carbon source for PA, which prefers other substrates, such as succinate and citrate [55,56]. Therefore, having a well-functioning UhpT probably does not affect the fitness of PA in its habitat. On the other hand, since the mutation rate in bacteria is usually high, we cannot exclude that, in the absence of all preferred carbon sources, PA with a functioning UhpT could be selected.
Institutional Review Board Statement: This study was approved by the local ethics committee (Meyer Children's Hospital, 27/2020) and informed written consent was obtained from the parents of the subjects involved for the use of anonymous clinical data for research purposes.