Genomic Distribution of ushA-like Genes in Bacteria: Comparison to cpdB-like Genes

UshA and CpdB are nucleotidases of the periplasm of several Gram-negative bacteria, while several Gram-positives contain cell wall-bound variants. UshA is a 5′-nucleotidase, a UDP-sugar hydrolase, and a CDP-alcohol hydrolase. CpdB acts as a 3′-nucleotidase and as a phosphodiesterase of 2′,3′-cyclic nucleotides and 3′,5′-linear and cyclic dinucleotides. Both proteins are pro-virulent for the pathogens producing them and facilitate escape from the innate immunity of the infected host. Recently, the genomic distribution of cpdB-like genes in Bacteria was found to be non-homogeneous among different taxa, and differences occur within single taxa, even at species level. Similitudes and differences between UshA-like and CpdB-like proteins prompted parallel analysis of their genomic distributions in Bacteria. The presence of ushA-like and cpdB-like genes was tested by TBlastN analysis using seven protein probes to query the NCBI Complete Genomes Database. It is concluded that the distribution of ushA-like genes, like that of cpdB-like genes, is non-homogeneous. There is a partial correlation between both gene kinds: in some taxa, both are present or absent, while in others, only one is present. The result is an extensive catalog of the genomic distribution of these genes at different levels, from phylum to species, constituting a starting point for research using other in silico or experimental approaches.


Introduction
The proteins UshA and CpdB are prototypic nucleotidases of the periplasmic space of Escherichia coli [1,2] and other Gram-negative bacteria [3][4][5][6][7][8][9][10]. As far as enzyme activity is concerned, UshA is a highly efficient 5′-nucleotidase that is also active as a phosphoanhydride hydrolase of UDP-sugars, CDP-alcohols, and other nucleotidic derivatives [11][12][13]. CpdB is a highly efficient 3′-nucleotidase, also active as a phosphodiesterase of 2′,3′-cyclic mononucleotides, 3′,5′-cyclic or linear dinucleotides, and the artificial phosphodiester substrate bis-4-nitrophenylphosphate [14,15]. Both proteins are structurally related: following the removable signal peptide for secretion (SP), they display the same two-domain architecture, with an N-terminal metallophos domain (Pfam ID PF00149) that includes the catalytic site with a dimetal center, and a C-terminal 5_nucleotid_C domain (Pfam ID PF02872) that includes a substrate-binding site [16,17]. It is noteworthy that the designation of the 5_nucleotid_C domain does not imply the occurrence of 5′-nucleotidase activity. UshA is a 5′-nucleotidase devoid of 3′-nucleotidase activity, while CpdB is a 3′-nucleotidase devoid of 5′-nucleotidase activity. In both proteins, the N- and C-terminal domains are joined by a ≈20-amino acid linker [17,18]. Both enzymes are believed to share a remarkable catalytic cycle in which the typical 5′-AMP or 3′-AMP substrates bind to the specificity site in the 5_nucleotid_C domain, with the adenine ring forming a stacking sandwich between two aromatic residues. The substrate-charged domains then undergo large, 96° rotations that bring the substrate to the catalytic site in the metallophos domains, where dephosphorylation takes place [17,19,20].
The periplasmic or cell wall locations of these enzymes make them able to act on non-cytoplasmic substrates, either secreted from the same cell or of exogenous origin, for instance, in the cytoplasm of eukaryotic cells invaded by bacterial pathogens such as Salmonella enterica or Streptococcus agalactiae. Both 5′-nucleotidases and 3′-nucleotidases have been identified and considered virulence factors for the pathogens producing them, by mechanisms related to their nucleotide-degrading activities or to effects on complement that facilitate evasion from host innate immunity [21][22][23][24][25][26][27][30,31].
For these reasons, we consider it of great interest to learn how widespread, among the genomes of different bacterial taxa, are the genes coding for nucleotidases that, being either periplasmic or bound to the cell wall, have the potential to act extracytoplasmically on nucleotidic substrates. In a recent study, we analyzed the genomic distribution of cpdB-like genes using the protein sequence of S. enterica CpdB as a probe (query) for TBlastN analyses of complete genomes, limited by bacterial taxa at different levels, from phyla to species [32]. The results revealed that cpdB-like genes are far from ubiquitous in the superkingdom Bacteria, being present in some phyla but not in others. At levels higher than species, the genomic distribution was not homogeneous, since few taxa contained a cpdB-like gene in all the sequenced genomes. At the level of species, the distribution was more homogeneous: out of 77 taxa considered, 38 showed a (near) widespread distribution of cpdB-like genes and 28 did not contain them. Interestingly, 11 species showed a partial distribution, with some sequenced genomes but not all containing a cpdB-like gene. This panoramic view prompted us to extend the analysis to ushA-like genes and to perform it in more detail by increasing the number of TBlastN probes from the single one used in the previous study [32] to a total of seven different probes in the current manuscript, five for ushA-like genes and two for cpdB-like ones. The result is an extensive catalog of the genomic distribution of these genes at different levels, from phylum to species, constituting a starting point for research using other in silico or experimental approaches. Major observations were that the genomic distribution of ushA-like genes was not homogeneous and that the correlation with cpdB-like genes was partial: in some taxa both were present or absent, whereas in others only one was present.
Other interesting outcomes worth further research by other approaches are pointed out.

Materials and Methods
TBlastN analyses [33,34] were run against NCBI complete microbial genomes (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=TBlastN&PAGE_TYPE=BlastSearch&BLAST_SPEC=MicrobialGenomes&LINK_LOC=blasttab&LAST_PAGE=blastn, accessed on 15 July 2023). Default parameters were applied except that the maximum number of target sequences was adapted to the expected number of hits. The database was queried using the sequence identifiers of the seven-probe set selected (Figure 1). Routinely, the Entrez query "NOT plasmid [Title]" was applied. The searches within each taxonomical group (taxid) (Organism) were restricted in principle to genomes of type material [35]. This restriction was removed when fewer than five type-material genomes were available or, as a rule, for searches within genera and species. The typical conditions for launching a TBlastN search from the Microbial Translated Blast page are shown in Figure S1. When running searches limited by organism, a bug was observed in the organism menu, as it occasionally chose the wrong taxid number. Therefore, all taxid numbers were checked in the NCBI Taxonomy browser (https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi, accessed on 15 July 2023) [36]. Genomic hits were counted when the alignment score was >150 and query coverage was >70%.
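The hit-acceptance rule stated at the end of the preceding paragraph can be written as a short filter. The following is only a minimal sketch in Python, not the procedure actually used in this study; the `Hit` record, its field names, and the genome labels are hypothetical, and only the two thresholds (score >150, query coverage >70%) are taken from the text.

```python
from dataclasses import dataclass
from typing import Iterable, Set

MIN_SCORE = 150.0     # a hit must score strictly above this value
MIN_COVERAGE = 70.0   # query coverage (%) must be strictly above this value


@dataclass(frozen=True)
class Hit:
    """One TBlastN alignment (hypothetical record layout)."""
    genome: str            # genome identifier, e.g. an accession
    score: float           # alignment score
    query_coverage: float  # percentage of the probe covered


def is_genomic_hit(hit: Hit) -> bool:
    """Apply both thresholds; both comparisons are strict."""
    return hit.score > MIN_SCORE and hit.query_coverage > MIN_COVERAGE


def genomes_with_hits(hits: Iterable[Hit]) -> Set[str]:
    """Return the genomes that have at least one accepted hit."""
    return {h.genome for h in hits if is_genomic_hit(h)}
```

Counting genomes rather than alignments matters here, because a genome can yield several alignments with the same probe.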
Genes 2023, 14, 1657
[Figure 1: grid of BlastP alignment scores among the proteins of Table 1.] The grid intersections show the alignment scores obtained. On the top line, the seven proteins selected for use are colored as in Table 1 to facilitate cross-referencing. Within the grid, the colors identify the proteins with high scores, indicative of strong relatedness. The seven probes selected cover, with high scores, the whole set of proteins. Nss, not significant similitude.

Selection of Probes for TBlastN Analysis
The probes for TBlastN analysis of UshA-like genes were selected among a set of 21 bacterial 5′-nucleotidases (Table 1). Eighteen of them were collected from a recent review by Zakataeva [37], to which two well-characterized CpdB-like 3′-nucleotidases were added [14,27] (no. 20 and 21 in Table 1). All of them are either periplasmic or cell wall-bound, experimentally studied nucleotidases. In addition, one uncharacterized, putative 5′-nucleotidase of B. subtilis, recovered from UniProtKB/Swiss-Prot (https://www.uniprot.org/help/uniprotkb, accessed on 15 July 2023) [38], was included (no. 19 in Table 1).
Table 1. Periplasmic or cell wall-bound bacterial nucleotidases used to select probes for TBlastN analysis of bacterial genomes. This protein set was taken from Zakataeva [37], except no. 19 (taken from UniProtKB/Swiss-Prot [38]) and no. 20 and 21 (taken from [14,27]). The probes selected for the study are shown in the same background color as in Figure 1 to facilitate cross-referencing.

The mutual relatedness among Table 1 proteins was evaluated by the scores of BlastP alignments (Figure 1). This allowed us to select seven proteins to be used as TBlastN probes; they are identified as proteins 1, 6, 8, 9, 19, 20, and 21 in Table 1 and Figure 1. According to the color code used in Figure 1, five of the selected probes (no. 1, 8, 9, 20, and 21) were highly related to a small group of nucleotidases, whereas the other two probes (no. 6 and 19) showed insignificant alignment scores with any other member of the set. Incidentally, one of the so-called 5′-nucleotidases (no. 14 in Table 1) was actually a CpdB-like protein, to judge from its strong relatedness to authentic CpdB-like enzymes (no. 20 and 21 in Table 1) and insignificant alignment scores to the other Table 1 proteins.

General Strategy for the Analysis and Presentation of Results
TBlastN searches were run between 30 June and 15 July 2023 as described under the Materials and Methods section. The number of hits obtained with each probe for each taxonomical group analyzed was recorded. A global TBlastN search was run on 30 June 2023, in the superkingdom Bacteria (taxid:2), with 4185 type-material genomes and a total of 40,608 genomes available on that date. Table 2 shows the results obtained with the seven probes, computing only hits found among type-material genomes.
Thereafter, searches were run in the Bacteria taxa of the NCBI Taxonomy browser [36] at different levels. Detailed results are shown in Tables S1-S6. Summaries of the results at different levels are shown in Tables 3-8, where the presence or absence of ushA-like and cpdB-like genes is schematically indicated. In these summaries, "presence" does not mean that the genes are widespread in the taxon. In this respect, three levels are distinguished: ≤50%, >50% but <100%, and 100% of the analyzed genomes contain ushA-like and/or cpdB-like genes (hits). This is marked by the letters U (ushA-like) and C (cpdB-like) on different color backgrounds: green (≤50%), orange (>50% but <100%), and red (100%). It must be remarked that these percentages include hits obtained with any of the UshA-like or CpdB-like probes (see Table 2). For ushA-like genes, the results were strongly dependent on the probe (five different ones were used), whereas for cpdB-like genes, the two probes used gave similar results in most but not all the taxonomical groups. Finally, when "presence" of both types of genes is indicated for the same taxon, it does not necessarily mean that they are in the same genome, unless 100% of the genomes gave positive results in both cases. Of course, 100% positivity for one gene type and <100% for the other means that some genomes contain both types of genes. On the other hand, "absence" means that none of the analyzed genomes gave a hit with any probe.
The probes are shown in the same background color as in Figure 1 and Table 1 to facilitate cross-referencing.
Table 3. Presence (+) or absence (−) of ushA-like (U) and cpdB-like (C) genes in Bacteria phyla. The (+) background indicates: green, presence in ≤50% of the genomes analyzed; orange, presence in >50% but <100% of the genomes; red, presence in 100% of the genomes analyzed. Full data can be found in Table S1.
[Table 4 caption, truncated] ... Tables 5 and S3, except for one *. * The class Ardenticatenia does not appear in Table 5 because the single hit obtained in the TBlastN analysis corresponds to a "candidatus" order.
Table 6. Presence (+) or absence (−) of ushA-like (U) and cpdB-like (C) genes in selected bacterial families. The (+) background indicates: green, presence in ≤50% of the genomes analyzed; orange, presence in >50% but <100% of the genomes; red, presence in all the genomes. Full data can be found in Table S4.
In summary, when a taxonomical level is negative or 100% positive for both genes, or negative for one and 100% positive for the other gene type, the analysis of such a taxon is deemed complete and is not pursued further at lower taxonomical levels.
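The color levels and the completeness criterion described above can be summarized in a few lines of code. This is a hedged sketch in Python under the assumption that, for each taxon, one knows the number of analyzed genomes and the number of them giving at least one hit with any probe of a given kind; the function names are illustrative and do not come from the study.

```python
def presence_level(n_positive: int, n_genomes: int) -> str:
    """Map a hit count to the summary-table code: absent, green, orange, or red."""
    if n_genomes <= 0:
        raise ValueError("a taxon needs at least one analyzed genome")
    if not 0 <= n_positive <= n_genomes:
        raise ValueError("n_positive must lie between 0 and n_genomes")
    if n_positive == 0:
        return "absent"            # shown as (-) in the summaries
    if n_positive == n_genomes:
        return "red"               # 100% of the genomes
    if 2 * n_positive > n_genomes:
        return "orange"            # >50% but <100%
    return "green"                 # <=50%


def is_complete(usha_positive: int, cpdb_positive: int, n_genomes: int) -> bool:
    """A taxon is complete when each gene kind is found in none or in all genomes."""
    return all(p in (0, n_genomes) for p in (usha_positive, cpdb_positive))
```

Under this reading, a taxon with 17 genomes, 17 ushA-like positives, and 4 cpdB-like positives is red for U, green for C, and not complete, so it would be pursued at lower levels.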

Genomic Distribution of ushA-like and cpdB-like Genes in Bacteria Phyla
Forty-three phyla, including the Delta/epsilon subdivision, found in the NCBI Taxonomy browser within the superkingdom Bacteria (mostly coincident with [39]), were submitted to TBlastN analyses with UshA-like and CpdB-like probes (Table 2). The detailed results are shown in Table S1. A simpler summary of the presence/absence of ushA-like and cpdB-like genes is shown in Table 3.
Twelve phyla contained both types of genes; 14 showed neither; 11 showed ushA-like but not cpdB-like genes; and in six cases, the converse was true. With the criteria of Section 3.2.1, 19 phyla (those in black type in Table 3) were considered complete and not pursued at lower levels. The 24 phyla in red type are further analyzed in Tables 4 and S2.

Genomic Distribution of ushA-like and cpdB-like Genes in Bacterial Classes of Selected Phyla
To continue the TBlastN exploration, 76 bacterial classes belonging to 24 different phyla were queried with the seven probes (Table 2). The detailed results are shown in Table S2. In six of those classes, there was no sequenced genome. A simpler summary of the presence/absence of ushA-like and cpdB-like genes in the 70 classes for which there were sequenced genome(s) is shown in Table 4. With the criteria defined in Section 3.2.1, the analyses of 31 classes (those in black type in Table 4) were considered complete and not pursued further at lower taxonomical levels. On the other hand, the 39 classes shown in red type in Table 4 were further analyzed (Tables 5 and S3), with one exception marked with an asterisk.

Genomic Distribution of ushA-like and cpdB-like Genes in Bacterial Orders of Selected Classes
To continue the TBlastN exploration, 152 bacterial orders belonging to 38 different classes were queried with the seven probes (Table 2). The detailed results are shown in Table S3. In 20 of those orders, there was no sequenced genome. A simpler summary of the presence/absence of ushA-like and cpdB-like genes in the 132 orders for which there were sequenced genome(s) is shown in Table 5. With the criteria defined in Section 3.2.1, the analyses of 53 orders (those in black type in Table 5) were considered complete and not pursued further at lower taxonomical levels. On the other hand, the 79 orders shown in red type in Table 5 were further analyzed (Tables 6 and S4).

Genomic Distribution of ushA-like and cpdB-like Genes in Bacterial Families of Selected Orders
To continue the TBlastN exploration, 403 bacterial families belonging to 79 different orders were queried with the seven probes (Table 2). The detailed results are shown in Table S4. In 99 of those families, there was no sequenced genome. A simpler summary of the presence/absence of ushA-like and cpdB-like genes in the 304 families for which there were sequenced genome(s) is shown in Table 6. With the criteria defined in Section 3.2.1, the analyses of 139 families (those in black type in Table 6) were considered complete and not pursued further at lower levels. On the other hand, the 165 families shown in red type in Table 6 were further analyzed (Tables 7 and S5).

Genomic Distribution of ushA-like and cpdB-like Genes in Bacterial Genera of Selected Families
To continue the TBlastN exploration, 510 bacterial genera belonging to 165 different families were queried with the seven probes (Table 2). In contrast to previous steps (Sections 3.2.3-3.2.5), TBlastN analyses were not run for all the genera belonging to the families deemed not complete (those highlighted in red type in Table 6). Instead, while running the TBlastN analyses of families, the genera giving the hits were recorded, which avoided running many later TBlastN searches that would not give any hits.
The detailed results obtained with the 510 selected genera are shown in Table S5. For all of them, the NCBI Complete Genomes Database contained at least one sequenced genome. A simpler summary of the presence/absence of ushA-like and cpdB-like genes in those genera is shown in Table 7.
With the criteria defined in Section 3.2.1, the analyses of 268 genera (those in black type in Table 7) were considered complete. However, for analyses at the level of species (Tables 8 and S6), also in contrast to previous steps, the selection was not based on the non-complete character of the genera. Instead, the selection was purely subjective and included species belonging to genera not mentioned in Table 7, as explained in Section 3.2.7.

Genomic Distribution of ushA-like and cpdB-like Genes in Selected Bacterial Species
To continue the TBlastN exploration, 107 bacterial species belonging to different families were queried with the seven probes (Table 2). In contrast to the strategy followed at the previous taxonomical levels, when systematic criteria were applied for taxa selection (Sections 3.2.3-3.2.6), a subjective selection of species was made in this case. It included all the bacterial species analyzed in the earlier study of cpdB-like genes, which had been selected mainly for their pathogenicity [32]. In summary, 107 different species were queried with the seven probes. Of them, the 80 shown in black type were declared complete by the criteria described in Section 3.2.1. Non-complete species are highlighted in red. Detailed results are in Table S6, and a summary is in Table 8.

Overview
This study is a follow-up of a previous analysis of the genomic distribution of cpdB-like genes in Bacteria, which was performed with S. enterica CpdB (GenBank accession P26265) as the probe [32]. That study was mainly centered on the phyla Pseudomonadota and Bacillota (then named, more traditionally, Proteobacteria and Firmicutes, respectively) and their lower divisions. In the current manuscript, the former study has been extended in several aspects: besides cpdB-like genes, ushA-like genes have been analyzed, and the searches were run without a priori restriction to particular taxa. Moreover, several probes were used, two for cpdB-like genes and five for ushA-like genes (Table 2). The use of several UshA-like probes revealed different types of ushA-like genes, some of them specifically associated with different bacterial taxa. The result is an extensive catalog of the distribution of these genes in the superkingdom Bacteria. Several resources are provided, including Supplementary Tables S1-S6, which contain the detailed results of the analyses at different levels: phylum (Table S1), class (Table S2), order (Table S3), family (Table S4), genus (Table S5), and species (Table S6). In the main manuscript, Tables 3-8 contain summaries of the data at different levels, from phylum to species. Table 9 summarizes the total numbers of taxa studied, including the counts of probed taxa of different levels, analyzed taxa (after excluding those for which, at the time of submission, TBlastN found no sequenced genomes in the NCBI Complete Genomes Database), and taxa declared complete according to the criteria explained in Section 3.2.1. For complete taxa, Table 9 also shows the breakdown by kind of results, depending on whether UshA-like and/or CpdB-like probes gave hits or not.
To facilitate searching for particular taxa, alphabetical lists are provided of the 1291 taxa probed (Table S7) and of the 125 taxa without sequenced genomes in the NCBI Complete Genomes Database among those that were probed (Table S8).
Table 9. Numbers of taxa probed, analyzed, and deemed complete after TBlastN analyses with UshA-like (U) and CpdB-like (C) probes: breakdown by kind of results obtained, with presence (+) and/or absence (−) of hits with each probe type. The data are computed from the tables indicated.

About the Possible Correlation between ushA-like and cpdB-like Genes
UshA and CpdB have different specificities. UshA is a 5′-nucleotidase, UDP-sugar hydrolase, and CDP-alcohol hydrolase [11,12], and CpdB acts as a 3′-nucleotidase and as a phosphodiesterase of 2′,3′-cyclic nucleotides and 3′,5′-linear and cyclic dinucleotides [14]. They are periplasmic [1,2] or cell-wall [21][22][23][24][25][26][27] enzymes that act on extracellular substrates, either exogenous or endogenous. In addition, both are pro-virulent factors for the producing pathogens, facilitating escape from the innate immunity of the host [21][22][23][24][25][26][27]. The similitude between them was the main reason to study and compare their genomic distributions in Bacteria, with the aim of establishing the extent to which the occurrence of one correlates with the occurrence of the other. In this regard, it is worth recalling that, for instance, the action of CpdB-like proteins on linear and cyclic dinucleotides yields 5′-nucleotides as products, which CpdB-like proteins cannot degrade further to nucleosides [32]. Thus, the metabolic action of CpdB-like enzymes can be continued by UshA-like enzymes. Also pointing to a correlation between both enzymes is the occurrence in some bacteria of natural UshA-CpdB fusions resulting from two-gene fusion [40,41].
Tables 10-13 summarize the non-homogeneous distribution of both gene kinds and the (lack of) correlation between them. A qualitative correlation was observed between both gene kinds for some taxa but not for others. In 416 out of 590 taxa (70.5%), they were both either present (31.4%; Table 10) or absent (39.1%; Table 11). However, 174 taxa (29.5%) failed to show such a correlation, as one of the gene types was present but not the other: 21.7% of the taxa bear ushA-like but not cpdB-like genes (Table 12), whereas for 7.8% the converse was true (Table 13).
Table 12. Complete taxa that contain ushA-like but not cpdB-like genes.
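The percentages quoted for the 590 complete taxa can be re-derived from the two counts given in the text. The short Python check below is offered only as a worked example; the variable names are illustrative, and the per-table percentages are simply those reported above.

```python
def percent(part: int, total: int) -> float:
    """Percentage of `part` in `total`, rounded to one decimal place."""
    return round(100 * part / total, 1)


TOTAL_COMPLETE_TAXA = 590
CONCORDANT = 416   # both gene kinds present, or both absent
DISCORDANT = 174   # only one gene kind present

# The two groups partition the set of complete taxa.
assert CONCORDANT + DISCORDANT == TOTAL_COMPLETE_TAXA

concordant_pct = percent(CONCORDANT, TOTAL_COMPLETE_TAXA)   # 70.5
discordant_pct = percent(DISCORDANT, TOTAL_COMPLETE_TAXA)   # 29.5

# The reported sub-percentages add up to the group percentages.
assert round(31.4 + 39.1, 1) == concordant_pct
assert round(21.7 + 7.8, 1) == discordant_pct
```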

Table 13. Complete taxa that contain cpdB-like but not ushA-like genes. There was no bacterial class showing these characteristics.
1 The probes are shown in the same background color as in Figure 1 and Table 1 to facilitate cross-referencing. * Numbers in parentheses were obtained by removing the limit on type material in the TBlastN search, which increased the number of genomes queried to 32,175. These four exceptional hits correspond to accession numbers CP080375.1, CP059263.1, CP051512.1, and CP074573.1.
In Table 15, a similar comparison is made between the class Bacilli and the rest of the phylum Bacillota. In every case, the distribution of hits among the sequenced genomes was partial, i.e., there were genomes with ushA-like and cpdB-like genes and genomes without them. The degree of coincidence cannot be easily ascertained at this level; this must be attempted at lower taxonomical levels.
The probes are shown in the same background color as in Figure 1 and Table 1 to facilitate cross-referencing.
The five species to be discussed in this section have in common that numerous genomes are available for them in the NCBI Complete Genomes Database and that the TBlastN analyses did not give complete results for either of the two kinds of genes according to the criteria described in Section 3.2.1.
When complete results are obtained for at least one gene kind, e.g., ushA-like genes, the full picture can be inferred: a number of genomes will contain both kinds, and the remaining ones, up to the total number of genomes, will display only the ushA kind, or vice versa. This is the case for several of the species analyzed in Table 8 (with more details in Table S6). For instance, for Staphylococcus saprophyticus, there are 17 genomes available in the database; all of them gave an ushA-like hit, whereas only four gave a cpdB-like one (Table S6, line 92). It can be concluded that four genomes contain both gene kinds, and the remaining 13 contain only an ushA-like one.
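The inference described in this paragraph can be made explicit. The function below is a minimal sketch in Python that covers only the case discussed here, in which one gene kind is found in 100% of the genomes; the function name and the dictionary keys are illustrative assumptions.

```python
from typing import Dict, Optional


def infer_when_one_kind_is_universal(
    n_genomes: int, usha_positive: int, cpdb_positive: int
) -> Optional[Dict[str, int]]:
    """Infer per-genome combinations when one gene kind is present in every genome.

    Returns None when neither kind is universal, because the counts alone
    then do not determine how the two kinds overlap.
    """
    if usha_positive == n_genomes:
        return {
            "both": cpdb_positive,
            "usha_only": n_genomes - cpdb_positive,
            "cpdb_only": 0,
        }
    if cpdb_positive == n_genomes:
        return {
            "both": usha_positive,
            "usha_only": 0,
            "cpdb_only": n_genomes - usha_positive,
        }
    return None


# Staphylococcus saprophyticus: 17 genomes, 17 ushA-like hits, 4 cpdB-like hits.
s_saprophyticus = infer_when_one_kind_is_universal(17, 17, 4)
```

For the S. saprophyticus counts, the result is four genomes with both gene kinds and 13 with only an ushA-like gene, as stated in the text.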
In the cases to be discussed below, there were many hits with UshA-like and CpdB-like probes; however, since TBlastN did not give complete results in any case, for some genomes it was unclear whether both kinds of genes were absent or one was present and the other absent.
The most important part of the structural and functional information on UshA and CpdB nucleotidases and their encoding genes has been obtained in E. coli [1,2,11,12,14][16][17][18][19]. For this species, a large number of genomes are available in the Complete Genomes Database (3565 when this manuscript was submitted). Most of them, but not all, contain both ushA-like and cpdB-like genes. According to data in line 35 of Table S6, there are 20 E. coli genomes that do not contain an ushA-like gene and 10 genomes that do not contain a cpdB-like gene. By downloading the TBlastN results obtained for E. coli with probes P07024 (UshA protein) and P08331 (CpdB protein), it was confirmed that the same 3559 E. coli genomes had been hit in both cases. Based on their alignment scores, it was possible to identify four genomes that contain an ushA-like gene but are devoid of a cpdB-like gene and 14 genomes for which the converse is true (Table 16). These exceptions were found in 10 different E. coli strains. In addition, there may be six non-identified genomes that contain neither ushA-like nor cpdB-like genes. These data confirm that, although E. coli is a major contributor to the occurrence of these genes in Bacteria, their distribution is near but not fully homogeneous, and there is a high but not full correlation between them.
Table 16. E. coli complete genomes lacking either an ushA-like or a cpdB-like gene. These genomes represent a minor fraction (0.5%) of the total number of E. coli genomes in the NCBI Complete Genomes Database.
* This score is the minimum required to compute the hit as an ushA-like gene. It is mentioned here because it is much lower than the immediately higher ushA-like score for this species (625; data for line 66 of Table S6).
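The E. coli counts quoted above are mutually consistent, which can be verified with a short calculation. The Python sketch below is a worked example only; it assumes that the 3559 genomes hit by both probes are the genomes remaining after the six presumed double negatives are excluded, and the function name is hypothetical.

```python
from typing import Dict


def partition_from_counts(
    total: int, lacking_usha: int, lacking_cpdb: int, usha_only: int, cpdb_only: int
) -> Dict[str, int]:
    """Split genomes into both / ushA only / cpdB only / neither.

    Genomes lacking ushA are either cpdB-only or double negative, and
    genomes lacking cpdB are either ushA-only or double negative, so the
    double-negative count can be derived in two ways that must agree.
    """
    neither = lacking_usha - cpdb_only
    if neither < 0 or neither != lacking_cpdb - usha_only:
        raise ValueError("the counts are not mutually consistent")
    both = total - usha_only - cpdb_only - neither
    return {"both": both, "usha_only": usha_only, "cpdb_only": cpdb_only, "neither": neither}


# Counts reported for E. coli in the text.
ecoli = partition_from_counts(total=3565, lacking_usha=20, lacking_cpdb=10,
                              usha_only=4, cpdb_only=14)

genomes_hit = 3565 - ecoli["neither"]
single_gene_pct = round(100 * (ecoli["usha_only"] + ecoli["cpdb_only"]) / 3565, 1)
```

With these inputs, the double-negative count is six, the number of genomes hit is 3559, and the genomes carrying only one gene kind amount to 0.5% of the total, in agreement with the figures in the text and in the Table 16 caption.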
K. pneumoniae is another species for which a large number of genomes are available in the Complete Genomes Database (1967 when this manuscript was submitted). Most of them, but not all, contain both ushA-like and cpdB-like genes. According to data in line 44 of Table S6, five K. pneumoniae genomes do not contain an ushA-like gene, and five do not contain a cpdB-like gene. To find out whether they are the same or not, TBlastN results obtained with probes P07024 (UshA protein) and P08331 (CpdB protein) were downloaded and compared in Excel. This comparison indicated that the 1962 hits found with each probe were the same; therefore, it seems that there are five double-negative K. pneumoniae genomes, i.e., lacking both ushA-like and cpdB-like genes. So, the distribution of these genes in K. pneumoniae was near but not fully homogeneous, with a full correlation between them.
In the case of V. cholerae, 221 genomes were available in the database, of which 112 gave ushA-like and cpdB-like hits (line 106 of Table S6). By downloading the TBlastN results obtained for this species with probes P07024 (UshA protein) and P08331 (CpdB protein), it was confirmed that the 112 genomes found by the two probes were the same. Therefore, there are no genomes containing only one of the gene types. All the V. cholerae genomes are either double positive or double negative for these genes. As in the case of Salmonella Typhimurium (see above), the distribution of the genes is clearly not homogeneous, but with full correlation between both kinds.
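The probe-by-probe comparisons described for K. pneumoniae and V. cholerae, which were done in Excel, amount to set operations on genome identifiers. The following Python sketch shows one way to express them; it is not the workflow used in the study, and the function names and the toy genome labels are hypothetical.

```python
from typing import Dict, Iterable, Set


def compare_hit_sets(
    all_genomes: Iterable[str], usha_hits: Iterable[str], cpdb_hits: Iterable[str]
) -> Dict[str, Set[str]]:
    """Classify genomes by which probe kind gave an accepted hit."""
    universe = set(all_genomes)
    usha = set(usha_hits) & universe
    cpdb = set(cpdb_hits) & universe
    return {
        "both": usha & cpdb,
        "usha_only": usha - cpdb,
        "cpdb_only": cpdb - usha,
        "neither": universe - (usha | cpdb),
    }


def fully_correlated(partition: Dict[str, Set[str]]) -> bool:
    """True when every genome is either double positive or double negative."""
    return not partition["usha_only"] and not partition["cpdb_only"]


# Toy example with made-up genome labels.
toy = compare_hit_sets(
    all_genomes=["g1", "g2", "g3", "g4", "g5"],
    usha_hits=["g1", "g2", "g3"],
    cpdb_hits=["g1", "g2", "g3"],
)
```

In the toy example, the two probes hit the same three genomes, so two genomes are double negative and the correlation is full, which mirrors the pattern reported for V. cholerae on a much smaller scale.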

About the Variety of Distributions of ushA-like and cpdB-like Genes in Species of Streptococcus
The genus Streptococcus is interesting because different species showed different typologies concerning the distribution of ushA-like and cpdB-like genes (see Table 8 and details in lines 95-104 of Table S6). This includes: complete double positive (S. sanguinis and S. thermophilus, although probes giving complete ushA positives were different); complete double negative (S. mitis and S. pneumoniae); complete positive for ushA and negative for cpdB (S. mutans and S. pyogenes, although the positives were obtained with different probes); complete positive for ushA and partial for cpdB (S. parasuis); partial positive for ushA and complete positive for cpdB (S. agalactiae); near complete but not fully positive for both genes (S. suis); near complete but not fully positive for ushA and partial for cpdB (S. dysgalactiae). Within Streptococcus species, there is both an irregular distribution and an irregular correlation between ushA-like and cpdB-like genes.

Repercussion of the Results
In the earlier study of the genomic distribution of cpdB-like genes [32], the possible repercussions of the different kinds of distribution found (widespread, partial, negative) were analyzed, taking into account the role of CpdB-like proteins in the virulence of pathogens, a feature that is shown both by CpdB-like and UshA-like proteins [21][22][23][24][25][26][27]. Therefore, the same analysis can be applied to the results of the current manuscript. This is summarized in three conclusions (adapted from [32]).
Species that do not contain ushA-like and/or cpdB-like genes cannot exploit the UshA-like or CpdB-like protein-dependent strategies that facilitate innate immunity escape.
Species in which ushA-like and/or cpdB-like genes are widespread constitute a field to explore the possible role of these genes in virulence by creating gene mutants and studying the enzyme activity and specificity of the proteins.
In species with a partial distribution of ushA-like and/or cpdB-like genes, their presence or absence could modulate the virulence of pathogen strains or isolates.

Strength and Limitations of this Study
The major strength of this study is that it constitutes an extensive catalog of the genomic distribution in Bacteria of two genes with related enzymatic function (but different specificities), structure, and role in virulence. It is also of interest that the TBlastN results are analyzed in terms of alignment scores, which are constant and independent of database size.
On the other hand, the following limitations should be considered: First, the classification of bacterial taxa is eventually subject to alterations, and, in fact, it has been so since the publication of our earlier study [32].
Second, the results obtained for each taxon are not necessarily stable over time. New bacterial genomes are being sequenced and added to the NCBI Complete Genomes Database or, eventually, retired. In some cases, this was observed to occur to a minor extent in the course of data collection. This can affect the results in a significant way for those taxa with few genomes deposited in the database.
Third, to interpret the results of TBlastN searches in terms of the presence of ushA-like and cpdB-like genes, a minimum alignment score of 151 and a minimum query coverage of 71% were established. This reduces the number of false positives but, in turn, can cause true but distant homologs to be missed. This may have occurred, for instance, with probe WP_011837008, which with some frequency gave significant scores but with coverages somewhat below 71%.
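The effect of these two cutoffs can be sketched as a simple filter over TBlastN hits. This is a minimal illustration, not the procedure actually used to build the tables: the `Hit` record, the function names, and the example values are hypothetical, and only the thresholds (score of 151, coverage of 71%) come from the text.

```python
from dataclasses import dataclass

MIN_SCORE = 151       # minimum alignment score stated in the text
MIN_COVERAGE = 71.0   # minimum query coverage (%) stated in the text


@dataclass
class Hit:
    """Hypothetical record for the best hit of one probe in one genome."""
    genome: str
    score: float
    query_coverage: float  # percent of the probe covered by the alignment


def is_positive(hit: Hit) -> bool:
    """A hit counts as positive only if it meets both minimum values."""
    return hit.score >= MIN_SCORE and hit.query_coverage >= MIN_COVERAGE


def positive_genomes(hits: list[Hit]) -> set[str]:
    """Return the genomes with at least one hit passing both cutoffs."""
    return {h.genome for h in hits if is_positive(h)}


# Made-up hits for illustration.
hits = [
    Hit("genome_A", 480.0, 98.0),  # passes both cutoffs
    Hit("genome_B", 210.0, 65.0),  # significant score, coverage below 71%
    Hit("genome_C", 120.0, 90.0),  # good coverage, score below 151
    Hit("genome_D", 151.0, 71.0),  # exactly at both minimum values
]
print(sorted(positive_genomes(hits)))
```

In this sketch, `genome_B` corresponds to the situation described for probe WP_011837008: a hit with a significant score is discarded because its coverage falls below the cutoff, so a possible distant homolog is not counted.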
Fourth, TBlastN hits, even with high scores, reveal the presence of the corresponding genes but do not guarantee that they are expressed or that the encoded proteins are enzymatically active. For instance, silent alleles of ushA have been reported in S. enterica and E. coli [42][43][44][45].
Finally, in such a large collection of data, mistakes are to be expected. Therefore, readers with a special interest in any particular result would be wise to verify it by running the relevant TBlastN searches themselves.

[...] an uncharacterized B. subtilis metallophosphoesterase named YUND_BACSU. This protein is widespread in B. subtilis genomes (331/346) and has good homologs in 100% of the genomes of B. anthracis and B. cereus. These uncharacterized proteins are therefore interesting candidates for cloning, expression, and enzyme characterization. Eventually, they could also be tested for effects on virulence.
5. The five complete genomes available for avian pathogenic E. coli (APEC; Table 17) contain high-score hits for ushA (probe accession number P07024) and cpdB genes (probe accession number P08331). The cpdB gene of APEC has been shown previously to be pro-virulent [30], while the effect of the ushA gene has not been investigated. Our data indicate that this is a possibility worth investigating by creating the single ushA mutant and the double ushA and cpdB mutant of APEC.
6. The five complete genomes available for Salmonella Pullorum contain high-score hits for ushA (probe accession number P07024) and cpdB genes (probe accession number P08331) (Table S6, line 78). For this Salmonella serovar, the cpdB gene has been shown previously to be pro-virulent [31], while the effect of the ushA gene has not been investigated. Our data indicate that this is an interesting possibility to explore by creating the single ushA mutant and the double cpdB and ushA mutant of S. Pullorum.
7. The different species of the genus Streptococcus offer a variety of situations concerning the genomic distribution of ushA-like and cpdB-like genes (see Section 4.3.3). In addition, ushA-like genes of S. sanguinis, S. agalactiae, S. pyogenes, and S. suis, and cpdB-like genes of S. agalactiae and S. suis, individually considered, have been shown to be pro-virulent for the producing pathogens. However, the following cases remain to be studied: (i) the combined effect of ushA-like and cpdB-like genes on the virulence of S. agalactiae and S. suis; (ii) the possible effect of cpdB-like genes on the virulence of all the Streptococcus species that contain such genes but have not yet been studied in this regard; (iii) the possible effect of ushA-like genes on the virulence of all the Streptococcus species that contain such genes but have not yet been studied in this regard.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes14081657/s1; Figure S1: Example of a launch page for a TBlastN search; Table S1: TBlastN analysis of ushA-like and cpdB-like genes in Bacteria phyla; Table S2: TBlastN analysis of ushA-like and cpdB-like genes in Bacteria classes from selected phyla; Table S3: TBlastN analysis of ushA-like and cpdB-like genes in Bacteria orders from selected classes; Table S4: TBlastN analysis of ushA-like and cpdB-like genes in Bacteria families from selected orders; Table S5: TBlastN analysis of ushA-like and cpdB-like genes in Bacteria genera from selected families; Table S6: TBlastN analysis of ushA-like and cpdB-like genes in selected Bacteria species; Table S7: Alphabetical lists of Bacteria taxa analyzed; Table S8: Alphabetical lists of taxa probed that contain no sequenced genomes in the NCBI Complete Genomes Database.

Funding: This research received no external funding. However, we would like to acknowledge recurrent funding to the Grupo de Enzimología from the Consejería de Economía, Ciencia y Agenda Digital, Junta de Extremadura, Spain (grant number GR21100) co-funded by FEDER (European Regional Development Fund). The APC was funded by a waiver benefit granted by MDPI.
Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Data are contained within the article or supplementary material.

Conflicts of Interest: The authors declare no conflict of interest.