Genome Sequences of Serratia Strains Revealed Common Genes in Both Serratomolides Gene Clusters

Simple Summary
Biosurfactants are amphiphilic molecules produced by microorganisms, with a hydrophilic and a hydrophobic group, that are able to reduce surface tension. These molecules are widely used in the environmental, food, pharmaceutical, medical, and cleaning industries, among others. Serratia strains are ubiquitous microorganisms with the ability to produce biosurfactants, such as serrawettins. These extracellular lipopeptides are described as biocides against many bacteria and fungi. This work used comparative genomics to determine the distribution and organization of the serrawettin W1 and W2 biosynthetic gene clusters in all the 84 publicly available genomes of the Serratia genus. Here, the organization of the serrawettin W1 gene clusters is reported for the first time. The serrawettin W1 biosynthetic gene swrW and the serrawettin W2 biosynthetic gene swrA were present in 17 and 11 Serratia genomes, respectively. The same genes frame the swrW and swrA biosynthetic genes in their respective clusters. This work identified four genes common to all serrawettin gene clusters, highlighting their potential key role in the serrawettin biosynthetic process.

Abstract
Serratia strains are ubiquitous microorganisms with the ability to produce serratomolides, such as serrawettins. These extracellular lipopeptides are described as biocides against many bacteria and fungi and may have nematicidal activity against phytopathogenic nematodes. Serrawettins W1 and W2 from different strains have different structures that might be correlated with distinct genomic organizations. This work used comparative genomics to determine the distribution and the organization of the serrawettin biosynthetic gene clusters in all the 84 publicly available genomes of the Serratia genus. The organization of the serrawettin W1 and W2 gene clusters was established using antiSMASH software and compared with the single, brief description previously available for Serratia sp. YD25T. Here, the organization of the serrawettin W1 gene clusters is reported for the first time. The serrawettin W1 biosynthetic gene swrW was present in 17 Serratia genomes. Eighty different coding sequences (CDSs) were assigned to the W1 gene cluster, 13 being common to all clusters. The serrawettin W2 swrA gene was present in 11 Serratia genomes. The W2 gene clusters included 68 CDSs, with 24 present in all the clusters. The genomic analysis showed that the swrA gene comprises five modules, four with three domains and one with four domains, while the swrW gene comprises one module with four domains. This work identified four genes common to all serrawettin gene clusters, highlighting their potentially essential role in the serrawettin biosynthetic process.

Serratia strains with the swrW and swrA genes identified through antiSMASH software. AntiSMASH software was used on 84 Serratia genomes (see Table S1). X represents the presence of the genes swrW and swrA in seventeen and eleven genomes, respectively.

Serrawettins Biosynthetic Gene Clusters Analysis
The NRPS gene clusters to which the serrawettin W1 and W2 biosynthetic genes belong were analyzed using antiSMASH software [50]. The domains, modules, and structures of the serrawettin biosynthetic proteins were reconstructed using Phyre2 three-dimensional prediction [56], with the serrawettin W1 biosynthetic protein from Serratia sp. AS13 and the W2 biosynthetic protein from Serratia sp. PWN146 as models. PubChem 2D was used to explore the chemical information of the serrawettins using the bacterial models mentioned above.
The concatenated amino acid sequences of each serrawettin biosynthetic gene cluster were organized by protein identification in the same order to allow a cluster alignment with ClustalW in MEGAX software [57]. The serrawettin W1 biosynthetic proteins from 17 Serratia genomes and the serrawettin W2 biosynthetic proteins from 11 Serratia genomes were aligned separately. The evolutionary relationships between the clusters and between the serrawettin biosynthetic protein sequences were established using the Neighbor-Joining method with the Poisson model [58] in MEGAX software [57]. The aligned and organized cluster genes were represented alongside an evolutionary tree according to their size, direction, and accession numbers identified with the NCBI BLAST database [51]. Functions of the core proteins in all clusters were searched on UniProt [59].
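The Poisson model used for the Neighbor-Joining trees converts the observed proportion of differing amino acid sites, p, into an estimated number of substitutions per site, d = -ln(1 - p). The following is a minimal sketch of that correction for one pair of aligned protein sequences. The function names and the toy sequences are illustrative assumptions, and gap positions are skipped pair by pair, which may differ from the gap treatment selected in MEGAX for this study.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap ('-') are skipped
    (an assumption of this sketch, not a setting taken from the study).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def poisson_distance(seq_a: str, seq_b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("distance undefined when every site differs")
    return -math.log(1.0 - p)


# Toy aligned fragments (hypothetical, not taken from any Serratia genome).
d = poisson_distance("MKTAYIAK", "MKTVYIAR")
```

A matrix of such pairwise distances is the input that a Neighbor-Joining implementation would cluster into a tree.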

Bacterial Phylogeny and Comparative Genomics of Serratia spp.
All the 84 Serratia strains selected for this study had their genome publicly available at NCBI. According to the Neighbor-Joining and Maximum-Likelihood phylogenetic trees based on 16S rRNA gene sequences, 14 strains belong to S. marcescens, two strains belong to S. plymuthica, two strains belong to S. grimesii, one strain belongs to S. liquefaciens, one strain belongs to S. nematodiphila, one strain belongs to S. ureilytica, and 63 strains could not be assigned to species level due to a similarity lower than 97% (Figure 1). The genomes of the Serratia strains range in size from 5.0 Mbp to 7.7 Mbp, and the G + C content varies from 45.9 to 60.1 mol%.
Figure 1. A phylogenetic dendrogram based on a comparison of the 16S rRNA gene sequences of the Serratia strains used in this study and the type strains. The tree was created using the Neighbor-Joining method in ARB software. The numbers on the tree indicate the percentages of bootstrap sampling, derived from 1000 replications; values below 50% are not shown. The Serratia strains carrying the serrawettin W1 biosynthetic gene cluster are shown in blue, and the Serratia strains carrying the serrawettin W2 biosynthetic gene cluster are shown in green. The type strain Escherichia coli DSM 30083 T was used as the outgroup. Scale bar, 1 inferred nucleotide substitution per 100 nucleotides.
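The two quantities reported above, the G + C content of a genome and the 97% 16S rRNA similarity cutoff for species assignment, can be expressed as a short sketch. The function names, the similarity values, and the idea of passing precomputed similarities to type strains are assumptions made for illustration; they are not the ARB workflow used in the study.

```python
from typing import Dict, Optional


def gc_content(sequence: str) -> float:
    """G + C content in mol%, counting only unambiguous A, C, G, T bases."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def assign_species(similarities: Dict[str, float],
                   threshold: float = 97.0) -> Optional[str]:
    """Return the best-matching species if its 16S similarity (%) reaches
    the threshold; otherwise return None (not assigned to species level)."""
    if not similarities:
        return None
    best = max(similarities, key=similarities.get)
    return best if similarities[best] >= threshold else None


# Hypothetical similarity values, for illustration only.
assigned = assign_species({"S. marcescens": 99.1, "S. plymuthica": 96.0})
unassigned = assign_species({"S. marcescens": 96.5, "S. grimesii": 95.8})
```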

Serrawettin W1 Biosynthetic Gene Clusters
All serrawettin W1 biosynthetic gene clusters were identified as NRPS clusters. Moreover, the bioinformatic analysis predicted an architecture comprising condensation (C), adenylation (A), thiolation (T), and thioesterase (TE) domains in all the serrawettin W1 biosynthetic genes (Figure 2). To confirm the identification of the swrW gene revealed by antiSMASH, each swrW was queried against NCBI BLASTP to find the closest relative and determine the identity percentage (Tables 2 and S2). The protein encoded by the serrawettin W1 biosynthetic gene (swrW) showed an identity to its closest BLASTP match ranging from 77.79% to 100% (Table 2). Eighty different genes were identified in the serrawettin W1 biosynthetic gene clusters. Fifteen genes are common to all 17 gene clusters (Figure 3 and Table S4), including genes encoding murein hydrolase effector protein LrgB and murein hydrolase regulator LrgA, both with hydrolase activity; LysR regulatory protein, with DNA-binding transcription factor activity; a sodium-hydrogen antiporter and xanthine-uracil-vitamin C permease, both with transmembrane transport activity; glyoxalase-bleomycin resistance protein and glutathione S-transferase domain protein, both with dioxygenase activity; 3-oxoacyl-(acyl-carrier-protein) reductase; single-stranded DNA-binding protein; exonuclease ABC subunit A; maltose O-acetyltransferase; and aromatic amino acid aminotransferase.
Excluding strain NBRC 102599 T, an additional seven genes were found to be common to all strains in the serrawettin W1 biosynthetic gene cluster (Figure 3). Moreover, 15 genes are exclusive to the S. plymuthica NBRC 102599 T biosynthetic gene cluster (Figure 3). The relationships between strains based on the concatenated genes of the W1 biosynthetic gene cluster defined the same clusters as the relationships based on the swrW gene analysis, except for S. plymuthica NBRC 102599 T, which is discordant. The position of S. plymuthica NBRC 102599 T in the W1 phylogenetic tree highlights the different gene composition of the W1 biosynthetic cluster of this strain. In the swrW phylogenetic tree, on the other hand, S. plymuthica NBRC 102599 T forms a sister group with Serratia strains AS13, AS9, and AS12 (Figure 3).
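The notions of genes "common to all clusters", genes common once one strain is excluded, and genes "exclusive" to one strain correspond to simple set operations on per-strain gene inventories. The sketch below illustrates this under stated assumptions: the strain labels and gene assignments are a toy data set invented for illustration, not the content of Table S4, and the function names are hypothetical.

```python
from typing import Dict, Set


def core_genes(clusters: Dict[str, Set[str]]) -> Set[str]:
    """Genes present in every cluster (set intersection)."""
    if not clusters:
        return set()
    return set.intersection(*clusters.values())


def exclusive_genes(clusters: Dict[str, Set[str]], strain: str) -> Set[str]:
    """Genes found in one strain's cluster and in no other cluster."""
    others: Set[str] = set()
    for name, genes in clusters.items():
        if name != strain:
            others |= genes
    return clusters[strain] - others


# Toy inventories (hypothetical assignments, for illustration only).
toy = {
    "strain_1": {"swrW", "lrgA", "lrgB", "CTP synthase", "MFS transporter"},
    "strain_2": {"swrW", "lrgA", "lrgB", "CTP synthase", "quinone oxidoreductase"},
    "strain_3": {"swrW", "lrgA", "lrgB", "CTP synthase", "MFS transporter", "tautomerase"},
}

core_all = core_genes(toy)
# Core set recomputed after leaving out one divergent strain.
core_without_2 = core_genes({k: v for k, v in toy.items() if k != "strain_2"})
only_in_3 = exclusive_genes(toy, "strain_3")
```

Leaving out a divergent strain can only keep or enlarge the core set, which is the pattern reported when one strain is set aside.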

Serrawettin W2 Biosynthetic Gene Clusters
Every serrawettin W2 biosynthetic gene (swrA) showed an architecture composed of five modules, each with a condensation (C1, C2, C3, C4, and C5), adenylation (A1, A2, A3, A4, and A5), and thiolation (T1, T2, T3, T4, and T5) domain. Module 5 has an additional thioesterase (TE) domain. This organization is shared by all the serrawettin W2 biosynthetic genes (Figure 4). To confirm the identification of the swrA gene revealed by antiSMASH, each swrA was, as mentioned above, queried against NCBI BLASTX to find the closest relative and determine the identity percentage (Tables 4 and S3). The protein encoded by the serrawettin W2 biosynthetic gene (swrA) showed an identity to its closest BLASTX match ranging from 76.38% to 99.4% (Table 4). Sixty-eight genes were identified in the serrawettin W2 biosynthetic gene clusters (Figure 5 and Table S5).
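The domain architectures described for the two biosynthetic genes can be written down as a small data structure, which makes the module and domain counts easy to check. This is only a sketch: the constant and function names are illustrative assumptions, and the lists encode the domain order stated in the text, not antiSMASH output.

```python
from collections import Counter
from typing import List

# One module with four domains (serrawettin W1, swrW).
SWRW_MODULES: List[List[str]] = [["C", "A", "T", "TE"]]

# Five modules (serrawettin W2, swrA): four C-A-T modules plus a final
# module carrying the additional thioesterase (TE) domain.
SWRA_MODULES: List[List[str]] = (
    [["C", "A", "T"] for _ in range(4)] + [["C", "A", "T", "TE"]]
)


def domain_counts(modules: List[List[str]]) -> Counter:
    """Count how many times each domain type occurs across all modules."""
    return Counter(domain for module in modules for domain in module)


def architecture_string(modules: List[List[str]]) -> str:
    """Render modules as e.g. 'C1-A1-T1 | C2-A2-T2'; TE is left unnumbered."""
    parts = []
    for index, module in enumerate(modules, start=1):
        labels = [d if d == "TE" else f"{d}{index}" for d in module]
        parts.append("-".join(labels))
    return " | ".join(parts)


print(architecture_string(SWRA_MODULES))
print(domain_counts(SWRA_MODULES))
```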
Twenty-four genes were present in the gene clusters of all the strains (Figure 5), namely, the genes encoding RNase E specificity factor CsrD, with aminotransferase activity; acrylyl-CoA reductase; sulfoxide reductase subunits YedY and YedZ; 3-dehydroquinate dehydratase, involved in aromatic amino acid biosynthesis; biotin carboxyl carrier protein and carboxylase subunit, both components of the acetyl-CoA carboxylase complex; sodium/pantothenate symporter; DNA-binding proteins; exported proteins involved in cell adhesion; lipid A biosynthesis lauroyl acyltransferase, with catalytic activity; glutamine amidotransferase; murein effectors LrgA and LrgB, with hydrolase activity; and LysR transcriptional regulator, with carbonate dehydratase activity.
The phylogenetic relationships between the strains were established based on the analysis of the concatenated genes of the W2 biosynthetic gene cluster and compared with the relationships defined based on the swrA gene analysis. The same clusters were defined with the swrA phylogenetic tree, which showed more homogeneous clusters (Figure 5).
The genes coding for alanine racemase, carboxymuconolactone decarboxylase, MFS transporter, hypothetical protein, tautomerase, hypothetical protein, hypothetical protein, hypothetical protein, LysR family transcriptional regulator, and ssDNA-binding protein ( Figure 5, genes numbered from 60 to 68; Table S5) are exclusive to the Serratia sp. YD25 T cluster. AntiSMASH cluster prediction identified four additional genes involved in PKS-NRPS, encoding for enoylreductase quinone oxidoreductase and ketoreductase 3-oxoacyl-(acyl-carrier-protein) reductase, both present in all the strains; enoylreductase dehydrogenase, present in seven strains; and aromatic amino acid aminotransferase, present in all strains except in Serratia sp. YD25 T (Table 5 and Figure 5).

Discussion
Strains of the genus Serratia are known to colonize a diversity of environments. This ability is usually related to the production of lipopeptides, which include the serrawettins. The physiological roles of such surface-active exolipids are mostly unknown but seem to contribute specifically to enhancing the flagellum-dependent and flagellum-independent spreading growth of the bacteria on a surface environment. Serrawettins were first reported in pigmented S. marcescens [4], and 53.6% of the Serratia strains that showed the swrW or swrA gene clusters in our study belonged to this species. The analysis of the genomes of 84 Serratia strains confirmed that S. marcescens is not the only species with strains able to produce serrawettin W1 or W2. From the total genomes analyzed, the swrW biosynthetic gene clusters were detected in 17 Serratia genomes. These strains belonged to different species, namely, eight S. marcescens, four S. plymuthica, two S. grimesii, one S. nematodiphila, and two unidentified Serratia species. Eleven Serratia genomes showed swrA biosynthetic gene clusters. The swrA biosynthetic gene clusters were detected in strains belonging to S. marcescens (seven), S. ureilytica (one), and unidentified Serratia species (three). These strains come from different sources, such as waste water [43], paper machines [34], infected patients [19,44,49,60], pond water, human tissue [35], rapeseed roots [29][30][31], nematodes, buffer solutions [25], different plants [39,61], and soil [27,28].
AntiSMASH software was able to identify the serrawettin gene clusters as NRPS clusters, as previously described [12,13]. When compared with the known serrawettin W1 proteins, the S. marcescens strains had a protein with an identity percentage higher than 95%, while the proteins of the S. plymuthica and S. grimesii strains showed an identity higher than 81% and 78%, respectively. In the serrawettin W1 biosynthesis proteins, 70.2% of the residues are conserved across all core biosynthetic proteins, suggesting that the serrawettin W1 protein is well conserved among different strains. When compared with the known serrawettin W2 proteins, S. marcescens and S. ureilytica showed a protein identity higher than 76%, Serratia strains YD25 and SCBI higher than 93%, and subspecies of S. marcescens higher than 99%. In spite of the divergence among the serrawettin W2 proteins, 62.3% of the residues are conserved across all core serrawettin W2 biosynthetic proteins, suggesting a well-conserved serrawettin W2 protein among the different strains.
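A percentage of conserved residues such as those quoted above can be obtained by counting the alignment columns in which every sequence carries the same residue. The sketch below shows one way to compute it. The choice of the full alignment length as denominator, the exclusion of gap columns from the conserved count, and the toy sequences are assumptions of this example, since the exact counting rule is not stated in the text.

```python
from typing import List


def conserved_percentage(aligned: List[str]) -> float:
    """Percentage of alignment columns that are identical in all sequences.

    Columns containing a gap ('-') are never counted as conserved, and the
    denominator is the full alignment length (assumptions of this sketch).
    """
    if not aligned:
        raise ValueError("at least one sequence is required")
    length = len(aligned[0])
    if length == 0 or any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be non-empty and equally long")
    conserved = sum(
        1
        for column in zip(*aligned)
        if column[0] != "-" and len(set(column)) == 1
    )
    return 100.0 * conserved / length


# Toy alignment (hypothetical fragments, not real SwrW or SwrA sequences).
toy_alignment = ["MKTA", "MKSA", "MKTA"]
toy_result = conserved_percentage(toy_alignment)
```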
As previously described by Li et al. [12], all serrawettin W1 biosynthetic genes have condensation, adenylation, thiolation, and thioesterase domains, which were also predicted in our study by antiSMASH software. From the serrawettin W1 biosynthetic gene clusters, a total of 80 different genes were identified, 15 of them common to all clusters. These common genes encode proteins that may play an important role in serrawettin W1 biosynthesis. AntiSMASH software predicted four PKS-NRPS genes in the serrawettin W1 cluster, which are multi-enzymatic and multi-domain genes involved in the biosynthesis of nonribosomal peptides: an oxidoreductase present in 7 Serratia clusters (enoylreductase), a 3-oxoacyl-(acyl-carrier-protein) reductase present in all Serratia clusters (ketoreductase), a dehydrogenase present in 14 Serratia clusters (enoylreductase), and an aromatic amino acid aminotransferase present in 15 Serratia clusters (aminotransferase). Usually, PKS genes consist of acyltransferase (AT), ketosynthase (KS), and ketoreductase (KR) domains [13]. The tree based on the proteins of the serrawettin W1 biosynthetic gene cluster and the tree based on the SwrW biosynthesis protein revealed common clades when compared, suggesting that the swrW gene and the cluster organization as a whole have a similar evolutionary history.
Analysis of the swrA biosynthetic gene clusters from 11 Serratia genomes demonstrated complex serrawettin W2 biosynthesis protein domains, with five condensation domains, five adenylation domains, five thiolation domains, and one thioesterase domain, as previously described by Su et al. [13]. From the swrA biosynthetic gene clusters, a total of 68 different proteins were identified, 24 common to all clusters and 8 exclusive to the Serratia sp. YD25 biosynthetic gene cluster. These common proteins may play an important role in serrawettin W2 biosynthesis and may be involved in the different steps needed to produce a nonribosomal peptide, such as DNA binding, adenylation, condensation, thiolation, and thioesterase activity. According to Su et al. [13], the three proteins identified as involved in the PKS-NRPS hybrid polyketide synthase (acyltransferase, ketosynthase, and ketoreductase) are also involved in the serrawettin W2 biosynthesis process. Our analysis of the serrawettin W2 biosynthetic gene clusters revealed four genes identified as part of the PKS-NRPS system: an oxidoreductase and a 3-oxoacyl-(acyl-carrier-protein) reductase, both present in all serrawettin W2 biosynthetic gene clusters; a dehydrogenase present in seven clusters; and an aromatic amino acid aminotransferase present in nine clusters. The tree based on the proteins of the serrawettin W2 gene clusters and the tree based on the SwrA biosynthesis protein revealed common groups when compared, suggesting that not only the serrawettin W2 biosynthesis gene but also the cluster organization as a whole has a similar evolutionary history.
None of the genomes analyzed included both serrawettin biosynthetic gene clusters. Although both are present in the genus Serratia, the two biosynthetic gene clusters are distributed across the two clades revealed by the 16S rRNA gene-based phylogenetic tree of the genus. W1 is present in both clades: in five S. marcescens strains, one S. nematodiphila strain, all but one of the S. plymuthica strains analyzed, two S. grimesii strains, and one S. rubidaea strain. On the other hand, W2 is common to all strains of the S. marcescens sub-cluster and three additional strains of the same species, but it was not present in the other Serratia clade.
This work identified four genes common to all serrawettin gene clusters, highlighting their potentially essential role in the serrawettin biosynthetic process. These genes, which encode CTP synthase, glyoxalase/bleomycin resistance protein/dioxygenase, LrgA family protein, and LrgB family protein, flank the biosynthesis genes swrW and swrA. In both organizations, the gene encoding CTP synthase is immediately upstream of the serrawettin biosynthetic gene, and the genes encoding the LrgA and LrgB family proteins are immediately downstream of it. The gene encoding glyoxalase/bleomycin resistance protein/dioxygenase is downstream of swrW and swrA, with a group of non-conserved genes between them. The lrgAB operon in Staphylococcus codes for a transmembrane protein. The LrgA protein shares many characteristics with bacteriophage antiholins [62]. The antiholin homologue in Bacillus subtilis transports pyruvate and is regulated in an unconventional way by its substrate molecule [63]. Holins and antiholins control the formation of channels through which murein hydrolase is exported across the bacterial membrane, timing the bacteriophage-induced cell lysis [64]. In Serratia, this operon seems to be associated with the transport of serrawettin, as a facilitator, independently of the coding gene (swrW or swrA) and, therefore, of the complexity of the molecule. The holin-antiholin class of proteins was originally discovered in bacteriophages, where they modulate host cell lysis during lytic infection [65]. A hypothetical model suggests that these proteins could have been acquired by an ancient bacterium through horizontal gene transfer, with these elements integrating into its genome [66]. This suggests that swrW and swrA were introduced into the lrgAB operon as two independent events, but more work is needed to understand the evolution and functional diversification of serrawettin.
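The conserved framing described in this paragraph (CTP synthase immediately upstream, LrgA and LrgB immediately downstream, and the glyoxalase gene further downstream) can be checked on an ordered gene list. The sketch below assumes that each cluster is supplied as a list of gene labels already oriented in the direction of the biosynthetic gene. The labels, the function name, and the toy clusters are illustrative assumptions, not data from the study.

```python
from typing import List


def has_conserved_flanks(gene_order: List[str], biosynthetic_gene: str) -> bool:
    """Check the flanking pattern around swrW or swrA in an oriented cluster."""
    if biosynthetic_gene not in gene_order:
        return False
    i = gene_order.index(biosynthetic_gene)
    # CTP synthase must sit directly before the biosynthetic gene.
    upstream_ok = i > 0 and gene_order[i - 1] == "CTP synthase"
    # The two genes directly after it must be lrgA and lrgB, in either order.
    downstream_ok = set(gene_order[i + 1:i + 3]) == {"lrgA", "lrgB"}
    # The glyoxalase gene lies further downstream, after the lrgAB pair.
    glyoxalase_ok = "glyoxalase" in gene_order[i + 3:]
    return upstream_ok and downstream_ok and glyoxalase_ok


# Toy clusters (hypothetical gene orders, for illustration only).
w1_like = ["CTP synthase", "swrW", "lrgA", "lrgB", "hypothetical", "glyoxalase"]
w2_like = ["CTP synthase", "swrA", "lrgB", "lrgA", "MFS transporter", "glyoxalase"]
broken = ["swrW", "CTP synthase", "lrgA", "lrgB", "glyoxalase"]

results = [
    has_conserved_flanks(w1_like, "swrW"),
    has_conserved_flanks(w2_like, "swrA"),
    has_conserved_flanks(broken, "swrW"),
]
```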
Within the W1 biosynthetic gene cluster, the gene coding for a quinone oxidoreductase YhdH/YhfP family is present only in two strains of S. marcescens and in three strains from different species of Serratia. These Serratia strains are missing the gene cluster characterized by the Major Facilitator Superfamily (MFS) gene present in all the other S. marcescens strains. MFS is one of the two largest families of membrane transporters found in bacteria [67]. Phylogenetic analyses revealed the occurrence of 17 distinct families within the MFS, each of which generally transports a single class of compounds. This suggests that within the W1 cluster, the MFS transport system is conserved for most S. marcescens although the other Serratia species presented the quinone oxidoreductase system YhdH.
The W2 biosynthetic gene cluster has a more conserved genetic organization, and 24 genes were common to all strains. The gene cluster composed of the cystine ABC transporter, substrate-binding protein, and alanine racemase seems to be involved in the selective transport of amino acids into the cell and in the L-to-D interconversion of alanine. These systems may facilitate amino acid acquisition by the cell for W2 synthesis. In the W2 biosynthetic gene cluster, both genes are present in several Serratia species but only in four strains of S. marcescens. Mechanistic studies, both kinetic and energetic, are needed to relate the presence of these genes to W2 synthesis in these strains. The other four strains producing serrawettin W2, which do not present the gene cluster with the ABC transporter and racemase, show a gene cluster including a gene belonging to the cyclase family. In Serratia, the cyclase family protein shows high homology with diguanylate cyclase, with a domain from the GGDEF protein family [68]. These proteins act as regulators of an intracellular signaling molecule, are involved in bacterial biofilm formation, and persist in several bacterial species.

Conclusions
In conclusion, the present work shows that most species of the genus Serratia with sequenced genomes carry clusters of serrawettin biosynthetic genes. AntiSMASH software was able to identify the serrawettin gene clusters as NRPS clusters. The W1 and W2 biosynthetic gene clusters are mutually exclusive in the genome. Moreover, the swrW and swrA biosynthetic genes are framed by the same genes in the biosynthetic clusters. The CTP synthase gene is upstream and the lrgAB operon is downstream, suggesting a horizontal acquisition of the biosynthetic system by an ancient strain. Within the W1 biosynthetic cluster, the gene coding for the quinone oxidoreductase YhdH/YhfP family and the gene coding for the Major Facilitator Superfamily are mutually exclusive in the genomes of the strains. The same pattern is found in the W2 biosynthetic gene cluster, where the genes encoding the cystine ABC transporter, substrate-binding protein, and alanine racemase, on the one hand, and the gene encoding the cyclase family protein, on the other, are mutually exclusive.