Characterization of Sinorhizobium sp. LM21 Prophages and Virus-Encoded DNA Methyltransferases in the Light of Comparative Genomic Analyses of the Sinorhizobial Virome

The genus Sinorhizobium/Ensifer mostly groups nitrogen-fixing bacteria that create root or stem nodules on leguminous plants and transform atmospheric nitrogen into ammonia, which improves the productivity of the plants. Although these biotechnologically-important bacteria are commonly found in various soil environments, little is known about their phages. In this study, the genome of Sinorhizobium sp. LM21 isolated from a heavy-metal-contaminated copper mine in Poland was investigated for the presence of prophages and DNA methyltransferase-encoding genes. In addition to the previously identified temperate phage, ΦLM21, and the phage-plasmid, pLM21S1, the analysis revealed the presence of three prophage regions. Moreover, four novel phage-encoded DNA methyltransferase (MTase) genes were identified and the enzymes were characterized. It was shown that two of the identified viral MTases methylated the same target sequence (GANTC) as cell cycle-regulated methyltransferase (CcrM) of the bacterial host strain, LM21. This discovery was recognized as an example of the evolutionary convergence between enzymes of sinorhizobial viruses and their host, which may play an important role in virus cycle. In the last part of the study, thorough comparative analyses of 31 sinorhizobial (pro)phages (including active sinorhizobial phages and novel putative prophages retrieved and manually re-annotated from Sinorhizobium spp. genomes) were performed. The networking analysis revealed the presence of highly conserved proteins (e.g., holins and endolysins) and a high diversity of viral integrases. The analysis also revealed a large number of viral DNA MTases, whose genes were frequently located within the predicted replication modules of analyzed prophages, which may suggest their important regulatory role. Summarizing, complex analysis of the phage protein similarity network enabled a new insight into overall sinorhizobial virome diversity.


DNA Sequencing
Genomic DNA of the LM21 strain was isolated using the CTAB/lysozyme method [20]. An Illumina TruSeq library was constructed following the manufacturer's instructions and sequenced on an Illumina MiSeq instrument using the v3 chemistry kit (Illumina, San Diego, CA, USA). Raw reads were filtered for quality and assembled using Newbler version 3.0 software (Roche, Basel, Switzerland).

Bioinformatics
The LM21 draft genome was automatically annotated using the RAST server [21,22]. The prophage sequences within the draft genome were identified using PhiSpy [23] and manual inspection. The predicted prophage sequences were then manually annotated using Clone Manager (Sci-Ed8) and Artemis software [24]. Similarity searches were performed using the BLAST programs [25] provided by the NCBI, and the UniProt [26] and Pfam [27] databases. Putative tRNA genes were identified using the tRNAscan-SE [28] and ARAGORN [29] programs. Conserved protein domains and motifs were searched using the MOTIF Search [30] and HHpred [31] tools. The MTase genes were tracked using BLAST searches with the REBASE [32] resources as a query, and the obtained results were manually verified. Phage taxonomy assignment was performed using VIRFAM [33] and BLAST searches of the large terminase subunit and major capsid protein sequences of sinorhizobial phages against the Caudovirales phages listed in the ICTV Master Species List 2016 v1.3 [34] (ictvonline.org). The results of the comparative genomic analyses were visualized with Circoletto [35] and Gephi [36]. The similarity network was constructed from all-against-all BLAST results using our custom Python script. In the network, each node represents a single protein and each edge reflects reciprocated sequence similarity between two proteins (above the given thresholds).
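The custom script itself is not reproduced in this text. The following is only a minimal sketch of how a reciprocal-similarity network of this kind could be built, assuming the BLAST hits have already been parsed into tuples of (query, subject, percent identity, percent coverage, e-value). The function names, the tuple layout, and the toy hits are illustrative assumptions; the default thresholds are those reported for the network analysis (e-value of 10^-5, 50% identity, 50% coverage).

```python
from collections import defaultdict


def build_network(hits, max_evalue=1e-5, min_identity=50.0, min_coverage=50.0):
    """Return undirected edges for hits that pass the thresholds in BOTH directions.

    hits: iterable of (query, subject, pct_identity, pct_coverage, evalue).
    The tuple layout is an assumption made for this sketch.
    """
    passed = set()
    for query, subject, identity, coverage, evalue in hits:
        if query == subject:
            continue  # ignore self-hits
        if evalue <= max_evalue and identity >= min_identity and coverage >= min_coverage:
            passed.add((query, subject))
    # keep an edge only when the reverse hit also passed (reciprocated similarity)
    return {frozenset(pair) for pair in passed if (pair[1], pair[0]) in passed}


def connected_components(nodes, edges):
    """Group nodes into subgraphs; proteins without edges form one-element clusters."""
    adjacency = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for node in nodes:
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:  # iterative depth-first traversal
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        seen |= component
        components.append(component)
    return components


# Toy input (hypothetical protein names and scores)
example_hits = [
    ("A", "B", 80.0, 90.0, 1e-30),
    ("B", "A", 78.0, 88.0, 1e-28),
    ("A", "C", 60.0, 40.0, 1e-10),  # fails the coverage threshold
    ("C", "A", 60.0, 95.0, 1e-10),  # passes, but is not reciprocated
    ("D", "D", 100.0, 100.0, 0.0),  # self-hit, ignored
]
edges = build_network(example_hits)
components = connected_components(["A", "B", "C", "D"], edges)
```

In this toy case only the A/B pair forms an edge, so C and D remain one-element clusters. The resulting edge list could then be exported to a tool such as Gephi for layout.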

Standard Molecular Biology Procedures
Standard DNA manipulations were carried out according to the protocols described by Sambrook and Russell [20]. PCR reactions were performed with Phusion High Fidelity DNA polymerase (Thermo Fisher Scientific, Waltham, MA, USA).

Cloning, Overexpression, Purification, and Testing of Putative DNA MTases Activities
The predicted DNA MTase genes identified within prophages Φ2LM21 and Φ3LM21 were amplified using specific oligonucleotide primers (Table S1). The purified PCR products were then digested with appropriate enzymes and ligated with the pET30a or pET28a vector cut with the same enzymes as the relevant insert. Restriction enzymes used for cloning, vectors, and names of the resulting recombinant plasmids are listed in Table S1. The recombinant enzymes were expressed in E. coli ER2566. Protein expression and the restriction enzyme digestion protection assay for revealing the sequence specificity of particular MTases were performed as previously described [37].

Cloning, Overexpression and Testing of Φ2LM21-Encoded Lytic Enzyme Activity
The DNA encoding the putative lytic enzyme (Phi2LM21_p54) of the Φ2LM21 prophage was amplified by PCR using primers listed in Table S1. The DNA product was cleaved with NdeI and XhoI and cloned into the corresponding sites of the digested pET30a plasmid, yielding pET-lyt. Plasmid pET-lyt was introduced into E. coli ER2566, and the resulting strain was inoculated and cultured in LB medium supplemented with glucose (final concentration of 1.0%) to an optical density (OD600) of 0.35. Then, the culture was centrifuged, resuspended in fresh LB medium, and divided into two equal volumes, one supplemented with glucose and the other with IPTG (isopropyl β-D-1-thiogalactopyranoside) to a final concentration of 1 mM. Growth of these two cultures was monitored by measuring the optical density.

Nucleotide Sequence Accession Number
The whole-genome shotgun project of Sinorhizobium sp. LM21 has been deposited in the NCBI GenBank database under the accession number SAMN06765771.

Identification and Classification of the Sinorhizobium sp. LM21 Prophages
Only two Sinorhizobium sp. LM21 prophages have been identified thus far: the plasmid-like prophage, pLM21S1, and the temperate phage, ΦLM21 (Figure 1) [14,16]. The pLM21S1 (117.5 kb) is an unusual extrachromosomal element that carries a RepC-like replication system (typical for repABC-type plasmids of Alphaproteobacteria [38]) and is related to phage RHEph10 (GenBank accession No. JX483881) of Rhizobium etli CFN42 [39]. It also carries genes encoding enzymes involved in nicotinamide adenine dinucleotide (NAD) biosynthesis [16]. The second identified virus was a temperate phage, ΦLM21 (50.8 kb) [14]. The ΦLM21 phage was identified as an active virus after the treatment of Sinorhizobium sp. LM21 cells with mitomycin C. It was the only phage induced with this method in Sinorhizobium sp. LM21, which may suggest that pLM21S1 and other putative prophages within the LM21 genome are inactive or, alternatively, that they require specific, as yet unidentified environmental factors for induction.
In the course of this study, the draft genome sequence of Sinorhizobium sp. LM21 was obtained. It was assembled into 136 contigs (the size ranged from 103 to 1,033,074 bp) with a total length of 7,615,909 bp and 62.26% GC content. Automatic annotation performed with the application of the RAST server revealed the presence of 7627 genes (including 55 tRNA genes). The total length of predicted genes was 6,673,962 bp, which comprises 87.9% of the genome.
Obtaining the LM21 draft genomic sequence enabled us to perform searches of other prophages. With the use of the PhiSpy tool and manual inspection, besides the abovementioned ΦLM21 and pLM21S1, we distinguished three novel prophage regions, which were named Φ2LM21, Φ3LM21, and Φ4LM21, respectively (Files S1-S3). Based on the predicted proteomes of the distinguished prophages, the VIRFAM tool [33] classified Φ2LM21 and Φ3LM21 into the Siphoviridae, and Φ4LM21 into the Myoviridae family.

Characterization of the Φ2LM21 and Φ3LM21 Prophages
Two of the DNA regions distinguished within the LM21 genome and containing clusters of viral genes most probably comprise complete prophages. Their predicted genome sizes are 46,599 bp for Φ2LM21 and 41,447 bp for Φ3LM21, and the GC content (60.65% and 61.24%, respectively) is slightly lower than the GC content of the LM21 genome (62.26%). For the Φ2LM21 and Φ3LM21 prophages, the manual annotation revealed the presence of 69 and 59 genes, respectively. The specific functions were assigned for 27 and 28 of those genes, respectively (Tables S2 and S3). The gene content and structural organization of functional Φ2LM21 and Φ3LM21 modules were similar to the previously described active phage ΦLM21, which was also classified into the Siphoviridae family.
The integration/excision modules of the Φ2LM21 and Φ3LM21 phages contain the tyrosine integrase genes (phi2LM21_p01 and phi3LM21_p01, respectively). The Φ2LM21-encoded integrase (360 amino acids (aa)) exhibited the highest identity (98%), with the integrases widely distributed in many Sinorhizobium/Ensifer genomes (e.g., GenBank accession Nos. OCP05027 and OCP11814). The Φ3LM21-encoded integrase (371 aa) showed the highest identity (83%) with site-specific integrase/recombinase of Rhizobium phage vB_RleM PPF1 (GenBank accession No. YP_009099606). It is noteworthy that Phi2LM21_p01 and Phi3LM21_p01 proteins and also an integrase of the ΦLM21 phage (GenBank accession No. AII27753) do not show significant sequence similarity. It was also possible to distinguish the potential attachment sites (attB) for both prophages. Φ2LM21 and Φ3LM21 integrated into phenylalanine tRNA (tRNA-Phe (GAA)) and proline tRNA (tRNA-Pro (CGG)) genes, respectively. Downstream of the predicted prophage regions, sequences identical to the first 17 and 57 nucleotides of the Φ2LM21 and Φ3LM21 genomes, respectively, could be identified. It is noteworthy that the previously identified phage, ΦLM21, was integrated into another proline tRNA (tRNA-Pro (GGG)) gene, and, in all abovementioned cases, integration reconstituted an intact copy of the appropriate genes.
The lysogeny control region in Φ2LM21 and Φ3LM21 is composed of two genes, i.e., phi2LM21_p16 and phi2LM21_p17 in Φ2LM21, and phi3LM21_p17 and phi3LM21_p18 in Φ3LM21. The first gene in each pair encodes a CI repressor-like protein (as the HTH_CROC1 (Cro/C1-type helix-turn-helix) conserved domain was identified in it) and is oriented leftward. The subsequent gene in each prophage most probably encodes a Cro-like protein (as predicted using the MOTIF Search and HHpred tools) and is oriented rightward (Figure 1).
We predicted that phi2LM21_p26 and phi3LM21_p26 encode putative replication initiation proteins of prophages Φ2LM21 and Φ3LM21, respectively. Both proteins are homologous (28% reciprocal identity) and contain a helix-turn-helix domain, which is most probably responsible for their interactions with DNA. Interestingly, DNA MTase genes were also identified within the replication modules of both prophages, including: (i) phi2LM21_p23 (in Φ2LM21), encoding a C5 cytosine-specific DNA methyltransferase (m5C MTase); and (ii) phi2LM21_p21 (in Φ2LM21) and phi3LM21_p21 (in Φ3LM21), encoding m6A MTases. We speculate that those enzymes may participate in the regulation of phage replication. It is worth mentioning that another m6A MTase gene (phi2LM21_p66) was found in the right arm of the prophage Φ2LM21, which means that Φ2LM21 encodes, in total, three DNA MTases (Figure 1).
In both prophages, the host cell lysis genes are located downstream of the structural gene clusters. In Φ2LM21, those are phi2LM21_p54 and phi2LM21_p56, encoding a putative chitinase (COG3179) exhibiting 92% identity to a chitinase of Ensifer adhaerens X097 (GenBank accession No. OKP79630), and a holin belonging to holin superfamily III [41] with 98% identity to the LydA phage holin of Ensifer sp. Root1298 (GenBank accession No. KQX55447) [42], respectively. To verify the function of Phi2LM21_p54 as a predicted lytic enzyme, we cloned its gene into the plasmid vector pET30a under the control of an inducible T7 promoter. Induction of the phi2LM21_p54 gene by IPTG had a lethal effect on the heterologous host, resulting in cell lysis after 45 min (Figure S1). In Φ3LM21, only one gene (phi3LM21_p51) encoding a lytic enzyme was identified (Figure 1): a putative chitinase (COG3179) exhibiting the highest identity (83%) to several predicted chitinases of Sinorhizobium meliloti (GenBank accession Nos. WP_017267359, WP_027989971, and WP_028011802). This putative lytic enzyme is 82% identical to PhiLM21_p65 of ΦLM21, whose lytic activity was previously demonstrated experimentally [14].
In Φ2LM21 and Φ3LM21, homologous genes (phi2LM21_p65 and phi3LM21B_p57) encoding ATP-dependent DNA ligases were also identified. Related DNA ligases are encoded by several other phages, including ΦLM21 (GenBank accession No. AII27824), Rhizobium phage vB_RleM_PPF1 (GenBank accession No. AID18355), and Burkholderia phage Bcepil02 (GenBank accession No. ACR15036). This may suggest a role for these enzymes in phage functioning, e.g., in recombination or integration of the virus; however, this requires further analysis.
Moreover, in Φ2LM21 and Φ3LM21, besides genes encoding "typical" phage proteins, several additional modules (most probably comprising auxiliary metabolism genes) were identified. The regions carrying the extra genes are clustered within the right arm of each predicted prophage, downstream of the putative chitinase genes, which may indicate that they were acquired from the bacterial hosts.
In Φ2LM21, the following functions for the "extra" genes were predicted: (i) phi2LM21_p58 encodes a putative SOS response-associated peptidase (SRAP) of the SRAP family, which may act as a DNA-associated autoproteolytic switch that recruits diverse repair enzymes onto DNA damage [43]; and (ii) phi2LM21_p68 encodes a putative nucleoid-associated NdpA-like protein exhibiting 52% identity to the corresponding protein of Methylobacterium sp. UNCCL125 (GenBank accession No. SFV08872). Moreover, within the right arm of the Φ2LM21 prophage, the abovementioned m6A MTase (Phi2LM21_p66) is also encoded.
In the Φ3LM21 prophage, putative functions were predicted for four of the auxiliary metabolism genes. The phi3LM21_p49 gene encodes an FkbM-like methyltransferase. It was shown previously that homologs of this enzyme are required for specific methylation in the biosynthesis pathway of the macrocyclic polyketides FK506 and FK520, which have immunosuppressive activities, in Streptomyces sp. strain MA6548 [44]. Interestingly, ΦLM21 also encodes an FkbM-like methyltransferase (GenBank accession No. AII27814), but the two proteins seem to be unrelated. The phi3LM21_p52-encoded protein exhibited 72% identity to a putative ammonia monooxygenase of Rhizobiales bacterium 68-8 (GenBank accession No. OJU35087). Ammonia monooxygenase is a metalloenzyme that catalyzes the oxidation of ammonia to hydroxylamine, which is the first step of nitrification of ammonia to nitrate [45]. Interestingly, homologs of the Phi3LM21_p52 protein were also found in several phages, i.e., Erwinia phages vB_EamM_Huxley and vB_EamM_ChrisDB, and Ralstonia phage RSL2 (GenBank accession Nos. YP_009293074, YP_009292796, and YP_009213016). It was also revealed that the phi3LM21_p60 gene encodes a putative ribose-phosphate pyrophosphokinase (Prs), showing the highest identity (99%) to related proteins in Ensifer/Sinorhizobium spp. (e.g., GenBank accession Nos. KDP75975, KQX04241, and KQZ45803). This enzyme transfers a pyrophosphoryl group from ATP to ribose 5-phosphate, synthesizing 5-phospho-α-D-ribose 1-diphosphate (PRPP). This reaction is needed for the synthesis of purines and pyrimidines, the amino acids histidine and tryptophan, and the cofactors NAD and NADP, and it links these biosynthetic processes to the pentose phosphate pathway [46].
The last "additional" gene (phi3LM21_p62) of the Φ3LM21 prophage encodes a predicted lipopolysaccharide biosynthesis glycosyltransferase that may be involved in the addition of galactose or glucose residues to the lipooligosaccharide (LOS) or lipopolysaccharide (LPS) of the bacterial cell surface [47]. Interestingly, genes encoding enzymes involved in LPS modification have also been identified in other temperate phages, e.g., phage ε15, which conducts lysogenic conversion of Salmonella enterica, resulting in the production of an altered form of LPS [48,49].

Functional Analyses of DNA Methyltransferases Encoded by the Sinorhizobium sp. LM21 Prophages
As mentioned above, two genes, phi2LM21_p21 and phi3LM21_p21, were predicted to encode m6A MTases. The protein products of these genes show 51% reciprocal identity, and, additionally, Phi2LM21_p21 and PhiLM21_p027 of ΦLM21 (GenBank accession No. AII27779) exhibit 33% identity. We demonstrated previously that the specificity of PhiLM21_p027 is GANTC (the adenine is the methylated nucleotide), the same as that of the host-encoded CcrM, a regulatory enzyme widespread among members of Alphaproteobacteria, although PhiLM21_p027 and the CcrM of LM21 do not share sequence similarity [14]. To determine whether GANTC sequences are substrates for Phi2LM21_p21 and Phi3LM21_p21, we digested the pET_Phi2LM21_p21 and pET_Phi3LM21_p21 plasmid DNAs isolated from IPTG-induced and non-induced E. coli cultures with the HinfI restriction enzyme (specificity GANTC, inhibited by m6A methylation). To confirm the susceptibility of the substrate DNA to digestion, a number of adenine methylation-sensitive and -insensitive endonucleases were used in the REase digestion assay. The DNAs isolated from the induced cultures were either fully (pET_Phi3LM21_p21) or partially (pET_Phi2LM21_p21) resistant to cleavage by HinfI, whereas all other REases were able to cleave the substrate DNAs. In contrast, the pET_Phi2LM21_p21 and pET_Phi3LM21_p21 DNAs isolated from the non-induced cultures were susceptible to all restriction enzymes used, including HinfI (Figure 2).
Additionally, to determine whether GATC sequences may also represent a substrate for the Phi2LM21_p21 and Phi3LM21_p21 MTases, we re-transformed the pET_Phi2LM21_p21 and pET_Phi3LM21_p21 plasmid DNAs into the E. coli ER2929 Dam− strain lysogenized with the DE3 element (in DNAs isolated from E. coli ER2566, all GATC sites are m6A modified due to the host EcoKDam MTase activity) [19]. The plasmid DNAs isolated from the induced cultures of the E. coli ER2929 Dam− strain were cleaved by MboI (GATC, inhibited by m6A methylation), while only partial cleavage, with a large proportion of DNA fragments corresponding to linearized plasmids, was observed with HinfI (Figure 2). Based on all obtained results, we concluded that the sequence specificity of both Phi2LM21_p21 and Phi3LM21_p21 was GANTC. Thereby, these enzymes were recognized as alphaproteobacterial phage MTases mimicking the sequence specificity of the host CcrM regulatory enzyme.
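The logic of the protection assay can be illustrated in silico. The sketch below is not part of the published analysis: it simply locates GANTC (HinfI) and GATC (MboI) recognition sites in a sequence and models, in an all-or-nothing way, that a methylation-sensitive enzyme does not cut when its sites are methylated. The function names and the toy sequence are assumptions made for illustration, and partial protection, as observed for pET_Phi2LM21_p21, is not modeled.

```python
import re

# Minimal IUPAC table: only the codes needed for the two motifs used here
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def site_positions(sequence, motif):
    """Return 0-based start positions of every occurrence of a motif.

    A zero-width lookahead is used so that overlapping sites are reported.
    """
    pattern = "".join(IUPAC[base] for base in motif.upper())
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence.upper())]


def predicted_cleavage(sequence, motif, methylated):
    """Sites expected to be cut by a methylation-sensitive enzyme.

    Simplification: if the sites are methylated, none of them is cut.
    """
    return [] if methylated else site_positions(sequence, motif)


# Hypothetical toy sequence containing two GANTC sites and one GATC site
toy_sequence = "TTGATTCAAGATCCGAGTCTT"

hinfi_sites = site_positions(toy_sequence, "GANTC")  # HinfI recognition sites
mboi_sites = site_positions(toy_sequence, "GATC")    # MboI recognition sites

# DNA from an induced, Dam-deficient host expressing a GANTC-specific MTase:
# GANTC sites are protected, GATC sites remain cleavable.
hinfi_cuts = predicted_cleavage(toy_sequence, "GANTC", methylated=True)
mboi_cuts = predicted_cleavage(toy_sequence, "GATC", methylated=False)
```

Under these assumptions the pattern matches the experimental readout described above: protection from HinfI together with sensitivity to MboI points to GANTC, not GATC, as the modified sequence.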
Previously, three GANTC-specific m6A MTases (JCM7686_1231, JCM7686_2255, and JCM7686_2934) were identified in prophage regions of the Paracoccus aminophilus JCM 7686 genome (one of these prophages, ΦPam-6, turned out to be active) [5]. These small 179-amino acid proteins share putative catalytic (NPPW/F/Y) and S-adenosyl methionine (SAM)-binding motifs with the later-discovered 218-aa PhiLM21_p027 enzyme of ΦLM21 [14]. Interestingly, the Phi2LM21_p21 and Phi3LM21_p21 proteins, both analyzed in this work, are much larger: 599 and 456 aa, respectively. It should be stressed that the genes encoding all the abovementioned GANTC-specific MTases are localized upstream of a cluster of genes presumably involved in phage replication. The shared specificity of these MTases and the shared localization of their genes within the phage genome strongly suggest that methyltransferase activity is relevant for phage replication. Notably, we identified numerous homologs of these phage MTases with CcrM-like specificity in the genomes of active virulent (e.g., Sinorhizobium phage phiN3 (GenBank accession No. YP_009212452)) and temperate (e.g., Rhizobium phage vB_RleM_PPF1 (GenBank accession No. YP_009099644)) Alphaproteobacteria phages, and even more of them within putative Alphaproteobacteria prophage sequences, which may suggest that this phenomenon is common in Alphaproteobacteria phages (Figure 3).
Figure 3. … (of Rhizobium grahamii); and YP_009146999 (of Aurantimonas phage AmM-1). Additionally, the MTases analyzed in this work (i.e., Phi2LM21_p21 and Phi3LM21_p21) were included in the alignment. The conserved amino acids were distinguished and/or presented within the consensus sequence. Moreover, the catalytic motif (also known as motif IV of MTases) composed of NPP(Y/W/F) residues was indicated by a green block above the alignment. To retain transparency, the alignment was trimmed on both sides, and only its central, conserved region was presented. The numbers of trimmed amino acids have been provided in parentheses.
As these phage GANTC-specific MTases and the CcrM proteins of their hosts are unrelated [14], this is a clear example of evolutionary convergence of the sequence specificity of bacterial and phage CcrM-like enzymes in Alphaproteobacteria, similar to the convergence of the GATC sequence specificity of bacterial and the majority of phage Dam-like proteins of Gammaproteobacteria [50].
Restriction enzyme digestion protection assays with panels of cytosine methylation-sensitive and adenine methylation-sensitive endonucleases were also used to test the sequence specificity of Phi2LM21_p23, a putative m5C MTase, and Phi2LM21_p66, a putative m6A MTase. The DNAs of pET_Phi2LM21_p23 and pET_Phi2LM21_p66 isolated from the induced E. coli ER2566 cultures were sensitive to all restriction enzymes used in this test (data not shown), which suggests that the two remaining MTases of Φ2LM21 are inactive in a heterologous host. In the case of Phi2LM21_p23, we can presume its specificity based on the similarity of this protein (44% identity) to JCM7686_0772 and JCM7686_2655, m5C MTases of P. aminophilus JCM7686 for which experimental data are available. They modify at least one cytosine in the CC motif [5]. Homologs of these relatively large (about 700 aa) phage proteins are widely distributed, not only in phage genomes of Alpha- (e.g., Rhizobium phage RR1-A) but also of Gammaproteobacteria. Similarly, m5C MTases with relaxed specificity are present in the genomes of Aeromonas sp. ARM81 phages. The genes encoding ARM81mr_p29 of ΦARM81mr and ARM81ld_p31 of the linear plasmid-prophage ΦARM81ld (both have 34% identity with Phi2LM21_p23) are localized in a replication module (the same as Phi2LM21_p23) or in the vicinity of the plasmid partitioning system, respectively [51]. The location of these MTase genes adjacent to the replication/segregation module may suggest the relevance of the methyltransferase activity at this stage of the virus reproductive cycle.
Interestingly, in the data published so far on Ensifer/Sinorhizobium genomes, only a few prophage regions were mentioned (but not described in detail), or only the percentage contribution of the prophage regions to the particular bacterial genome was calculated [7,52]. This exemplifies the significant gap in our general knowledge concerning sinorhizobial (pro)phages.
In this study, all (23) prophage regions identified within the sinorhizobial genomes, together with eight active lytic and temperate phages of Sinorhizobium spp. (i.e., ΦM12, ΦM7, ΦM19, ΦM9, ΦN3, Φ16-3, ΦPBC5, and ΦLM21) were subjected to thorough comparative analysis. The summary of the general features of particular sinorhizobial (pro)phages was presented in Table 1.

Comparative Genomics and Networking of the Sinorhizobial (Pro)phages
First, all sinorhizobial (pro)phages compared in this work (predicted as complete) were analyzed with the VIRFAM tool [33], which enabled the assignment of those viruses to appropriate families. The analyzed pool of (pro)phages contained representatives of Siphoviridae (22 viruses), Myoviridae (6), and Podoviridae (3) (Table 1). In the next step, local nucleotide similarities within the genomes of the analyzed (pro)phages were identified with the Circoletto tool [35] (Figure S2). Although all of the analyzed lytic phages were classified into the T4 phage superfamily, the analysis confirmed previous findings showing that phages ΦN3, ΦM7, ΦM12, and ΦM19 form a separate group (not showing significant similarities with other sinorhizobial (pro)phages) and that ΦM9 is unique [12,53]. It is also worth mentioning that in 2016 phages ΦN3, ΦM7, ΦM12, and ΦM19 were clustered into a single genus called the M12-like viruses, and additionally phages ΦM12 and ΦM19 were considered two strains of the same phage [53]. Furthermore, analyzing the three other active, but temperate, sinorhizobial phages (i.e., Φ16-3, ΦLM21, and ΦPBC5), we found that they show only partial (local) similarities to prophages identified within Sinorhizobium/Ensifer genomes, and vice versa (Figure S2).
Following the general comparative analysis of the nucleotide sequences of sinorhizobial (pro)phages, all 3688 proteins encoded by the 31 analyzed (pro)phages were used in all-against-all BLASTP searches (thresholds: e-value of 10^-5, 50% identity, and 50% query coverage per subject) to construct a protein similarity network. This resulted in a graph with 3688 nodes (proteins) and 3975 edges (reflecting reciprocal protein similarities), which combined nodes into 666 subgraphs (groups of similar proteins) of different sizes and 1251 unique, one-element clusters (Figure 4). Amongst the subgraphs, there were: 12s:6n, 11s:2n, 9s:5n, 8s:4n, 7s:26n, 6s:7n, 5s:26n, 4s:317n, 3s:98n, and 2s:175n, where s and n indicate the size (number of nodes) of a subgraph and the number of such subgraphs, respectively. This showed that 2437 (66.1%) of all analyzed proteins exhibited homology with at least one other protein in the dataset. Moreover, the analysis revealed that the analyzed pool of (pro)phages contains highly unique members, i.e., the lytic phage ΦM9, the temperate phages ΦPBC5 and Φ16-3, as well as the predicted prophages Φ2_CasidaA, pLM21S1, Φ2_BL225C, Φ1_WSM419, and Φ1_SM11 (Figure 4). To allow transparent visualization of the selected protein networks, the separate clusterings were shown (Figure 4), and the sequences of those proteins were provided as multifasta files (File S5).
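The subgraph tally reported above can be checked with a few lines of arithmetic. The snippet below is only a consistency check on the published counts, not a reconstruction of the analysis; the variable names are illustrative.

```python
# Subgraph size -> number of subgraphs of that size, as listed in the text
subgraph_counts = {12: 6, 11: 2, 9: 5, 8: 4, 7: 26, 6: 7, 5: 26, 4: 317, 3: 98, 2: 175}
singletons = 1251  # unique, one-element clusters

n_subgraphs = sum(subgraph_counts.values())
clustered_proteins = sum(size * count for size, count in subgraph_counts.items())
total_proteins = clustered_proteins + singletons
clustered_percent = 100 * clustered_proteins / total_proteins

print(n_subgraphs, clustered_proteins, total_proteins, f"{clustered_percent:.1f}%")
# prints: 666 2437 3688 66.1%
```

The listed size classes therefore account for all 666 subgraphs, and the 2437 clustered proteins plus the 1251 singletons add up to the 3688 proteins in the network.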
The analysis of the large subunits of terminases revealed 11 unique proteins not showing significant similarity to other TerLs encoded by the sinorhizobial phages. Those were terminases encoded by ΦM9, ΦPBC5, Φ16-3, ΦLM21, pLM21S1, Φ3LM21, Φ1_SM11, Φ6_SM11, Φ1_WSM419, Φ2_BL225C, and Φ2_CasidaA. The analysis of the overall sinorhizobial phage proteins similarity network revealed that the remaining large subunits of terminases clustered into five multi-element groups, where four groups clustered exclusively TerLs of prophages distinguished in silico within the sinorhizobial genomes, while the last group gathered four TerLs of Myoviridae lytic phages (Figure 4).
The protein clustering showed that integrases for all (21) distinguished prophage sequences and three temperate active phages are highly diversified although they are all tyrosine-specific recombinases. The most numerous group gathered three Int proteins identified in Φ3_WSM419, Φ2_Rm21, and Φ1_RMO17, whereas the remaining integrases were clustered into five pairs and 11 unique proteins ( Figure 4). The comparative analysis of the predicted attachment sites for identified prophages and the ΦLM21 phage revealed the strong congruence between the overall clustering of integrases and the nucleotide sequences of attBs, which may indicate the specificity of particular enzymes toward recognized DNA regions.
The analysis of the major capsid proteins revealed that they clustered into six multi-element groups: one group of seven proteins, two groups of four, one group of three, and two pairs. The remaining nine proteins were unique. The largest group was formed by the major capsid proteins of Φ1_CasidaA, Φ2_WSM419, Φ1_AK83, Φ1_BL225C, Φ3_SM11, Φ5_SM11, and Φ2LM21 (Figure 4).
In summary, the similarity networks for the three groups of proteins used as phage molecular markers show that the large subunits of terminases (TerLs) and the major capsid proteins cluster congruently, whereas integrases are much less conserved and would be difficult to use as phylogenetic markers for the characterization of temperate sinorhizobial viruses.
(a) The similarity network of 3688 sinorhizobial phage proteins. All the proteins (nodes) belonging to the same (pro)phage are circularly arranged and are linked to the others according to their identity value. The network obtained at the 50% threshold is shown. The size and color of each node (single protein) are proportional to its degree, which reflects the number of homologous proteins within the network (the more unique, the smaller and darker the node). Additionally, selected proteins are highlighted: large terminase subunits (green), major capsid proteins (magenta), integrases (red), holins (blue), ATP-dependent DNA ligases (light blue), endolysins (yellow) and DNA methyltransferases (pink). (b) Visualization of the similarity networks for selected proteins. The sequences of those proteins are provided as multifasta files (File S5). Letters a, b and c beside the number of the (pro)phage indicate different DNA MTases encoded within the particular virus.
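The network construction described in the caption (proteins as nodes, links drawn according to pairwise identity, a 50% threshold, node degree as a measure of how many homologs a protein has) can be illustrated with a minimal Python sketch. This is not the software used in the study; the protein names and identity values below are invented, and treating the threshold as inclusive is an assumption.

```python
def build_network(pairs, threshold=50.0):
    """Link two proteins when their pairwise identity (%) meets the threshold.

    `pairs` is an iterable of (protein_a, protein_b, identity) tuples.
    The inclusive comparison (>=) is an assumption made for this sketch.
    """
    adjacency = {}
    for a, b, identity in pairs:
        # Register both nodes so that unlinked (unique) proteins are kept.
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if a != b and identity >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def node_degrees(adjacency):
    """Degree of a node = number of proteins it is directly linked to."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


def clusters(adjacency):
    """Return the connected components of the network, largest first."""
    seen = set()
    groups = []
    for start in adjacency:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        groups.append(component)
    return sorted(groups, key=len, reverse=True)


# Invented identity values for six hypothetical proteins (P1..P6).
example_pairs = [
    ("P1", "P2", 82.0),
    ("P2", "P3", 61.5),
    ("P1", "P3", 47.0),  # below threshold: no direct link
    ("P4", "P5", 55.0),
    ("P5", "P6", 12.0),  # P6 stays unlinked, i.e. "unique"
]
network = build_network(example_pairs)
print([sorted(group) for group in clusters(network)])
print(node_degrees(network))
```

In this toy input, P1 and P3 fall into the same cluster although they are not directly linked, because both are linked to P2. Grouping by connected components is one simple way to read "multi-element groups" from such a network; other clustering criteria would give different groupings.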
Within the analyzed (pro)phages, 39 genes encoding DNA MTases were identified. The majority (33) of those enzymes were classified as m6A or m4C MTases. Only in seven prophages (pLM21S1, Φ1_WSM419, Φ1_AK83, Φ2_AK83, Φ2_BL225C, Φ4_SM11, and Φ6_SM11) were we unable to identify genes encoding DNA MTases. The highest number (4) of MTase genes was identified in Φ3_AK83. In three (pro)phages (Φ1_BL225C, Φ2LM21, and ΦPBC5), as many as three genes encoding DNA MTases were identified, while in six other (pro)phages (Φ1_RMO17, Φ1_SM11, Φ16-3, ΦM19, ΦM7, and ΦN3), two genes encoding DNA MTases were found. As shown by the protein network analyses, the identified DNA MTases are highly diverse, which makes it difficult to speculate about their DNA specificity. On the other hand, most of the identified MTase genes were located in the proximity of the phage replication system, including all m6A MTases with an NPPY/F/W amino acid motif (as was previously shown for the MTase genes of ΦLM21, Φ2LM21, and Φ3LM21). Therefore, we hypothesize that the identified MTases with NPPY/F/W motifs may also mimic the specificity of the host regulatory CcrM modifying enzyme (i.e., recognize and methylate the GANTC sequence) and probably play a role in the virus cycle. The goal of future work is to determine the specific DNA sequences recognized by the identified MTases of other sinorhizobial phages and to test which of them exhibit CcrM-like specificity.
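As an illustration of the two sequence patterns mentioned above, the following Python sketch scans a protein sequence for the NPPY/F/W motif and a DNA sequence for GANTC sites. It is a simple pattern search under stated assumptions, not the annotation procedure used in the study: the function names and the example sequences are invented, and the presence of the motif only flags a candidate m6A MTase; it does not establish the target specificity, which requires experimental testing.

```python
import re

# N-P-P followed by Y, F or W (the NPPY/F/W motif named in the text).
NPP_MOTIF = re.compile(r"NPP[YFW]")

# GANTC, where N is any of the four bases. The site reads the same on
# the reverse-complement strand, so scanning one strand is sufficient.
GANTC_SITE = re.compile(r"GA[ACGT]TC")


def npp_motif_positions(protein_seq):
    """Return 0-based start positions of NPP[YFW] in a protein sequence."""
    return [m.start() for m in NPP_MOTIF.finditer(protein_seq.upper())]


def gantc_positions(dna_seq):
    """Return 0-based start positions of GANTC sites in a DNA sequence."""
    return [m.start() for m in GANTC_SITE.finditer(dna_seq.upper())]


# Invented sequences, used only to show the calls.
print(npp_motif_positions("MKDNPPYLA"))
print(gantc_positions("TTGAATCCGAGTCAA"))
```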
The analysis of the protein network also revealed that 17 (pro)phages encode ATP-dependent DNA ligases. Among these, 11 proteins encoded by prophages ΦLM21, Φ2LM21, Φ3LM21, Φ1_CasidaA, Φ1_CFNEI73, Φ1_WSM419, Φ1_BL225C, Φ1_Rm41, Φ2_Rm41, Φ1_RMO17, and Φ6_SM11 formed a single cluster, in which the ligase encoded by Φ1_RMO17 appears to be the most distinct. Another, four-element cluster of ATP-dependent DNA ligases was formed by those encoded by Myoviridae phages, which also encode homologous RNA ligases (data not shown). The last two ligases, identified in pLM21S1 and ΦM9, were unique.
Annotation of phage genomes is still a challenging task, as usually nearly 60-70% of genes remain annotated as encoding hypothetical proteins [55]. In this study, we faced the same problem, since 2667 (72.32%) of all analyzed (pro)phage proteins were initially annotated as hypothetical. After the manual re-annotation of the prophage regions identified within the Sinorhizobium spp. genomes, we proposed a function for 141 (5.3%) predicted (previously hypothetical) proteins (File S4). Moreover, by performing the large-scale protein networking analysis, we were able to suggest a possible function for a further 108 (4%) proteins previously annotated as hypothetical. In our analysis, those proteins clustered together with other proteins of predicted function. Based on this result, we may conclude that combining complex manual annotation with high-throughput protein similarity network analysis in (pro)phage studies may significantly facilitate the future annotation of viral genomes and provide valuable suggestions concerning the possible functions of phage proteins for future experimental validation.
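The percentages quoted above can be reproduced with a short calculation. The sketch below uses only the counts given in the text (3688 proteins in the network, 2667 initially hypothetical, 141 and 108 newly assigned). Taking the 2667 hypothetical proteins as the denominator for the 5.3% and 4% values is an inference from the numbers, not something the text states explicitly.

```python
def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


total_proteins = 3688   # all proteins in the similarity network
hypothetical = 2667     # initially annotated as hypothetical
manual = 141            # function proposed after manual re-annotation
network_based = 108     # function suggested by network clustering

print(round(percent(hypothetical, total_proteins), 2))           # share of hypothetical proteins
print(round(percent(manual, hypothetical), 1))                   # resolved manually
print(round(percent(network_based, hypothetical), 1))            # resolved via the network
print(round(percent(manual + network_based, hypothetical), 1))   # combined
```

The combined value, about 9.3% of the initially hypothetical proteins, corresponds to the "nearly 10%" figure given in the Conclusions.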

Conclusions
In the presented study, thorough manual analysis of the Sinorhizobium genomes revealed the presence of 23 prophages, which, together with eight previously identified active sinorhizobial phages, were subjected to complex comparative analyses using protein networking. This study revealed that, amongst the analyzed viral proteins, holins, endolysins, and ATP-dependent DNA ligases are the most conserved. It was also shown that the lytic enzymes in particular form pairs whose genes are co-localized within particular phages. Moreover, congruence between the clustering of the large subunits of terminases and the major capsid proteins was observed, which reflects the phylogenetic relations between the analyzed phages.
The analysis performed was the first such complex comparative study of the sinorhizobial phages. Using the example of Sinorhizobium phages, it was shown that application of complex manual annotation and high-throughput protein similarity network analysis may significantly improve overall phage annotation, as in this study we were able to suggest the possible function for nearly 10% of predicted proteins, previously annotated as hypothetical ones.
Moreover, this study showed that genes encoding DNA MTases are abundant in the genomes of sinorhizobial phages and that convergent evolution between phage MTases and the host regulatory CcrM MTase is common in Sinorhizobium spp., and most probably in other Alphaproteobacteria. Interestingly, it was also shown that the DNA MTases exhibiting CcrM-like specificity may not share high sequence similarity; however, they are all localized within the predicted replication modules of phages, which strongly suggests their regulatory role.
Supplementary Materials: The following are available online at www.mdpi.com/1999-4915/9/7/161/s1, Figure S1: Profiles of E. coli ER2566 cell lysis as the result of Phi2LM21_p54 expression; Figure S2: Comparative genomic analyses of 31 sinorhizobial (pro)phages; Table S1: Oligonucleotide primers used in this study; Table S2: Genes located within the Φ2LM21 prophage; Table S3: Genes located within the Φ3LM21 prophage; Table S4: Genes located within the Φ4LM21 prophage remnant; File S1: GenBank file with annotated sequence of the Φ2LM21 prophage; File S2: GenBank file with annotated sequence of the Φ3LM21 prophage; File S3: GenBank file with annotated sequence of the Φ4LM21 prophage remnant; File S4: Combined GenBank files with annotated sequences of 20 putative, complete prophages retrieved from the Ensifer/Sinorhizobium genomes; File S5: Combined multifasta files with amino acid sequences of the large subunits of terminases, major capsid proteins, integrases, holins, endolysins, ATP-dependent DNA ligases and DNA methyltransferases of analyzed Ensifer/Sinorhizobium (pro)phages.