3.3. Genome Features
Terminal analysis found no protruding cohesive ends in the complete genome, which, following Zhang et al. [37], suggests that phage R5C has a circular, double-stranded DNA genome. The genome size of R5C is 77,874 bp (Figure 3), which is the second largest among the published genomes of roseophages. It is suggested that the likelihood of phage interference with host cellular activities increases with genome size. R5C has a G+C content of 61.5%, which is the highest among all roseophages (Table S1). Generally, the G+C content is lower in phages than in their hosts, while temperate phages show smaller G+C deviations from their hosts [51]. For example, the average G+C values of the temperate phages ΦCB2047-A (58.8%) and ΦCB2047-C (59.0%) are close to that of their host Sulfitobacter sp. strain 2047 (60.3%). Interestingly, a small G+C deviation is also observed between R5C and its host (66.0%), which suggests that R5C may follow a temperate phage strategy. No tRNA sequences were detected in the R5C genome using the tRNAscan-SE program. tRNA genes are also absent from other roseophages such as SIO1, P12053L, ΦCB2047-A, ΦCB2047-C, RDJLΦ1, RDJLΦ2, RD-1410W1-01, RD-1410Ws-07 and DS-1410Ws-06. Among the four roseosiphophages, DSS3Φ8 has the longest genome, which contains 24 tRNAs. In the literature, the presence of tRNA genes has been associated with longer genome length, higher codon usage bias and higher virulence [52].
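The G+C comparison above can be reproduced with a few lines of Python. The following is a minimal sketch, not the pipeline used in this study: the function and variable names are hypothetical, the percentages are the values quoted in the text, and the deviation is simply taken as host G+C minus phage G+C.

```python
def gc_content(seq: str) -> float:
    """Return G+C content as a percentage of the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


# G+C percentages quoted in the text, stored as (phage, host).
REPORTED_GC = {
    "ΦCB2047-A": (58.8, 60.3),
    "ΦCB2047-C": (59.0, 60.3),
    "R5C": (61.5, 66.0),
}


def gc_deviation(phage_gc: float, host_gc: float) -> float:
    """Host G+C minus phage G+C, in percentage points (assumed definition)."""
    return round(host_gc - phage_gc, 1)


deviations = {name: gc_deviation(p, h) for name, (p, h) in REPORTED_GC.items()}
for name, delta in deviations.items():
    print(f"{name}: host G+C exceeds phage G+C by {delta} percentage points")
```

In practice, `gc_content` would be applied to the full genome sequences read from FASTA files; the dictionary above only restates the published values.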
In total, 123 ORFs were identified in the R5C genome using GeneMarkS and ORF Finder software (Table S2). A total of 66 gene products had homologous sequences in the NCBI non-redundant protein database and 41 of these could be assigned a recognizable function. At the amino acid level, genes homologous to those from other phages showed less than 71% similarity. About 66.7% of the ORFs (82 ORFs, about 40% of the phage genome length) had no annotated features, while 57 of these ORFs had no matches in the databases. Single gene analysis showed R5C to be weakly similar to the known Siphoviridae. However, little or no nucleotide similarity was detected with these phages, and protein homology was detected at only a few loci, with only one or two signature phage genes being shared between phages. Fifteen ORFs of R5C were homologous to those of both RDJLΦ1 and RDJLΦ2, showing similarly low identity levels (ranging from 24 to 72% and 25 to 74%, respectively). Furthermore, 19 ORFs of R5C were similar to those of DSS3Φ8, again with low identity (22–51%). This suggests that R5C sequences are highly divergent from known phage genomes and that proteins encoded by siphoviruses are under-represented in the database.
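The ORF counts reported above are linked by simple arithmetic, which the following Python sketch makes explicit. The variable names are hypothetical; the three input numbers are those given in the text, and the remaining quantities are derived from them.

```python
TOTAL_ORFS = 123      # ORFs predicted in the R5C genome
WITH_HOMOLOGS = 66    # gene products with a database homolog
WITH_FUNCTION = 41    # homologs with a recognizable function

# ORFs without an assigned function (no annotated features).
no_function = TOTAL_ORFS - WITH_FUNCTION
# ORFs without any database match.
no_match = TOTAL_ORFS - WITH_HOMOLOGS
# ORFs with a homolog whose function is still unknown.
homolog_unknown_function = WITH_HOMOLOGS - WITH_FUNCTION

pct_no_function = round(100 * no_function / TOTAL_ORFS, 1)

print(f"{no_function} ORFs ({pct_no_function}%) have no assigned function")
print(f"{no_match} of these have no database match")
print(f"{homolog_unknown_function} have a homolog of unknown function")
```

The 82 unannotated ORFs therefore split into 57 with no database match and 25 that match only proteins of unknown function.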
Among the 41 ORFs with recognizable functions, 10 were related to the structure and assembly of virions, such as a coat protein, a head-to-tail connecting protein, a tail fiber protein and the large subunit of terminase. Sixteen ORFs were predicted to encode proteins involved in DNA replication, metabolism and repair, while one conserved lysis ORF, acetylmuramidase, was predicted in the R5C genome. This was the first time that the DNA transfer protein, which is transcribed in the pre-early stage of infection in T5, had been detected in roseophages. Interestingly, four gene transfer agent (GTA) homologous genes and five queuosine biosynthesis genes were found in the R5C genome. Additionally, integrase and repressor genes, which indicate a potential for a lysogenic cycle, were not found in the R5C genome.
We compared the genomes of four roseosiphophages that possess gene transfer agent genes and found only seven conserved shared genes, including ribonucleotide reductase, DNA helicase, deoxycytidylate deaminase and GTA-like genes, with 22–50% identity at the amino acid level (Figure 4). This demonstrates the extremely high level of genetic divergence among roseosiphophages. The ribonucleotide reductase gene in R5C shares relatively high amino acid identity with that of roseophages RDJLΦ1 (44%) and RDJLΦ2 (44%). As key enzymes involved in DNA synthesis, ribonucleotide reductases are found in all organisms and convert ribonucleotides into deoxyribonucleotides [53]. In the phosphorus-limited marine environment, obtaining sufficient free nucleotides is critical for DNA synthesis [54,55]. DNA helicases are motor proteins that use the energy from NTP hydrolysis to transiently separate energetically stable duplex DNA into single strands [56]. The ubiquity of helicases in prokaryotes, eukaryotes, and viruses indicates their fundamental importance in DNA metabolism [57]. Deoxycytidylate deaminases catalyze the deamination of dCMP to dUMP and thus provide the nucleotide substrate for thymidylate synthase [58]. All roseosiphophages isolated to date have highly conserved GTA-like genes, whereas all podophages infecting the Roseobacter clade (roseopodophages) lack similar genes. The four GTA-like genes (gp12–gp15) lie close to genes encoding structural proteins, such as the tail tape measure protein of R5C, and the same arrangement is also observed in other GTA-harboring phage genomes. These observations suggest that the function of gp12–gp15 may be related to the specific structure of siphophages, such as the tail. Further protein analyses are needed to verify this assumption.
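The amino acid identities quoted in this section are percentages of identical residues in pairwise alignments. The Python sketch below shows one common way to compute such a value from two already-aligned sequences. It is an illustration under stated assumptions rather than the method used here: the function name is hypothetical, the toy sequences are invented, and the denominator (all alignment columns, gaps included) is only one of several conventions in use.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the same residue.

    Both inputs must be rows of the same pairwise alignment (equal length).
    Columns that are gaps in both rows are ignored; columns with a gap in
    one row count toward the alignment length but never as matches.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aln_a, aln_b) if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Toy aligned fragments (invented for illustration, not R5C sequences).
example = percent_identity("MKT-LV", "MRTALV")
print(f"identity: {example:.1f}%")
```

Because tools differ in how they treat gaps and alignment length, identities from different programs are only comparable when the same convention is applied throughout.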
Like many other phages, R5C carries a variety of auxiliary metabolic genes (AMGs) in its genome. Currently, DNA metabolism and nucleotide synthesis genes are the most prevalent AMGs in roseophages. In R5C, we found AMGs that frequently appear in marine phages, such as phoH (ORF 47), as well as AMGs identified for the first time in roseophages, e.g., a heat shock protein gene (ORF 74) and queuosine biosynthesis genes (ORF 79, ORF 82–84 and ORF 95). A greater number of AMGs may broaden the role that phages play in their hosts’ fitness during infection.
The phoH gene has been detected in phages infecting both heterotrophic and autotrophic bacteria, such as Prochlorococcus phage P-SSM2, Synechococcus phage Syn9, SAR11 phage HTVC008M, and Vibrio phage KVP40 [6,54,59,60]. Roseophages SIO1 and DSS3Φ8 also possess the phoH gene [13,26]. Phage-encoded phoH genes have previously been described as apparent members of a multi-gene family with divergent functions, with roles in phospholipid metabolism, RNA modification, and fatty acid beta-oxidation [54,61,62]. It is suggested that the phoH gene in phages aids host regulation of phosphate uptake and metabolism under low-phosphate conditions, which is consistent with the environment from which R5C was isolated, namely the oligotrophic South China Sea.
Heat shock proteins are postulated to protect organisms from the toxic effects of heat and other forms of stress. These proteins exist in every organism studied from archaebacteria to eubacteria and from plants to animals [63]. Cellular heat-shock responses occur during the replication of many viruses, such as adenovirus and human cytomegalovirus [64,65]. This is the first report of a heat shock protein in roseophages. The grpE gene alone encodes a 24-kDa heat shock protein. The GrpE heat shock protein is important for bacteriophage λ DNA replication at all temperatures and for bacterial survival under certain conditions [66].
As a hypermodified nucleoside derivative of guanosine, queuosine occupies the wobble position (position 34) of the tRNAs specific for Asp, Asn, His and Tyr. The hypomodification of queuosine-modified tRNA plays an important role in cellular proliferation and metabolism [67]. The mechanisms of action of the queuosine biosynthesis genes in viruses remain unclear, even though similar gene clusters have been found in Streptococcus phage Dp-1, Escherichia coli phage 9g and other viruses [68,69,70]. This study is the first to detect queuosine biosynthesis genes in the genome of a roseophage.
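The restriction of queuosine to these four tRNA families follows from the genetic code: Asp, Asn, His and Tyr are each encoded by a pair of codons of the form NAU/NAC, so their tRNAs share anticodons beginning with G at position 34, the base that queuosine replaces. The short Python sketch below illustrates this using the standard genetic code. It is background material rather than an analysis from this study, and the function and variable names are hypothetical.

```python
# Standard genetic code codons (RNA alphabet) for the four amino acids named in the text.
CODONS = {
    "Asp": ("GAU", "GAC"),
    "Asn": ("AAU", "AAC"),
    "His": ("CAU", "CAC"),
    "Tyr": ("UAU", "UAC"),
}

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def watson_crick_anticodon(codon: str) -> str:
    """Return the anticodon (5'->3') that pairs base-for-base with a codon (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(codon))


# Anticodon matching the C-ending codon of each pair; its first base is position 34.
anticodons = {aa: watson_crick_anticodon(c_ending) for aa, (_, c_ending) in CODONS.items()}

for aa, anticodon in anticodons.items():
    print(f"{aa}: codons {CODONS[aa]}, anticodon {anticodon}, position 34 = {anticodon[0]}")
```

All four anticodons have the form GUN, with the wobble base at position 34 reading both the U-ending and the C-ending codon of each pair.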