Genome Analysis of Two Novel Synechococcus Phages That Lack Common Auxiliary Metabolic Genes: Possible Reasons and Ecological Insights by Comparative Analysis of Cyanomyoviruses

The abundant and widespread unicellular cyanobacterium Synechococcus plays an important role in global phytoplankton primary production. In the present study, two novel cyanomyoviruses, S-N03 and S-H34, which infect Synechococcus MW02, were isolated from the coastal waters of the Yellow Sea. S-N03 contains a 167,069-bp double-stranded DNA genome with a G + C content of 50.1%, 247 potential open reading frames and 1 tRNA; S-H34 contains a 167,040-bp genome with a G + C content of 50.1%, 246 potential open reading frames and 5 tRNAs. These two cyanophages contain fewer auxiliary metabolic genes (AMGs) than other previously isolated cyanophages. S-H34, in particular, is currently the only known cyanomyovirus that does not contain any AMGs related to photosynthesis. The absence of such common AMGs in S-N03 and S-H34, together with their distinct evolutionary history and ecological features, implies that the energy for phage production might be obtained from other sources rather than being strictly dependent on the maintenance of photochemical ATP under high light. Phylogenetic analysis showed that the two isolated cyanophages clustered together and were closely related to two other cyanophages of low AMG content. Comparative analysis of the genomes, habitats and hosts of 81 representative cyanomyoviruses showed that the cyanomyoviruses with low AMG content were all Synechococcus phages isolated from eutrophic waters. The relatively small genome size and high G + C content may also relate to the lower AMG content, as suggested by the significant correlation between the number of AMGs and G + C%. Therefore, the lower AMG content of S-N03 and S-H34 might be a result of viral evolution that was likely shaped by habitat, host, and genomic context. The genomic content of AMGs in cyanophages may have adaptive significance and provide clues to their evolution.


Introduction
With cell numbers of up to 10⁶ cells mL⁻¹ in the global oceans, unicellular cyanobacteria are amongst the most abundant photosynthetic organisms on earth and make major contributions to XL Cassette, Biomax® 50 kDa; polyethersulfone, Millipore Corporation, Billerica, MA, USA) and concentrated through tangential flow to make the viral concentration reach 300 times the initial concentration. The viral concentrate was stored at 4 °C in the dark [26,27].
The host of the cyanophages is Synechococcus sp. strain MW02 (NCBI accession number KP113680). The algal culture was grown in conical flasks with f/2 seawater medium under a constant illumination of approximately 25 µmol m⁻² s⁻¹ at 25 °C in a 12-h/12-h light-dark cycle [27]. The phage enrichment was performed by adding the viral-concentrated seawater to the exponentially growing host Synechococcus in a ratio of 1:9. The phage-host suspension was incubated under a constant irradiance of 25 µmol m⁻² s⁻¹ at 25 °C for about 1 week until lysed host cells were observed according to the color and turbidity of the lysate. A control group was set up in parallel by replacing the viral solution with the medium [28]. The cyanophage lysates were then filtered through a 0.22 µm pore size membrane (Millex®-GP 0.22 µm PES; Merck, Ireland). The infection was repeated three times. The filtrate was stored at 4 °C in the dark for further tests [29].

Phage Purification
Phage purification was performed using the serial dilution method, as described previously [27].
Briefly, infectivity was tested across serially diluted phage samples (10-fold dilutions over 7 orders of magnitude). The most diluted phage sample that induced host lysis was used for another round of serial dilution and infection tests. After three rounds of purification, the lysate was expected to contain a single phage strain [28]. The cyanophage was then concentrated using Amicon® Ultra 15 with a 30-kDa Ultra-PL membrane (Merck, Ireland) [30]. Further purification was performed by sucrose density gradient centrifugation [27].

Host Range
The infectivity of cyanophages S-N03 and S-H34 was tested using nine Synechococcus strains: WH7803, WH8102, MW02, MW03, LTWRed, LTWGreen, PSHK05, CCMP1333 and PCC7002 (Table S1). The viral solution was added to each host Synechococcus culture in logarithmic growth phase at a volume ratio of 1:9, in triplicate. The viral solution was replaced by the medium in the control group. The mixtures were incubated under the same conditions described above. Cell lysis was monitored daily for two weeks and compared between the control and viral solution groups to determine infectivity.

Morphological Study by Transmission Electron Microscopy
A 20 µL aliquot of purified phage suspension was placed onto a 200-mesh copper grid and stained with a drop of 1% (w/v) phosphotungstic acid (pH 7.2) for 10 min [29]. The grids were examined using a transmission electron microscope (JEOL JEM-1200EX, Japan) at 100 kV to reveal the cyanophage structural characteristics and dimensions [31].

Genome Sequencing and Assembly
Phage DNA was extracted from the sucrose density gradient-purified phages using a TIANamp Virus DNA Kit (TIANGEN) [27]. A total of 1 µg DNA per sample was used as input for the DNA sample preparations. Sequencing libraries were generated using NEBNext® Ultra™ DNA Library Prep Kit for Illumina (NEB, USA). The whole genomes of S-N03 and S-H34 were sequenced using Illumina NovaSeq PE150 by an ABI 3730 automated DNA sequencer. Reads were removed if they contained >40% low-quality bases (quality value ≤20), had an N content of >10%, or overlapped with the adapter for >15 bp with fewer than 3 mismatches. The reads were assembled with the SOAPdenovo [32], SPAdes [33], and ABySS [34] software packages. The assemblies from the three software packages were then integrated with CISA software to select the one with the fewest scaffolds. GapCloser and GapFiller were used to fill the assembly gaps [27,35].
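The read-filtering thresholds above can be illustrated with a minimal Python sketch. It covers only the low-quality and N-content criteria; the adapter-overlap criterion is omitted, and the function name, argument layout and per-base integer quality scores are assumptions made for illustration, not the pipeline actually used in this study.

```python
def keep_read(seq, quals, q_cutoff=20, max_lowq_frac=0.40, max_n_frac=0.10):
    """Return True if a read passes the quality and N-content filters.

    seq   : read sequence as a string
    quals : per-base quality scores (integers), same length as seq
    A read is discarded when more than 40% of its bases have quality <= 20
    or when more than 10% of its bases are N (hypothetical helper).
    """
    if not seq or len(seq) != len(quals):
        return False
    low_q = sum(1 for q in quals if q <= q_cutoff)
    n_count = seq.upper().count("N")
    low_q_ok = low_q / len(seq) <= max_lowq_frac
    n_ok = n_count / len(seq) <= max_n_frac
    return low_q_ok and n_ok


if __name__ == "__main__":
    # Toy reads, not data from the study.
    print(keep_read("ACGTACGTAC", [30] * 10))
    print(keep_read("NNACGTACGT", [30] * 10))
```

Keeping the thresholds as keyword arguments makes it easy to see how sensitive the retained read set is to each cutoff.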

Genome Annotation and Analysis
The open reading frames (ORFs) in the genomes of cyanophages S-N03 and S-H34 were predicted using GeneMarkS [36], GLIMMER [37] and RAST (Rapid Annotation using Subsystem Technology) [38]. The predicted ORFs were translated into amino acid sequences, and their homologs were searched for in the NCBI (National Center for Biotechnology Information) non-redundant protein database by BLASTp [39,40]. The protein domains were predicted and analyzed by InterPro [41] and CDD [42]. tRNAscan-SE was used to identify transfer RNA (tRNA) genes [43], and RNAmmer was used to predict ribosomal RNA in the full genome sequence [44]. Genome mapping was performed using DNAPlotter (version 17.0.1).
An AMG database was created, which summarized protein sequences from 33 AMGs that were chosen based on prior recognition and extracted from various cyanomyovirus genomes [11,45]. A total of 337 genomes were downloaded from the NCBI database, which included all isolated cyanomyoviruses with complete genomes available at the time of analysis (Table S2). When phages of the same name had an average nucleotide identity (ANI) greater than 95%, only one representative genome was kept. Finally, 81 representative cyanophage genomes were selected. A gene was assigned to the corresponding AMG when the BLASTp E-value was ≤10⁻⁵, the sequence identity was ≥35%, and the query cover was ≥60% [11]. The genome sequences of S-N03 and S-H34 were deposited in the GenBank database under accession numbers MT162466 and MT162467, respectively.
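The three AMG assignment thresholds can be sketched in Python as follows. The `BlastHit` record, its field names, and the rule of keeping the lowest E-value hit per ORF are illustrative assumptions; the source states only the thresholds themselves.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """One BLASTp hit of a phage ORF against the AMG database (hypothetical record)."""
    query_orf: str
    amg_name: str
    evalue: float
    identity_pct: float
    query_cover_pct: float


def passes_amg_thresholds(hit, max_evalue=1e-5, min_identity=35.0, min_cover=60.0):
    """Apply the stated cutoffs: E-value <= 1e-5, identity >= 35%, query cover >= 60%."""
    return (
        hit.evalue <= max_evalue
        and hit.identity_pct >= min_identity
        and hit.query_cover_pct >= min_cover
    )


def assign_amgs(hits):
    """Map each ORF to the AMG of its best passing hit.

    Choosing the lowest E-value among passing hits is an assumption
    made for this sketch, not a rule given in the text.
    """
    best = {}
    for hit in hits:
        if not passes_amg_thresholds(hit):
            continue
        current = best.get(hit.query_orf)
        if current is None or hit.evalue < current.evalue:
            best[hit.query_orf] = hit
    return {orf: hit.amg_name for orf, hit in best.items()}


if __name__ == "__main__":
    # Toy hits, not results from the study.
    toy_hits = [
        BlastHit("ORF_a", "phoH", 1e-40, 62.0, 95.0),
        BlastHit("ORF_b", "psbA", 1e-3, 80.0, 90.0),   # fails the E-value cutoff
        BlastHit("ORF_c", "MazG", 1e-12, 30.0, 85.0),  # fails the identity cutoff
    ]
    print(assign_amgs(toy_hits))
```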

Comparative Analysis of Cyanophage Genomes
The ViPTree server was used to generate a proteomic tree based on the genome-wide sequence similarities computed by tBLASTx [46,47]. All related viruses contained in the Virus-Host Database were used to establish a circular tree [48]. The 37 closest phages in the circular tree were then selected to establish a rectangular tree with phages S-N03 and S-H34 for subsequent comparison and analysis. The genome sequences of S-N03 and S-H34 were compared with that of phage S-B68 by tBLASTx using ViPTree. In addition, phylogenetic analysis with other related phages was carried out using the amino acid sequences of DNA polymerase and terminase large subunit, aligned with the ClustalW program. The maximum-likelihood (ML) phylogenetic tree was constructed with the genetic analysis software MEGA (Version 7.0.18) [49,50]. The bootstrap values were based on 1000 replicates. The average nucleotide identity (ANI) was calculated using OrthoANI (Average Nucleotide Identity by Orthology) [51] and the JSpeciesWS Online Service [52].

Host Range and Phage Morphology
The host of cyanophages S-N03 and S-H34 is the PE-type (phycoerythrin-only) Synechococcus sp. strain MW02, which belongs to subcluster 5.1 clade IX and was originally isolated from Hong Kong estuarine waters [53]. The cross-infectivity test showed that both S-N03 and S-H34 also infected three other PE-type Synechococcus strains belonging to subcluster 5.1 clades II and V and to subcluster 5.2 (Table S1). Transmission electron microscopy showed that S-N03 and S-H34 have icosahedral heads of 97 and 88 nm in diameter and contractile tails of 138 and 129 nm in length, respectively. Their sizes are within the range of previously isolated cyanomyoviruses (Figure S1).

General Genomic Features
Cyanophages S-N03 and S-H34 both contain a circular double-stranded DNA genome, as revealed by the terminal analysis, which showed no protruding cohesive ends. The genome sizes of S-N03 and S-H34 are 167,069 bp and 167,040 bp, which are the sixth and seventh smallest genomes among the 81 representative cyanomyoviruses (Table 1). The G + C content of both genomes is 50.1%, making them two of only four published cyanomyovirus genomes (S-B68, S-CBWM1, S-N03 and S-H34) with G + C contents close to 50%. The G + C content of Prochlorococcus phages (34.3-40.7%) is generally lower than that of Synechococcus phages (35.4-51.7%). Among the 81 representative cyanomyoviruses, about 71.4% of Prochlorococcus phages have G + C contents of less than 38.1%, while most of the Synechococcus phages have G + C contents between 38% and 45% and only 11.1% have G + C contents of less than 38.1% (Table 1). Studies have shown that the genomes of entities that depend on a host for survival, such as some bacteria, phages, and plasmids, are often rich in A + T, that is, their G + C content is low. This may be due to the differential cost of the related metabolites in the cell and the limited availability of G and C relative to A and T/U [54]. Neutral bias can also explain the higher A + T content of phages, because the depletion of host bacterial resources may result in the systematic insertion of the more abundant A and T nucleotides [54]. Moreover, the G + C content of a phage may also be affected by that of its host. A positive correlation has been observed between the G + C content of bacteriophages and that of their hosts [55]. In this study, the G + C contents of Synechococcus marinus WH8102 and WH7803, which can be infected by S-H34 and S-N03, were 59.2% and 60.2%, respectively (the whole genome of their host MW02 has not been published), which is at the high end of the G + C content range of marine Synechococcus (~50-60% G + C content) [56].
Such high G + C content of a host might be associated with the high G + C content of the cyanophages, although more evidence is needed to prove this point. Thus, we inferred that the higher G + C content implied that S-N03 and S-H34 may have experienced independent evolutionary routes compared to the cyanophages of lower G + C values and evolved specific genomic traits that adapted to their hosts and their surrounding environments.
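For reference, the G + C percentages compared in this section follow the usual definition, (G + C) / (A + C + G + T) × 100. A minimal Python sketch is given below; ignoring ambiguous bases such as N is an assumption of this sketch, and the toy sequences are not data from the study.

```python
def gc_content(seq):
    """Return the G + C percentage of a nucleotide sequence.

    Only unambiguous bases (A, C, G, T) are counted in the denominator;
    skipping other symbols is an assumption made for this illustration.
    """
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


if __name__ == "__main__":
    # Toy sequence with equal numbers of G/C and A/T bases.
    print(gc_content("GGCCAATT"))
```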
To examine whether genome size and G + C content are related in the viral genomes, a Spearman correlation analysis was performed on the 81 representative cyanophage genomes. The relationship between genome size and G + C content has been studied for bacteria but seldom investigated in viruses [57]. Interestingly, we found a significant negative correlation between genome size and G + C content of cyanomyoviruses (r = −0.34, p < 0.01, Table 2). This is in contrast to bacteria and archaea, which were shown to have positive correlations between G + C values and genome size, but consistent with the result obtained in bacteriophages [55,57]. Such a negative relationship in cyanomyoviruses is thought to be related to their adaptive evolution. If a phage genome is both large and enriched in G + C, the higher energy cost and the limited availability of G/C could constrain phage DNA replication, which does not comply with the life strategy of viruses.

Table 2. Spearman's rank correlation coefficient between genome size, G + C content and AMGs. *: p < 0.05; **: p < 0.01; n = 81.
                                        Genome Size   GC%     AMGs
Genome size   Correlation coefficient   0.992         0.014   -

A total of 247 and 246 potential ORFs were identified in the S-N03 and S-H34 genomes, respectively (Tables S3 and S4). Functional annotation of the predicted ORFs against the NCBI non-redundant protein database showed that only 72 (29.15%) were assigned to specific functions in S-N03 (E-value < 10⁻⁵), while the remaining 175 (70.85%) were predicted to encode hypothetical proteins, owing to the limited cyanophage genomic information in the database [69]. Similarly, 70 (28.46%) predicted ORFs were assigned to specific functions in S-H34, while the remaining 176 (71.54%) were predicted to encode hypothetical proteins. All predicted ORFs can be divided into five functional groups: structure (S-N03, 31 ORFs; S-H34, 28 ORFs), packaging (S-N03, 3 ORFs; S-H34, 3 ORFs), DNA replication and regulation (S-N03, 26 ORFs; S-H34, 29 ORFs), hypothetical proteins, and additional functions related to physiological activity (S-N03, 12 ORFs; S-H34, 10 ORFs) (Figure 1A,B).
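The ORF counts and percentages above can be cross-checked with a few lines of Python. The per-group counts and genome totals are taken from the text; the dictionary layout and function name are only an illustrative arrangement.

```python
# ORFs with an assigned function, per functional group (counts from the text).
ANNOTATED = {
    "S-N03": {"structure": 31, "packaging": 3, "replication": 26, "additional": 12},
    "S-H34": {"structure": 28, "packaging": 3, "replication": 29, "additional": 10},
}
# Total predicted ORFs per genome (from the text).
TOTAL_ORFS = {"S-N03": 247, "S-H34": 246}


def annotation_summary(phage):
    """Return (annotated, hypothetical, annotated %, hypothetical %) for one phage."""
    annotated = sum(ANNOTATED[phage].values())
    total = TOTAL_ORFS[phage]
    hypothetical = total - annotated
    return (
        annotated,
        hypothetical,
        round(100 * annotated / total, 2),
        round(100 * hypothetical / total, 2),
    )


if __name__ == "__main__":
    for name in ANNOTATED:
        print(name, annotation_summary(name))
```

The four functionally annotated groups sum to the reported 72 and 70 ORFs, so the hypothetical-protein group accounts for the remainder of each genome.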
The functional annotation of phage structural proteins is highly dependent on sequence similarity to proteins of other phages that were detected in the respective viral particles [70,71]. The putative structural proteins in S-N03 and S-H34 were the baseplate, the tail tube, the tail sheath, the tail fibers, the tail completion proteins and the neck proteins. The packaging modules of both S-N03 and S-H34 contain three ORFs, encoding the terminase large subunit, the terminase small subunit and the major capsid protein. The DNA replication and regulation module contained a wide variety of genes, including those encoding DNA primase, RNA polymerase, single-stranded DNA binding protein UvsY, endonuclease, DNA polymerase, exonuclease, ribonuclease H, ribonucleoside-diphosphate reductase alpha subunit (NrdA), and ribonucleotide diphosphate reductase beta subunit (NrdB). Among these, NrdA (S-N03: ORF202, S-H34: ORF4) and NrdB (S-N03: ORF201, S-H34: ORF3) are involved in DNA synthesis by converting ribonucleotides into deoxyribonucleotides, and can be found in all organisms [72,73]. In a marine environment with limited phosphorus content, obtaining sufficient free nucleotides is critical for DNA synthesis [72,74,75]. With ribonucleotide reductase (NrdA, NrdB) and thymidylate synthase (S-N03: ORF171, S-H34: ORF215), the rate of DNA synthesis of T4-like phages could be increased 10-fold compared to a system without these enzymes [76].
Additional protein modules are mainly related to metabolism and regulation. In addition to AMGs (detailed results and discussions are shown in Section 3.5), S-N03 and S-H34 also have regulatory genes, such as genes encoding the serine/threonine kinase (PSK) PknB (S-N03: ORF136, S-H34: ORF180), a serine/threonine phosphatase (S-N03: ORF100, S-H34: ORF147) and endolysins. PknB is a typical Ser/Thr kinase, which catalyzes the transfer of the gamma-phosphoryl group of ATP to a Ser/Thr residue of the protein substrate. It is involved in regulating many biological processes, including purine and pyrimidine biosynthesis, cell wall metabolism, antibiotic resistance, peptidoglycan synthesis, cell division, transcription, stress response and metabolic regulation [77][78][79]. Ser/Thr phosphatases (PSPs) are responsible for the dephosphorylation of phosphoprotein substrates, which is the reverse of the reaction catalyzed by Ser/Thr kinases. They participate in many cell pathways that regulate cell reproduction and programmed death [80]. The reversible phosphorylation of proteins is accomplished by the opposing activities of kinases and phosphatases [80]. S-N03 and S-H34 also contain an endolysin with an amino acid identity of 51.75% (97% coverage) to that of cyanophage S-B68. Endolysins are enzymes produced by phages. They are responsible for catalyzing the hydrolysis of the peptidoglycan in the bacterial cell wall and rupturing the cell at the end of the lytic cycle [81].

tRNA Genes
Apart from host-like genes, cyanomyoviruses have also incorporated tRNA genes into their genomes. In this study, only one tRNA gene (Asn) was identified in the genome of S-N03, and five tRNA genes (Tyr, Asp, Val, 2× Asn) were identified in the genome of S-H34 (Table 3). The number and types of tRNA genes vary among cyanomyoviruses (Table 1), which is thought to be a result of phage-host co-evolution driven by optimal codon usage [82]. The tRNAs carried in phage genomes match codons that are highly used by the phage and poorly used by the bacterial host during the infection [83]. They may augment the expression of late phage genes encoding structural proteins, such as phage capsid and tail proteins [56,84,85]. Therefore, the tRNA genes carried by S-N03 and S-H34 may contribute to phage protein synthesis and help the phage to adapt to a particular host or environment. Among the 81 published cyanomyoviruses, the number of tRNA genes in Prochlorococcus phages (0-4 tRNAs) is significantly lower than that in Synechococcus phages (1-36 tRNAs); S-N03 is one of the Synechococcus phages with the fewest tRNA genes. It has been proposed that the number of tRNA genes is closely associated with the difference in G + C content between phage and host: more tRNAs may increase the translation efficiency when infecting a host with a higher G + C content, and potentially expand the host range while maintaining a relatively lower G + C content in the phage genome [56]. As such, the tradeoff between G + C content and the occurrence of tRNA genes may explain the relatively low number of tRNAs and the wide host range of S-N03 and S-H34, given their relatively high G + C content (50.1%; Table 2), which is close to that of their hosts (marine Synechococcus of ∼50-60% G + C) [56].

Genome Comparison and Phylogenetic Analysis
Comparative genome analysis was undertaken to reveal the divergence of the nucleotide sequences of S-N03 and S-H34 from those of other cyanophages. The "proteomic tree", based on genome-wide similarities, showed that the phages most closely related to S-N03 and S-H34 all belong to the cyanomyoviruses (Figure 2A,B). S-N03 and S-H34 are most closely related to each other and group into a branch that also contains S-B68 and S-CRM01 and is distant from other cyanophages. The phylogenetic analyses of S-N03, S-H34 and other selected dsDNA viruses, based on the DNA polymerase and terminase large subunit sequences using the maximum likelihood (ML) method, showed similar branching positions of S-N03 and S-H34 (Figure 2C,D). They share the highest similarity with each other, with a nucleotide identity of 84.21% (coverage rate 67.81%). Within this cluster, S-N03 and S-H34 showed lower similarity to S-B68 (identity of 70.54% and 72.18%, respectively) and S-CRM01 (identity of 63.71% and 64.76%, respectively), indicating that S-N03 and S-H34 are novel cyanophages.
Among these four cyanophage isolates, S-H34, S-N03 and S-B68 are all marine lytic phages and are more similar to one another, while S-CRM01 is a freshwater strain. Therefore, the genomes of S-N03 and S-H34 were compared with that of S-B68 (Figure 1C). The three genomes were found to share a high similarity in some proteins encoded by conserved genes, including DNA polymerase, terminase large subunit, and the major capsid protein, with 75.31-93.65% identity at the amino acid level (Figure 1C). The differences among the genomes come mainly from a number of hypothetical proteins. Moreover, S-B68 is distinguished from S-H34 and S-N03 by the metabolic genes it encodes. It could be speculated that these differences arise from the difference in host species, because S-B68 has a different host from S-H34 and S-N03 [86].

Auxiliary Metabolic Genes (AMGs)
AMGs are commonly found in the genomes of cyanophages. Among the 81 cyanomyoviruses with available complete genomes, 92.6% contain more than 10 AMGs (Figure 3). However, the newly isolated cyanophage S-N03 contains only 4 AMGs (hsp, MazG, ptoX, phoH) and S-H34 contains only 3 AMGs (hsp, MazG, phoH). The latter has the lowest number of AMGs among the cyanomyoviruses isolated so far (Figure 3). All of the AMGs found in the S-N03 and S-H34 genomes are highly conserved among cyanophages.
Viruses 2020, 12, 800
Figure 3. The AMG content is listed in ascending order. The names of the cyanophages are colored according to the genus of the host from which they were originally isolated: Synechococcus in red and Prochlorococcus in blue.

MazG Gene (Pyrophosphatase)
The pyrophosphatase MazG is known as a regulator of nutrient stress and programmed cell death in E. coli [20]. The phage-encoded MazG was proposed to regulate the cellular level of ppGpp and, therefore, to affect transcription and translation in the host and extend the period of cell survival under the stress of phage infection [59,87]. However, a recent study showed that the purified cyanophage S-PM2 MazG has no binding or hydrolysis activity towards (p)ppGpp [88]. Instead, dGTP and dCTP seem to be the preferred substrates for this protein, and the affinity of the viral MazG for dGTP and dCTP is higher than that of its host counterpart. This may partially explain the lower G + C content of cyanophage genomes (37.7%) compared with that of the Synechococcus host genomes (60.2%), and it is consistent with the preferential hydrolysis of deoxyribonucleotides from the host Synechococcus genome of high G + C content [88]. However, whether such a mechanism is applicable to cyanophages whose genomes have a G + C content similar to that of their hosts, such as S-H34 and S-N03 in this study, has yet to be determined. MazG is a highly conserved gene in cyanopodoviruses and cyanomyoviruses that infect Synechococcus. Only S-TIM5 and S-CBWM1 lack MazG among the 81 cyanomyoviruses examined (Figure 3). Previous research used the pyrophosphate nucleotide hydrolase gene MazG to show that cyanophages are globally distributed. Despite the widespread presence of the MazG gene in cyanophages, they have a small effective population size, indicative of rapid lateral gene transfer [20]. Phylogenetic trees based on the MazG gene from previous studies showed that Prochlorococcus and Synechococcus phage MazG genes do not cluster with their hosts' MazG, suggesting that this gene may not have been obtained from the host but acquired by lateral gene transfer from other sources [20,88].

phoH Gene (P-Starvation Inducible Protein)
As the most prevalent phosphate-regulating gene in the genomes of cyanophages, phoH is present in 80 of the 81 related cyanomyoviruses (all except S-SKS1, Figure 3). Although its function is still unclear, it has been used as a molecular marker for describing viral diversity due to its universality [21,89]. In this study, the host of cyanophages S-N03 and S-H34, Synechococcus sp. strain MW02, was isolated from a Hong Kong estuarine site (affected by the Pearl River flows) where phosphorus limitation is usually present [53,90]. As such, phoH may play a role in regulating the phosphate uptake of the host from the environment.

Hsp Gene (Heat Shock Protein)
Heat shock proteins (Hsps) are groups of proteins that are induced in response to physical and chemical environmental stresses. They facilitate cellular recovery from the resulting damage by participating in protein translocation, re-folding and degradation, and are known as "molecular chaperones" [91]. Most of the heat shock proteins found in bacteriophages are small (sHsps), which can suppress protein aggregation and protect against cell stress, and are generally active as large oligomers consisting of multiple subunits [92,93]. Specifically, the heat shock protein family in phages might be important for scaffolding during maturation of the capsid [45]. Only 4 of the 81 cyanophages (S-CBWM1, B2, B23 and Syn10) do not contain the Hsp gene (Figure 3). A previous study has shown that cyanophage sHsps form a monophyletic clade that is phylogenetically closer to bacteria than to cyanobacteria, while the host cyanobacterial sHsp sequences form a monophyletic clade closer to plants [93]. Such phylogenetic relationships point to horizontal gene transfer events that probably occurred millions of years ago. This means that the cyanophage sHsp gene has evolved independently of, and differently from, that of its actual cyanobacterial host, but has still co-evolved with the host cyanobacteria in the pseudolysogenic or lysogenic stage [93].

ptoX Gene (Plastoquinol Terminal Oxidase)
PTOX is an enzyme that mediates the electron flow from plastoquinol to oxygen. It exerts a variety of effects on the development and functioning of plant chloroplasts, including carotenoid biosynthesis, photoprotection and chlororespiration [60,94]. It does not exist in all photosynthetic organisms, but it is widely distributed among different strains of cyanobacteria [95]. The ptoX genes are also widespread among marine cyanomyoviruses (Figure 3). By carrying the ptoX gene, the cyanophage may have another way of preventing photodamage other than the psbA route [60]. PTOX can oxidize the plastoquinol produced by chloroplast NAD(P)H quinone oxidoreductase, a process called chlororespiration. In the phage genome, PTOX is often arranged adjacent to NAD(P)H quinone oxidoreductase [95]. In the genome of S-N03, NAD(P)H oxidoreductase likewise appears upstream of PTOX (Figure 1A, Table S3). This indicates that NAD(P)H oxidoreductase and PTOX represent functional units in these cyanobacteria and may form a transcription unit. The phylogeny of the PTOX proteins of the hosts and the cyanophages implies that, although the ptoX genes of the Synechococcus and Prochlorococcus hosts and of the cyanophages may share a common ancestor, they have evolved independently since then [60].

Lack of Photosynthetic AMGs
The prevalence of AMGs in the 81 sequenced cyanomyoviruses of Synechococcus and Prochlorococcus (including the 2 strains in this study) is shown in Figure 3. It shows that all the previously sequenced cyanomyoviruses carry at least one photosynthesis-related AMG. Some of the genes, such as psbA, psbD, and hli, are common in the genomes of cyanomyoviruses. For example, 99% of the phage genomes contain hli, 95% contain psbA, and 76% contain both psbA and psbD. However, unlike all other known cyanomyoviruses, the genome of the novel strain S-H34 does not contain any photosynthesis-related AMGs. The novel strain S-N03 contains only one photosynthesis-related AMG, ptoX (plastoquinol terminal oxidase), and lacks the common psbA, psbD, and hli.
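Prevalence figures of this kind are simple presence/absence fractions, as the following Python sketch illustrates. The AMG sets for S-N03 and S-H34 come from the text, and ptoX is grouped with the photosynthesis-related genes as in the text; the third genome (`phage_X`) and its gene set are hypothetical and are included only so that the example has a phage carrying psbA, psbD and hli.

```python
# Genes treated as photosynthesis-related in this section.
PHOTOSYNTHESIS_AMGS = {"psbA", "psbD", "hli", "ptoX"}


def prevalence_pct(genomes, gene):
    """Percentage of genomes whose AMG set contains the given gene."""
    if not genomes:
        raise ValueError("no genomes supplied")
    carriers = sum(1 for amgs in genomes.values() if gene in amgs)
    return 100.0 * carriers / len(genomes)


def lacking_all(genomes, genes):
    """Names of genomes that carry none of the given genes, sorted alphabetically."""
    wanted = set(genes)
    return sorted(name for name, amgs in genomes.items() if not (amgs & wanted))


if __name__ == "__main__":
    toy_genomes = {
        "S-N03": {"hsp", "MazG", "ptoX", "phoH"},               # from the text
        "S-H34": {"hsp", "MazG", "phoH"},                       # from the text
        "phage_X": {"psbA", "psbD", "hli", "MazG", "phoH"},     # hypothetical
    }
    print(prevalence_pct(toy_genomes, "phoH"))
    print(lacking_all(toy_genomes, PHOTOSYNTHESIS_AMGS))
```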
The D1 and D2 proteins encoded by the psbA and psbD genes are core reaction-center proteins in photosystem II and participate in photochemical reactions. The D1 protein produced by the host turns over rapidly under high light and declines during phage infection [12,86,96]. Therefore, the expression of the phage psbA gene during infection, as confirmed in previous studies, can bolster the host's photosynthesis [12]. The high light-induced protein encoded by hli plays an important role in preventing cellular light damage by redirecting excessive light energy and protecting the photosynthetic apparatus [97]. By supplementing host photosynthesis, the phage photosynthetic AMGs ensure the energy required for maximum phage production and thus enhance phage fitness [14]. It has been suggested that some phage photosynthetic AMGs (i.e., psbA and hli) have become an integral part of the phage genome, as they are co-transcribed with the essential, highly expressed phage capsid genes surrounding the photosynthesis genes [12]. As such, the absence of the common photosynthetic AMGs in S-N03 and S-H34 implies a distinct evolutionary route. It also suggests that the energy for morphogenesis during phage production might be obtained from sources other than those strictly dependent on the maintenance of photochemical ATP under high light.
It has been suggested that the number of photosynthesis AMGs (i.e., psbA and hli) and the optimal combination required by the phage may be determined by the light level [92]. The fitness enhancement conferred on cyanophages by the phage photosynthesis genes only occurs under a certain range of high light [86,98]. For example, a novel agent-based model shows that the phage photosynthesis genes are not necessary at a depth of 30 m, and the optimal photosynthesis gene combination in the phage was simplified to 0 psbA and 1 hli at a depth of 120 m [98]. In addition, the length of the latent period of infection has also been speculated to determine the presence or absence of psbA in a phage genome [99]. Therefore, the distinct genomic feature of S-H34, the absence of all photosynthetic AMGs, might be the result of environmental adaptation and/or its own physiological characteristics.

Low AMG Contents in S-H34 and S-N03
Of the known cyanomyoviruses, less than 10% carry ≤10 AMGs in their genomes (Figure 3). S-H34 is one of the strains with by far the fewest AMGs (3 AMGs, equal to that of S-CBWM1). Intriguingly, from the phylogenetic analysis, it was noticed that S-N03 and S-H34 have the closest relationships with S-B68 and S-CRM01 ( Figure 2B), which also have fewer AMGs (4 AMGs in S-B68 and 7 AMGs S-CRM01). This suggests that AMG content may be related to the genetic relationship. However, large variations of AMG content in phylogenetically closely-related cyanomyovirus genomes have also been demonstrated in previous studies [11,27]. It has been suggested that both vertical and horizontal evolution determine the AMGs content: the highly conserved AMGs across cyanomyoviruses are likely maintained by vertical inheritance while those occasional AMGs may be due to horizontal evolution under different selection pressures such as environmental condition and host type.
To investigate the distribution of AMG content in different environments, the cyanophages were divided according to their location type (coastal, open ocean and lake) and plotted against their corresponding number of AMGs (Figure 4A). Although no significant difference in AMG number was identified between coastal and open ocean regions, the cyanophages with low AMG content were all isolated from relatively eutrophic areas such as coastal regions and lakes (Figure 4A). Further analysis of the six published cyanomyoviruses with fewer than 10 AMGs showed that these phages were all isolated from mid-latitude regions of 30-40° N (Figure 4B), and that their hosts were all Synechococcus (Table 4). Compared with cyanomyoviruses isolated from Prochlorococcus, those isolated from Synechococcus had a lower AMG content (Figure 3, Table 1). The data presented here are consistent with previous studies that also proposed an association of AMG content with location and host genus [11]. Moreover, neither S-N03 nor S-H34 showed strict host specificity; both were able to infect other Synechococcus strains besides their host Synechococcus MW02. This is consistent with the speculation that expansion of the host range may, in some cases, be accompanied by a decrease in AMG content [11]. The low number of AMGs in the two phages may also be related to their genomic features of small genome size and high G + C content. Correlation analyses of the number of AMGs, genome size and G + C values of the 81 representative cyanomyoviruses yielded a significant negative correlation between the number of AMGs and GC% (r = −0.272, p < 0.05, Table 2). Although AMG content did not correlate with genome size in our dataset, the fixation of newly acquired genes in a viral genome usually comes at the cost of a larger genome [10].
Collectively, the lower AMG content of S-N03 and S-H34 might be a result of viral evolution that was likely shaped by habitat (eutrophic seawater), host type and range (Synechococcus phages with a relatively wide host range), and genomic features (small genome size and high G + C content). However, more evidence is still needed to elucidate the regulation of AMG type and content in cyanophages.
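As an illustration of the correlation analysis reported above, the following minimal Python sketch computes a correlation coefficient and the t statistic commonly used to test it. It assumes Pearson's r; the text does not state which coefficient or software was used. The function names (`pearson_r`, `t_statistic`) and the toy values are hypothetical and are not data from this study. Only r = −0.272 and n = 81 are taken from the text.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Made-up placeholder values (NOT from the paper), only to exercise the code:
toy_gc_percent = [36.0, 40.0, 44.0, 48.0, 52.0]
toy_amg_counts = [22, 18, 15, 9, 5]
print(pearson_r(toy_gc_percent, toy_amg_counts))

# Values reported in the text: r = -0.272 across 81 cyanomyoviruses
print(t_statistic(-0.272, 81))
```

Under these assumptions, the reported r and n give |t| of about 2.5, which exceeds the two-tailed 5% critical value of roughly 1.99 for 79 degrees of freedom and so agrees with the stated p < 0.05.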

Conclusions
In this study, two novel Synechococcus phages, S-N03 and S-H34, were isolated, and their complete genomes were sequenced and analyzed. Both phages have relatively small genomes with high G + C content. They also carry fewer AMGs than other cyanomyoviruses and lack common photosynthesis-related genes, which implies that their distinct evolutionary routes were shaped by habitat type and host preference, and gives clues to their likely ecological functions. Due to the limited information on genes and proteins in the cyanobacterial gene database, isolation and sequencing of more cyanophages from different environments are urgently needed. Cyanophage genomic information can contribute to further research on the interactions between cyanophages and their hosts in aquatic environments, and provide insights into viral adaptive evolution and ecological functions.
Supplementary Materials: Table S1: Infectivity of phage S-N03 and S-H34 against nine Synechococcus strains. Table S2: Statistics of Prochlorococcus and Synechococcus cyanomyoviruses that have uploaded the complete genome to the NCBI database by 2020. Table S3: Predicted ORFs in the S-N03 genome with homologues in the non-redundant database. Table S4: Predicted ORFs in the S-H34 genome with homologues in the non-redundant database.